10 Brilliant or Atrocious Neo4j Cypher Hacks
These Cypher programming hacks in Neo4j could be seen by some like brilliant problem solving solutions, or like atrocious design flaws by others. They are hacks, because they do things in unusual ways. No matter what you think, you need to know about them.
Table of Contents
1. Oddities of the MATCH Traversal
Assume you want all actors who played with Keanu Reeves:
MATCH (keanu:Person {name: 'Keanu Reeves'})
MATCH (actor:Person)-[:ACTED_IN]->(m:Movie)
WHERE EXISTS( (m)<-[:ACTED_IN]-(keanu) )
RETURN actor
The previous query will not do it, as it also returns Keanu Reeves himself. But we can do it this way:
MATCH (keanu:Person {name: 'Keanu Reeves'})
MATCH (actor:Person)-[:ACTED_IN]->(:Movie)<-[:ACTED_IN]-(keanu)
RETURN actor
This will miraculously exclude Keanu Reeves, but why? It is because of the main “by design” oddity of a Cypher MATCH traversal: if you use a pattern with at least two relationships of the same type, between the same kinds of nodes, Cypher will not allow rows with relationships already traversed in the results (it’s called relationship isomorphism).
This happened for the Keanu Reeves node in the second MATCH pattern, when first and last Person nodes are the same.
Remark that’s not about using multiple MATCH clauses, or splitting up with commas a pattern, or using just one single continuous pattern… As soon as you use a pattern or more interconnected patterns with two similar relationships, you risk to have rows unexpectedly excluded from the result.
Advanced Neo4j Cypher Puzzles with Answers
is a new Kindle e-book I recently published, with all these Cypher hacks and much more!
The book contains three big puzzles, with 10 questions each. Each question is with either a single-choice or multiple-selections.
Each choice is a Cypher query you must select as either a good or bad answer.
2. MERGE, The Big Duplicator
If you had the feeling you can use MERGE instead of CREATE to avoid duplicates, you’re in for a big surprise. As MERGE will in fact create more duplicates than you can imagine, if you don’t know how it actually works.
Assume Keanu Reeves (already in our database), writes a Note. How can we add the new entry?
MERGE (:Person {name: 'Keanu Reeves'})-[:WRITES]->(:Note {text: 'hello'})
If you believed this query will create only new and missing nodes or relationships in the database, you’re wrong. By design, MERGE is an all-or-nothing clause: if at least one element is missing, it will create ALL of them, as new. So yes, our query will create indeed a new Note, and a new :WRITE relationship, but it will link it to a new Keanu node as well.
To avoid this oddity, separate Keanu in a MATCH clause, and pass it to MERGE as a bound variable. MERGE will not create anything for elements passed through bound variables! But is this new query good?
MATCH (keanu:Person {name: 'Keanu Reeves'})
MATCH (note:Note {text: 'hello'})
MERGE (keanu)-[:WRITES]->(note)
Nope. Because Note is here a dependent node, it must always exist with the person who wrote it. The right query is:
MATCH (keanu:Person {name: 'Keanu Reeves'})
MERGE (keanu)-[:WRITES]->(:Note {text: 'hello'})
3. The Atrocious [*0] Var-Length Pattern
This could be the most confusing oddity in Cypher. And I’m not talking about the typical [*1..n] relationship pattern, that is great! But a minimum 0 relationship depth doesn’t make any sense, the way it works. You might expect the reflexive relationships returned, but it’s not the case.
Assume you want to return The Matrix, and the actors who played in that movie:
One possible intuitive query would separately aggregate a Movie node with Person nodes, through :ACTED_IN relationships and a UNION clause. But following Cypher query unexpectedly does the same:
MATCH (actor)-[:ACTED_IN*0..]->(:Movie {title: 'The Matrix'})
RETURN actor
Without the *0 Cypher hack, the query will simply skip the Movie node, as expected. But *0 will also add it to the results. A mixed result.
4. Conditional Execution with …FOREACH
Assume you want to set some node property, but only if you find Keanu is born in 1964.
Hmm, we don’t have an IF-THEN-ELSE statement in Cypher, and CASE-WHEN can only be used to return expressions. There is however this atrocious but brilliant FOREACH Cypher hack you can use instead:
MATCH (keanu:Person {name: 'Keanu Reeves'})
FOREACH (_ IN
CASE WHEN keanu.born = 1964 THEN [1] ELSE [] END
| SET keanu.gotcha = true)
The property will be SET only if Keanu is born in 1964, otherwise whatever actions you may pass in FOREACH will not happen. The Neo4j Cypher hack creates either a list with one single element (usually a simple literal, like 1 or true), or an empty list. And then the iteration will call SET either once, or not at all. Remark also we can use _ as a dummy variable.
5. Other Oddities with FOREACH
The following query should normally create a linked list. First three lines create five Node nodes, and return a collection with them:
UNWIND range(1, 5) AS i
CREATE (n:Node {id: i})
WITH collect(n) AS nodes
FOREACH (i IN range(0, size(nodes)-2)
| CREATE (nodes[i])-[:LINK]->(nodes[i+1]))
Last two lines are supposed to connect every two adjacent nodes. But the query fails because we cannot use directly indexed list items in a pattern. We need a rather elaborated Neo4j Cypher hack, which will provide additional lists with one element each:
UNWIND range(1, 5) AS i
CREATE (n:Node {id: i})
WITH collect(n) AS nodes
FOREACH (i IN range(0, size(nodes)-2)
| FOREACH (n1 IN [nodes[i]]
| FOREACH (n2 IN [nodes[i+1]]
| CREATE (n1)-[:LINK]->(n2))))
And we’re not out of the woods with FOREACH. Assume you want to update the nodes you just added:
MATCH (n:Node)
WITH collect(n) AS nodes
FOREACH (i IN range(0, size(nodes)-1)
| SET nodes[i].visited = true)
This will fail, with a syntax error. But just surrounding nodes[i] in parentheses seems to fix it:
MATCH (n:Node)
WITH collect(n) AS nodes
FOREACH (i IN range(0, size(nodes)-1)
| SET (nodes[i]).visited = true)
6. FOREACH with WHERE Filter
List comprehensions allow a WHERE clause to filter data, FOREACH doesn’t. But that’s a nice hack to provide one, with an inline list comprehension perfectly valid:
MATCH (actor:Person)-[:ACTED_IN]->(:Movie {title: 'The Matrix'})
WITH collect(actor) as actors
FOREACH (actor IN
[a IN actors WHERE a.name <> 'Keanu Reeves' | a]
| SET actor.visited = true)
List/pattern comprehensions, together with FOREACH, may look intimidating the first time, if you’re not already familiar with Python, Haskell or some functional programming. But they are actually pipeline shortcuts for loops. List/pattern comprehensions are used to filter and transform collection elements, while FOREACH loops execute actions for every element (usually data change actions), returning nothing.
I’ve also seen people trying hard to return values from FOREACH. But ask yourself if you shouldn’t use instead a list comprehension. If you don’t need updates, list comprehensions are even better, with more functionality.
7. WITH as a Firewall
Use WITH clauses to separate different execution stages for your query. Beside switching the cardinality and executing GROUP BY operations with aggregates.
A first odd effect is that whatever bound variable you do not propagate further, can be reused. And people can produce atrocities like this one, an otherwise correct functional query, but in which both actor and actors identifiers have been heavily reused:
WITH ['Keanu Reeves', 'Hugo Weaving'] as actors
MATCH (actor:Person)
WHERE actor.name IN actors
WITH collect(actor) AS actors
MATCH (movie:Movie)
WHERE ALL(actor IN actors WHERE (actor)-[:ACTED_IN]->(movie))
MATCH (actor:Person)-[:ACTED_IN]->(movie)
WHERE NOT actor IN actors
RETURN movie.title, collect(actor.name) AS actors
WITH can also be used as a simple separator between subqueries within the same query (to avoid ;). But you would need the Cypher hack below to reset the cardinality to 1:
MATCH (actor:Person)-[:ACTED_IN]->(:Movie {title: 'The Matrix'})
SET actor.visited = true
WITH DISTINCT 1 AS _
CREATE (:Report {id: 1})
If you want to create one single Report, WITH DISTINCT is required (or you can call instead count(*), or another aggregate). This is because WITH will always return one single row, regardless of what’s around, and further continue with CREATE called always once. Without DISTINCT or a similar WITH hack, CREATE would be called for every previous SET, which is for every actor from The Matrix.
8. Some Oddities with NOT
There is no NOT IN operator in Cypher (as in SQL). But there is an IS NOT NULL (for IS NULL).
Of course, you can use NOT x IN list, but it is a bit awkward. The same way you would use NOT x IS NULL.
And this is why I’ve seen many queries using NOT(x IN list), which triggers another possible misunderstanding: you may be led to believe NOT is a function! The same way you have NONE, ALL, ANY or EXISTS… But parentheses are use there just for more clarity, that’s all.
9. Missing Errors
Designers of Cypher took in many regards an optimistic approach. And that’s not always good, because you end up often with a result which is not correct. When you get nothing but you expect something, or when you end up with way too many rows, something else may be wrong.
CREATE (:Milestone {year: 1995}),
(:Milestone {year: '1997'}),
(:Milestone {year: 1998})
After creating some nodes, the following query returns just two of them (when it should find them all):
MATCH (m:Milestone)
WHERE m.year < 2000
RETURN m
The devil is in the details, and you can see the ‘1997’ value was stored like a string. When compared to an integer, it returns null, and the node is not selected in the result. This could be eventually avoided here by using defensive programming and toInteger(m.year).
Remember your query is as good as your data. And Cypher doesn’t enforce a property data type. And does not trigger an error in many cases when it should.
10. From List to Rows, to List
Like any query language, Cypher returns rows. But the addition of lists made it unusually powerful. It is essential to know, at every stage of your query, if you work on rows or on a list. The statements, functions and operations available for each are different.
I’ve seen people trying to UNWIND rows, or collect again the elements of a list. Or switching way too many unnecessary times between rows and lists. However, do not hesitate to switch back and forth between lists and rows, especially when you can solve some problems with one of these types, and not the other.
The following query switches back and forth from a list to rows, and back to a list, to remove duplicates with DISTINCT (a row-based operation):
WITH [1, 1, 3, 2, 4, 2] AS list
UNWIND list AS rows
WITH DISTINCT rows
RETURN collect(rows) AS list
However, never do this:
WITH [1, 1, 3, 2, 4, 2] AS list
UNWIND list AS rows
WITH rows SKIP 2 LIMIT 3
RETURN collect(rows) AS list
The Neo4j support for lists in Cypher is pretty powerful. And the list-based alternative to SKIP and LIMIT is by simply using here a list[2..5] call.
Neo4j Certified Professional: Exam Practice Tests
is a new Kindle e-book I recently published, to help you pass the Neo4j Certified Professional exam!
All questions are closely emulated from those currently found in the actual exam. You’ll know right away what questions you missed, and what the correct answers are.
Detailed explanations with external references for any possible choice, in each question.
BONUS: Internal Count Store
I bet you didn’t know about this! And if you have a huge database, it can matter if you used a cached data value for your count(*) or not.
The following query will return fast, as it hits the internal transactional count store, and if you PROFILE it, you’ll see RelationshipFromCountStore as first stage:
MATCH ()-[:ACTED_IN]->(:Movie)
RETURN count(*) as count
But this query, in which you tried to be even more specific, may take much longer. Relationships with specific node labels don’t hit the count store. If you use just one label, or no label at all, is fine.
MATCH (:Person)-[:ACTED_IN]->(:Movie)
RETURN count(*) as count