Neo4j Graph Algorithms: (1) Path Finding Algorithms
Path finding algorithms find the shortest path between two or more nodes or evaluate the availability and quality of paths. This visual presentation of the Neo4j graph algorithms is focused on quick understanding and less implementation details. With small reusable samples, for less time-consuming labs.
Table of Contents
Introduction
This article presents quickly – in a graphical and descriptive manner, skipping many implementation details – most of the Path Finding algorithms implemented by Neo4j in their Graph Data Science (GDS) library. We focus on one or two small samples reused by most of the algorithms, to keep it simple and allow for less time-consuming labs.
Run separately the following two queries in a blank document. All these Cypher queries can be run quickly in a new Blank Project on the free online Neo4j Sandbox. Or, if you have your own environment, on your Neo4j Desktop.
CREATE (a:Node {name: 'A'}),
(b:Node {name: 'B'}),
(c:Node {name: 'C'}),
(d:Node {name: 'D'}),
(e:Node {name: 'E'}),
(f:Node {name: 'F'}),
(a)-[:REL {cost: 50}]->(b),
(a)-[:REL {cost: 50}]->(c),
(a)-[:REL {cost: 100}]->(d),
(b)-[:REL {cost: 40}]->(d),
(c)-[:REL {cost: 40}]->(d),
(c)-[:REL {cost: 80}]->(e),
(d)-[:REL {cost: 30}]->(e),
(d)-[:REL {cost: 80}]->(f),
(e)-[:REL {cost: 40}]->(f);
CALL gds.graph.create('myGraph', 'Node', 'REL', {
relationshipProperties: 'cost' });
The CREATE statement will add some Node-labeled nodes with weighted REL relationships. This is our main graph, with connected nodes. The “cost” property is used for a weighted graph when looking for the shortest path. Or it can be ignored and the shortest path will be calculated in term of hops, of number of nodes traversed.
The second query creates one myGraph native projection in memory for our graph.
Call anytime MATCH (n) RETURN n; to get a visual representation of the whole graph:
1. Breadth-First Search (BFS) Algorithm
The BFS algorithm is a graph traversal algorithm, used in shortest path and other more advanced algorithms. The query below returns the order of traversal for our graph, as something like [A, C, D, B, E, F]. It starts from the node mentioned as start node (A), then it traverses all its immediate children (C, D, and B – in whatever order), then on the next level (E and F), and so on:
MATCH (a:Node {name:'A'})
WITH id(a) AS start
CALL gds.alpha.bfs.stream('myGraph', {startNode: start})
YIELD path
RETURN [ n in nodes(path) | n.name ] AS path;
2. Depth-First Search (DFS) Algorithm
The DFS algorithm is another graph traversal algorithm, used in other more advanced algorithms. The query below returns another order of traversal for our graph, something like [A, C, E, F, D, B]. It also starts from the node mentioned as start node (A), but then it goes deep to any child of a child: after C, it goes to the first child of C (E), then to the first child of E (F) and so on, before moving to other children of A (D and B).
MATCH (a:Node {name:'A'})
WITH id(a) AS start
CALL gds.alpha.dfs.stream('myGraph', {startNode: start})
YIELD path
RETURN [ n in nodes(path) | n.name ] AS path;
3. Dijkstra Source-Target Algorithm
GDS splits up shortest path algorithms between shortestPath (for a single source-target pair of nodes) and allShortestPaths (for multiple paths from the same source node). In this context, Dijkstra Source-Target will find and show here below the shortest weighted or unweighted path between nodes A and F, based on the cost relationship properties, or the number of hops (i.e. traversed nodes).
3.1. Unweighted Graph
Here is the version for an unweighted graph. The algorithm works on the myGraph native projection we already prepared in memory:
MATCH (source:Node {name: 'A'}), (target:Node {name: 'F'})
CALL gds.beta.shortestPath.dijkstra.stream('myGraph', {
sourceNode: id(source),
targetNode: id(target)
})
YIELD nodeIds, costs
RETURN [id IN nodeIds | gds.util.asNode(id).name] AS path, costs;
This will return [A, D, F] as shortest path traversal from A to F, with 2 as total cost, because we traverse a total of two other nodes.
3.2. Weighted Graph
Now for a weighted graph, based on the cost relationship property:
MATCH (source:Node {name: 'A'}), (target:Node {name: 'F'})
CALL gds.beta.shortestPath.dijkstra.stream('myGraph', {
sourceNode: id(source),
targetNode: id(target),
relationshipWeightProperty: 'cost'
})
YIELD nodeIds, costs
RETURN [id IN nodeIds | gds.util.asNode(id).name] AS path, costs;
This will return the [A, B, D, E, F] path, with the cumulated cost [0, 50, 90, 120, 160] (we’ll skip quotes and trailing zeros for more readability). The total cost between A and F is the last value, 160. Shortest path means in this context the smallest total cost value, not the largest.
Remark we took the D-E-F route, instead of jumping directly from D to F, because 30+40=70 is less than the direct 80 value.
4. Yen’s k-Shortest Path Algorithm
Yen’s is a variation of Dijkstra’s Source-Target algorithm, that explores up to k different paths between a pair of nodes. It starts with the shortest path, then lists other maximum k alternatives. Here is Yen’s called for the same A to F pair, with k = 3:
MATCH (source:Node {name: 'A'}), (target:Node {name: 'F'})
CALL gds.beta.shortestPath.yens.stream('myGraph', {
sourceNode: id(source),
targetNode: id(target),
relationshipWeightProperty: 'cost',
k: 3
})
YIELD nodeIds, costs
RETURN [id IN nodeIds | gds.util.asNode(id).name] AS path, costs;
Remark how the second path is also a shortest path (same 160 total cost, going through A-C-D instead of A-B-D), and the third is another path discussed before:
- [A, B, D, E, F] –> [0, 50, 90, 120, 160]
- [A, C, D, E, F] –> [0, 50, 90, 120, 160]
- [A, B, D, F] –> [0, 50, 90, 170]
5. Dijkstra Single-Source Algorithm
Dijkstra Single-Source is basically the same popular Dijkstra algorithm, but used for the shortest path from one node (A) to any other node:
MATCH (source:Node {name: 'A'})
CALL gds.beta.allShortestPaths.dijkstra.stream('myGraph', {
sourceNode: id(source),
relationshipWeightProperty: 'cost'
})
YIELD nodeIds, costs
RETURN [id IN nodeIds | gds.util.asNode(id).name] AS path, costs;
The query will return pairs of path and costs arrays. Going from A to A (to the same node) always has a cost 0. Beside the path from A to F calculated before, we’ll also get the shortest path from A to any other node:
- [A, A] –> [0]
- [A, B] –> [0, 50]
- [A, C] –> [0, 50]
- [A, B, D] –> [0, 50, 90]
- [A, B, D, E] –> [0, 50, 90, 120]
- [A, B, D, E, F] –> [0, 50, 90, 120, 160]
6. Single Source Shortest Path (SSSP) Algorithm
Similar to Dijkstra Single-Source, the SSSP algorithm calculates the total shortest weighted path from one node (A here) to all other nodes in the graph.
MATCH (source:Node {name: 'A'})
CALL gds.alpha.shortestPath.deltaStepping.stream({
nodeProjection: 'Node',
relationshipProjection: { REL: { type:'REL', properties:'cost' }},
startNode: source,
relationshipWeightProperty: 'cost',
delta: 3.0
})
YIELD nodeId, distance
RETURN gds.util.asNode(nodeId).name AS target, distance AS cost
ORDER BY target;
The result is similar to the one for Dijkstra Single-Source algorithm, except we show only the target node and the total cost:
- A –> 0
- B –> 50
- C –> 50
- D –> 90
- E –> 120
- F –> 160
7. All Pairs Shortest Path (APSP) Algorithm
APSP is similar to SSSP, except it has optimizations. The following query returns the exact same result as before:
MATCH (source:Node {name: 'A'})
CALL gds.alpha.allShortestPaths.stream({
nodeProjection: 'Node',
relationshipProjection: { REL: { type:'REL', properties:'cost' }},
relationshipWeightProperty: 'cost'
})
YIELD sourceNodeId, targetNodeId, distance
WHERE sourceNodeId = id(source)
MATCH (target:Node) WHERE id(target) = targetNodeId
RETURN target.name AS target, distance AS cost
ORDER BY target;
8. Random Walk Algorithm
Unlike the shortest path algorithms presented before, Random Walk generates each time another random paths in the graph.
Following query will generate a random 3-steps path (through 4 total nodes) starting from A. This can be [A, D, E, F], [A, B, D, E], or any other similar valid path:
MATCH (source:Node {name: 'A'})
CALL gds.alpha.randomWalk.stream({
nodeProjection: 'Node',
relationshipProjection: { REL: { type: 'REL' } },
start: id(source),
steps: 3,
walks: 1
})
YIELD nodeIds
RETURN [nodeId IN nodeIds | gds.util.asNode(nodeId).name] AS path;
9. Minimum Spanning Tree (MST) Algorithm
In a weighted graph, MST starts from a node, and finds all its reachable nodes and the set of relationships that connect the nodes together with the minimum possible weight.
MATCH (source:Node {name: 'A'})
CALL gds.alpha.spanningTree.minimum.write({
nodeProjection: 'Node',
relationshipProjection: { REL: { type:'REL', properties:'cost' }},
startNodeId: id(source),
relationshipWeightProperty: 'cost',
writeProperty: 'MINST',
weightWriteProperty: 'writeCost'
})
YIELD createMillis, computeMillis, writeMillis, effectiveNodeCount
RETURN createMillis, computeMillis, writeMillis, effectiveNodeCount;
Previous MST query added new MINST relationships between nodes, with a writeCost property displayed in clear here:
To find all pairs of nodes included in our MST:
MATCH path = (source:Node {name: 'A'})-[:MINST*]-()
WITH relationships(path) AS rels
UNWIND rels AS rel
WITH DISTINCT rel AS rel
RETURN startNode(rel).name AS from,
endNode(rel).name AS to,
rel.writeCost AS cost;
The result is the cost for each pair of x–>y nodes:
- A –> C = 50
- B –> B = 50
- B –> D = 40
- D –> E = 30
- E –> F = 40
10. A* Shortest Path Algorithm
A* computes the shortest path between two nodes. It is an informed search algorithm as it uses a heuristic function to guide the graph traversal. The algorithm supports weighted graphs with positive relationship weights.
The typical example requires latitude and longitude node properties, and will not be presented here for now.
Conclusions
Read all articles from the same Neo4j Graph Algorithms series:
- Path Finding Algorithms
- Centrality Algorithms
- Similarity Algorithms
- Community Detection Algorithms
- Link Prediction Algorithms