Neo4j Graph Algorithms: (5) Link Prediction Algorithms

Published by Cristian Scutaru on April 5, 2021April 5, 2021

Link Prediction algorithms or rather functions help determine the closeness of a pair of nodes. The computed scores can then be used to predict new relationships between them. This visual presentation of the Neo4j graph algorithms is focused on quick understanding and less implementation details. With small reusable samples, for less time-consuming labs.

Table of Contents

Introduction

This article presents quickly – in a graphical and descriptive manner, skipping many implementation details – the Link Prediction algorithms implemented by Neo4j in their Graph Data Science (GDS) library. We focus on just one small sample data used by all algorithms, to keep it simple and allow for less time-consuming labs.

Run the following CREATE query in a new project. All the following Cypher queries can be run quickly in a new Blank Project on the free online Neo4j Sandbox. Or, if you have your own environment, on your Neo4j Desktop.

CREATE (a:Node {name: 'A', community: 1}),
   (b:Node {name: 'B', community: 1}),
   (c:Node {name: 'C', community: 1}),
   (d:Node {name: 'D', community: 1}),
   (e:Node {name: 'E', community: 5}),
   (f:Node {name: 'F', community: 5}),
   (a)-[:REL]->(b),
   (a)-[:REL]->(c),
   (a)-[:REL]->(d),
   (b)-[:REL]->(d),
   (c)-[:REL]->(d),
   (c)-[:REL]->(e),
   (d)-[:REL]->(e),
   (d)-[:REL]->(f),
   (e)-[:REL]->(f);

This creates a graph of Node nodes, connected by some REL directed and unweighted relationships. Nodes are also grouped into two different communities, by a community node property value that will be consider later on, by the Same Community function:

All Link Prediction algorithms are rather exposed as simple functions, that take two nodes as arguments, and return a numeric score, frequently between 0 and 1. There is no need to create a graph projection, to estimate the memory used etc.

All Cypher queries used as examples here are very similar, and measure the closeness of the A node to any other node in the graph. We could have had just one big query with each of these functions as a different returned column:

MATCH (p1:Node {name: 'A'})
MATCH (p2:Node) WHERE p1 <> p2
RETURN p2.name AS name,
   gds.alpha.linkprediction.adamicAdar(p1, p2) AS aA,
   gds.alpha.linkprediction.commonNeighbors(p1, p2) AS cN,
   gds.alpha.linkprediction.preferentialAttachment(p1, p2) AS pA,
   gds.alpha.linkprediction.resourceAllocation(p1, p2) AS rA,
   gds.alpha.linkprediction.sameCommunity(p1, p2) AS sC,
   gds.alpha.linkprediction.totalNeighbors(p1, p2) AS tN;

1. Adamic Adar Function

Adamic Adar is a measure used to compute the closeness of nodes based on their shared neighbors. A value of 0 indicates that two nodes are not close, while higher values indicate nodes are closer.

MATCH (p1:Node {name: 'A'})
MATCH (p2:Node) WHERE p1 <> p2
RETURN p2.name AS name,
   gds.alpha.linkprediction.adamicAdar(p1, p2) AS score;

The previous Cypher query calculates the closeness of A to any other node. A is closest to D (2.35 score here), because D basically connects all other nodes in the graph.

A –> D = 2.35
A –> E = 1.53
A –> F = 0.62
A –> B = 0.62
A –> C = 0.62

2. Common Neighbors Function

Common neighbors captures the idea that two strangers who have a friend in common are more likely to be introduced than those who don’t have any friends in common. A value of 0 indicates that two nodes are not close, while higher values indicate nodes are closer.

MATCH (p1:Node {name: 'A'})
MATCH (p2:Node) WHERE p1 <> p2
RETURN p2.name AS name,
   gds.alpha.linkprediction.commonNeighbors(p1, p2) AS score;

Just like for Adamic Adar, the previous very similar Cypher query calculates how many neighbors A has in common with any other node:

A and D connect directly, and have B and C as common neighbors (this is what matters).
A and E do not connect directly, but have also two nodes in common (D and C).
A and F have only D in common.
A and B connect directly, and have only D in common.
A and C connect directly, and also have only D in common.

3. Preferential Attachment Function

Preferential Attachment is a measure used to compute the closeness of nodes, based on their shared neighbors. A value of 0 indicates that two nodes are not close, while higher values indicate that nodes are closer.

MATCH (p1:Node {name: 'A'})
MATCH (p2:Node) WHERE p1 <> p2
RETURN p2.name AS name,
   gds.alpha.linkprediction.preferentialAttachment(p1, p2) AS score;

Once again, A is closest to D:

A –> D = 15
A –> E = 9
A –> F = 6
A –> B = 6
A –> C = 9

4. Resource Allocation Function

Resource Allocation is a measure used to compute the closeness of nodes based on their shared neighbors. A value of 0 indicates that two nodes are not close, while higher values indicate nodes are closer.

MATCH (p1:Node {name: 'A'})
MATCH (p2:Node) WHERE p1 <> p2
RETURN p2.name AS name,
   gds.alpha.linkprediction.resourceAllocation(p1, p2) AS score;

A –> D = 0.83
A –> E = 0.53
A –> F = 0.2
A –> B = 0.2
A –> C = 0.2

5. Same Community Function

Same Community is a way of determining whether two nodes belong to the same community. These communities could be computed by using one of the Community detection algorithms. In our sample, there is already a community node property referenced internally by this function.

A value of 0 indicates that two nodes are not in the same community. A value of 1 indicates that two nodes are in the same community.

MATCH (p1:Node {name: 'A'})
MATCH (p2:Node) WHERE p1 <> p2
RETURN p2.name AS name,
   gds.alpha.linkprediction.sameCommunity(p1, p2) AS score;

Unlike any other function here, remember this one uses internally the community node property values. Nodes B, C and D are found in the same community with A (score 1, for community 1). But not E and F (score 0, for community 5):

6. Total Neighbors Functions

Total Neighbors computes the closeness of nodes, based on the number of unique neighbors that they have. It is based on the idea that the more connected a node is, the more likely it is to receive new links. A value of 0 indicates that two nodes are not close, while higher values indicate nodes are closer.

MATCH (p1:Node {name: 'A'})
MATCH (p2:Node) WHERE p1 <> p2
RETURN p2.name AS name,
   gds.alpha.linkprediction.totalNeighbors(p1, p2) AS score;

A and B = 4
A and C = 5
A and D = 6
A and E = 4
A and F = 4

Conclusions

Read all articles from the same Neo4j Graph Algorithms series:

Path Finding Algorithms
Centrality Algorithms
Similarity Algorithms
Community Detection Algorithms
Link Prediction Algorithms

Need a NoSQL Expert?

Certified Solutions Architect in Azure and AWS

Certified Professional in Cassandra, Couchbase, Redis, Neo4j

Experienced in DynamoDB, Cosmos DB, MongoDB

Check all our specialized NoSQL services

Cart

Neo4j Graph Algorithms: (5) Link Prediction Algorithms

Introduction

1. Adamic Adar Function

2. Common Neighbors Function

3. Preferential Attachment Function

4. Resource Allocation Function

5. Same Community Function

6. Total Neighbors Functions

Conclusions

Cristian Scutaru

Graphs

Neo4j Graph Algorithms: (4) Community Detection Algorithms

Graphs

Neo4j Graph Algorithms: (3) Similarity Algorithms

Graphs

Neo4j Graph Algorithms: (2) Centrality Algorithms

Neo4j Graph Algorithms: (5) Link Prediction Algorithms

Introduction

1. Adamic Adar Function

2. Common Neighbors Function

3. Preferential Attachment Function

4. Resource Allocation Function

5. Same Community Function

6. Total Neighbors Functions

Conclusions

Cristian Scutaru

Related Posts

Graphs

Neo4j Graph Algorithms: (4) Community Detection Algorithms

Graphs

Neo4j Graph Algorithms: (3) Similarity Algorithms

Graphs

Neo4j Graph Algorithms: (2) Centrality Algorithms