Neo4j Graph Algorithms: (5) Link Prediction Algorithms

Published by Cristian Scutaru on

Link Prediction algorithms or rather functions help determine the closeness of a pair of nodes. The computed scores can then be used to predict new relationships between them. This visual presentation of the Neo4j graph algorithms is focused on quick understanding and less implementation details. With small reusable samples, for less time-consuming labs.

Table of Contents

Introduction

This article presents quickly – in a graphical and descriptive manner, skipping many implementation details – the Link Prediction algorithms implemented by Neo4j in their Graph Data Science (GDS) library. We focus on just one small sample data used by all algorithms, to keep it simple and allow for less time-consuming labs.

Run the following CREATE query in a new project. All the following Cypher queries can be run quickly in a new Blank Project on the free online Neo4j Sandbox. Or, if you have your own environment, on your Neo4j Desktop.

CREATE (a:Node {name: 'A', community: 1}),
   (b:Node {name: 'B', community: 1}),
   (c:Node {name: 'C', community: 1}),
   (d:Node {name: 'D', community: 1}),
   (e:Node {name: 'E', community: 5}),
   (f:Node {name: 'F', community: 5}),
   (a)-[:REL]->(b),
   (a)-[:REL]->(c),
   (a)-[:REL]->(d),
   (b)-[:REL]->(d),
   (c)-[:REL]->(d),
   (c)-[:REL]->(e),
   (d)-[:REL]->(e),
   (d)-[:REL]->(f),
   (e)-[:REL]->(f);

This creates a graph of Node nodes, connected by some REL directed and unweighted relationships. Nodes are also grouped into two different communities, by a community node property value that will be consider later on, by the Same Community function:

All Link Prediction algorithms are rather exposed as simple functions, that take two nodes as arguments, and return a numeric score, frequently between 0 and 1. There is no need to create a graph projection, to estimate the memory used etc.

All Cypher queries used as examples here are very similar, and measure the closeness of the A node to any other node in the graph. We could have had just one big query with each of these functions as a different returned column:

MATCH (p1:Node {name: 'A'})
MATCH (p2:Node) WHERE p1 <> p2
RETURN p2.name AS name,
   gds.alpha.linkprediction.adamicAdar(p1, p2) AS aA,
   gds.alpha.linkprediction.commonNeighbors(p1, p2) AS cN,
   gds.alpha.linkprediction.preferentialAttachment(p1, p2) AS pA,
   gds.alpha.linkprediction.resourceAllocation(p1, p2) AS rA,
   gds.alpha.linkprediction.sameCommunity(p1, p2) AS sC,
   gds.alpha.linkprediction.totalNeighbors(p1, p2) AS tN;

1. Adamic Adar Function

Adamic Adar is a measure used to compute the closeness of nodes based on their shared neighbors. A value of 0 indicates that two nodes are not close, while higher values indicate nodes are closer.

MATCH (p1:Node {name: 'A'})
MATCH (p2:Node) WHERE p1 <> p2
RETURN p2.name AS name,
   gds.alpha.linkprediction.adamicAdar(p1, p2) AS score;

The previous Cypher query calculates the closeness of A to any other node. A is closest to D (2.35 score here), because D basically connects all other nodes in the graph.

  • A –> D = 2.35
  • A –> E = 1.53
  • A –> F = 0.62
  • A –> B = 0.62
  • A –> C = 0.62

2. Common Neighbors Function

Common neighbors captures the idea that two strangers who have a friend in common are more likely to be introduced than those who don’t have any friends in common. A value of 0 indicates that two nodes are not close, while higher values indicate nodes are closer.

MATCH (p1:Node {name: 'A'})
MATCH (p2:Node) WHERE p1 <> p2
RETURN p2.name AS name,
   gds.alpha.linkprediction.commonNeighbors(p1, p2) AS score;

Just like for Adamic Adar, the previous very similar Cypher query calculates how many neighbors A has in common with any other node:

  • A and D connect directly, and have B and C as common neighbors (this is what matters).
  • A and E do not connect directly, but have also two nodes in common (D and C).
  • A and F have only D in common.
  • A and B connect directly, and have only D in common.
  • A and C connect directly, and also have only D in common.

3. Preferential Attachment Function

Preferential Attachment is a measure used to compute the closeness of nodes, based on their shared neighbors. A value of 0 indicates that two nodes are not close, while higher values indicate that nodes are closer.

MATCH (p1:Node {name: 'A'})
MATCH (p2:Node) WHERE p1 <> p2
RETURN p2.name AS name,
   gds.alpha.linkprediction.preferentialAttachment(p1, p2) AS score;

Once again, A is closest to D:

  • A –> D = 15
  • A –> E = 9
  • A –> F = 6
  • A –> B = 6
  • A –> C = 9

4. Resource Allocation Function

Resource Allocation is a measure used to compute the closeness of nodes based on their shared neighbors. A value of 0 indicates that two nodes are not close, while higher values indicate nodes are closer.

MATCH (p1:Node {name: 'A'})
MATCH (p2:Node) WHERE p1 <> p2
RETURN p2.name AS name,
   gds.alpha.linkprediction.resourceAllocation(p1, p2) AS score;
  • A –> D = 0.83
  • A –> E = 0.53
  • A –> F = 0.2
  • A –> B = 0.2
  • A –> C = 0.2

5. Same Community Function

Same Community is a way of determining whether two nodes belong to the same community. These communities could be computed by using one of the Community detection algorithms. In our sample, there is already a community node property referenced internally by this function.

A value of 0 indicates that two nodes are not in the same community. A value of 1 indicates that two nodes are in the same community.

MATCH (p1:Node {name: 'A'})
MATCH (p2:Node) WHERE p1 <> p2
RETURN p2.name AS name,
   gds.alpha.linkprediction.sameCommunity(p1, p2) AS score;

Unlike any other function here, remember this one uses internally the community node property values. Nodes B, C and D are found in the same community with A (score 1, for community 1). But not E and F (score 0, for community 5):

6. Total Neighbors Functions

Total Neighbors computes the closeness of nodes, based on the number of unique neighbors that they have. It is based on the idea that the more connected a node is, the more likely it is to receive new links. A value of 0 indicates that two nodes are not close, while higher values indicate nodes are closer.

MATCH (p1:Node {name: 'A'})
MATCH (p2:Node) WHERE p1 <> p2
RETURN p2.name AS name,
   gds.alpha.linkprediction.totalNeighbors(p1, p2) AS score;
  • A and B = 4
  • A and C = 5
  • A and D = 6
  • A and E = 4
  • A and F = 4

Conclusions

Read all articles from the same Neo4j Graph Algorithms series:

  1. Path Finding Algorithms
  2. Centrality Algorithms
  3. Similarity Algorithms
  4. Community Detection Algorithms
  5. Link Prediction Algorithms
Need a NoSQL Expert?

Certified Solutions Architect in Azure and AWS
Certified Professional in Cassandra, Couchbase, Redis, Neo4j
Experienced in DynamoDB, Cosmos DB, MongoDB

Categories: Graphs

Cristian Scutaru

I designed and implemented the Data Xtractor suite, with Model Xtractor, Query Xtractor, and Visual Xtractor as separate modules. I am a software architect and developer with over 30 years professional experience. I’ve been working with relational databases for almost three decades and I was constantly unhappy with the relative limitation of those tools used to connect directly to a platform, and instantly extract and display data in flexible ways.