Neo4j Graph Algorithms: (2) Centrality Algorithms

Published by Cristian Scutaru on

Centrality algorithms are used to determine the importance of distinct nodes in a network. This visual presentation of the Neo4j graph algorithms is focused on quick understanding and less implementation details. With small reusable samples, for less time-consuming labs.

Table of Contents

Introduction

This article presents quickly – in a graphical and descriptive manner, skipping many implementation details – most of the Centrality algorithms implemented by Neo4j in their Graph Data Science (GDS) library. We focus on one or two small samples reused by most of the algorithms, to keep it simple and allow for less time-consuming labs.

Run separately the following two queries in a blank document. All these Cypher queries can be run quickly in a new Blank Project on the free online Neo4j Sandbox. Or, if you have your own environment, on your Neo4j Desktop.

CREATE (h:Page {name:'Home'}),
   (p:Page {name:'Products'}),
   (s:Page {name:'Support'}),
   (a:Page {name:'About'}),
   (c:Page {name:'Contact'}),
   (a)-[:LINKS]->(c),
   (s)-[:LINKS]->(a),
   (p)-[:LINKS]->(a),
   (p)-[:LINKS]->(s),
   (h)-[:LINKS]->(s),
   (h)-[:LINKS]->(p);

CALL gds.graph.create('myGraph', 'Page', 'LINKS');

Call anytime MATCH (n) RETURN n; to get a visual representation of the whole graph.

The new graph has only a few unweighted LINKS relationships between Page nodes.

Centrality algorithms determine importance of distinct nodes, based on their directed relationships. Most of the time it’s about unweighted graphs like the one before. A set of new specific node properties will be generated by the following algorithms:

1. PageRank Algorithm

Google’s PageRank algorithm measures the importance of each node within the graph, based on the number incoming relationships and the importance of the corresponding source nodes. The underlying assumption roughly speaking is that a page is only as important as the pages that link to it.

CALL gds.pageRank.write('myGraph', { writeProperty: 'pageRank' });

Check the new pageRank node property values, generated by the previous query. A page like Support has a higher rank than Products, because both Products and Home links to it. One single page links to Contacts, but this page has the highest rank because all other pages link to it, or to another connected page.

2. ArticleRank Algorithm

ArticleRank is a variant of the Page Rank algorithm, which measures the transitive influence or connectivity of nodes.

CALL gds.alpha.articleRank.write({
   nodeProjection: 'Page',
   relationshipProjection: 'LINKS',
   writeProperty: "articleRank"
}) YIELD nodes;

Check the new articleRank node property values generated by the previous query. A page like Contact has a much lower articleRank value than the pageRank. That’s because Page Rank assumes that relationships from nodes that have a low out-degree are more important than relationships from nodes with a higher out-degree, while ArticleRank weakens this assumption.

3. Betweenness Centrality Algorithm

Betweenness centrality is a way of detecting the amount of influence a node has over the flow of information in a graph. It is often used to find nodes that serve as a bridge from one part of a graph to another.

CALL gds.betweenness.write('myGraph', { writeProperty: 'scoreBC' });

The generated scoreBC property values are 0 for Home and Contact, which are like extremities of the graph. The nodes inside get non-zero values, with the About page the highest (3). This is because About is considered here the most important node in terms of connecting or bridging all other nodes.

4. Degree Centrality Algorithm

Degree centrality measures the number of incoming and outgoing relationships from a node. The Degree Centrality algorithm can help us find popular nodes in a graph.

CALL gds.alpha.degree.write({
   nodeProjection: 'Page',
   relationshipProjection: 'LINKS',
   writeProperty: 'followersDC'
});

The previous query generates a followersDC node property with the number of outgoing relationships from each node. You may find this way which pages follow most other pages (Home and Products, with 2 outgoing links each).

5. Closeness Centrality Algorithm

Closeness centrality is a way of detecting nodes that are able to spread information very efficiently through a graph. The closeness centrality of a node measures its average farness (inverse distance) to all other nodes. Nodes with a high closeness score have the shortest distances to all other nodes.

CALL gds.alpha.closeness.write({
   nodeProjection: 'Page',
   relationshipProjection: 'LINKS',
   writeProperty: 'weightCC'
});

The new weightCC node property values have been generated by this algorithm. Remark how the highest value (0.8) is attributed to the central nodes (Products, Support, About) – that’s because they are able to spread information the most efficiently through the graph.

6. Harmonic Centrality Algorithm

Harmonic Centrality is just a Closeness Centrality variant for disconnected graphs, and will not be discussed here.

7. Eigenvector Centrality Algorithm

Eigenvector Centrality is an algorithm that measures the transitive influence or connectivity of nodes. Relationships to high-scoring nodes contribute more to the score of a node than connections to low-scoring nodes. A high score means that a node is connected to other nodes that have high scores.

CALL gds.alpha.eigenvector.write({
   nodeProjection: 'Page',
   relationshipProjection: 'LINKS',
   writeProperty: 'scoreEC'
});

The new scoreEC property values gradually raise from left to right (from Home to Contact), because of the transitivity in the connected nodes.

8. Hyperlink-Induced Topic Search (HITS) Algorithm

HITS is a link analysis algorithm that rates nodes based on two scores, a hub score and an authority score. The authority score estimates the importance of the node within the network. The hub score estimates the value of its relationships to other nodes.

CALL gds.alpha.hits.write('myGraph', {
   hitsIterations: 20,
   authProperty: 'authHITS',
   hubProperty: 'hubHITS'
});

The default auth and hub generated property names have been renamed here as authHITS and hubHITS. authHITS reflects the transitive page authority, higher when connected in a transitive manner from most links (like Contacts). While hubHITS values are higher for the central nodes.

Conclusions

Read all articles from the same Neo4j Graph Algorithms series:

  1. Path Finding Algorithms
  2. Centrality Algorithms
  3. Similarity Algorithms
  4. Community Detection Algorithms
  5. Link Prediction Algorithms
Need a NoSQL Expert?

Certified Solutions Architect in Azure and AWS
Certified Professional in Cassandra, Couchbase, Redis, Neo4j
Experienced in DynamoDB, Cosmos DB, MongoDB

Categories: Graphs

Cristian Scutaru

I designed and implemented the Data Xtractor suite, with Model Xtractor, Query Xtractor, and Visual Xtractor as separate modules. I am a software architect and developer with over 30 years professional experience. I’ve been working with relational databases for almost three decades and I was constantly unhappy with the relative limitation of those tools used to connect directly to a platform, and instantly extract and display data in flexible ways.