Neo4j Graph Algorithms: (4) Community Detection Algorithms

Published by Cristian Scutaru on

Community detection algorithms are used to evaluate how groups of nodes are clustered or partitioned, as well as their tendency to strengthen or break apart. This visual presentation of the Neo4j graph algorithms is focused on quick understanding and less implementation details. With small reusable samples, for less time-consuming labs.

Table of Contents

Introduction

This article presents quickly – in a graphical and descriptive manner, skipping many implementation details – most of the Community Detection algorithms implemented by Neo4j in their Graph Data Science (GDS) library. We focus on one or two small samples reused by most of the algorithms, to keep it simple and allow for less time-consuming labs.

Run the following CREATE query in a blank document. All these Cypher queries can be run quickly in a new Blank Project on the free online Neo4j Sandbox. Or, if you have your own environment, on your Neo4j Desktop.

CREATE (a:Node {name: 'A'}),
   (b:Node {name: 'B'}),
   (c:Node {name: 'C'}),
   (d:Node {name: 'D'}),
   (e:Node {name: 'E'}),
   (f:Node {name: 'F'}),
   (a)-[:REL {weight: 50}]->(b),
   (a)-[:REL {weight: 50}]->(c),
   (a)-[:REL {weight: 100}]->(d),
   (b)-[:REL {weight: 40}]->(d),
   (c)-[:REL {weight: 40}]->(d),
   (c)-[:REL {weight: 80}]->(e),
   (d)-[:REL {weight: 30}]->(e),
   (d)-[:REL {weight: 80}]->(f),
   (e)-[:REL {weight: 40}]->(f);

This creates a simple weighted graph with REL relationships between Node vertices. Call anytime MATCH (n) RETURN n; to get a visual representation of the whole graph. For the relationships, you may show the generic REL type, or the weight value, as below:

1. Louvain Modularity Algorithm

Louvain is a hierarchical clustering algorithm, that recursively merges communities into a single node and executes the modularity clustering on the condensed graphs. It detects communities in large networks, and maximizes a modularity score for each community, where the modularity quantifies the quality of an assignment of nodes to communities. This means evaluating how much more densely connected the nodes within a community are, compared to how connected they would be in a random network.

1.1. Unweighted Graph

CALL gds.louvain.stream({
   nodeProjection: 'Node',
   relationshipProjection: {
      REL: { type: 'REL', orientation: 'UNDIRECTED' }
   }
})
YIELD nodeId, communityId
RETURN gds.util.asNode(nodeId).name AS name, communityId;

The previous query detects two communities [A, B, C, D] and [E, F], when the graph is considered undirected and unweighted. Each community is assigned a unique ID, but their values don’t really matter otherwise:

1.2. Weighted Graph

Next query creates a reusable native in-memory projection of a weighted graph, then we call Louvain again, to detect communities:

CALL gds.graph.create('myGraph', 'Node',
    { REL: { orientation: 'UNDIRECTED' } },
    { relationshipProperties: 'weight' }
);

CALL gds.louvain.stream('myGraph', {
   relationshipWeightProperty: 'weight'
})
YIELD nodeId, communityId
RETURN gds.util.asNode(nodeId).name AS name, communityId;

The result is this time different, as the nodes are split between two communities [A, B, D, F] and [C, E]. This is because most of the relationships connecting C and E with the rest have lower weight values (50, 40, 30 and 40), compared to the other ones:

2. Modularity Optimization Algorithm

The Modularity Optimization algorithm tries to detect communities in the graph based on their modularity. Modularity is a measure of the structure of a graph, measuring the density of connections within a module or community. Graphs with a high modularity score will have many connections within a community but only few pointing outwards to other communities. The algorithm will explore for every node if its modularity score might increase if it changes its community to one of its neighboring nodes.

2.1. Unweighted Graph

Using Modularity on the previous saved in-memory projection (and ignoring the weight property for now), detects the same [A, B, C, D] and [E, F] communities as Louvain for the unweighted graph:

CALL gds.beta.modularityOptimization.stream('myGraph')
YIELD nodeId, communityId
RETURN gds.util.asNode(nodeId).name AS name, communityId;

2.2. Weighted Graph

Using Modularity on the same projection, but based on the weight property, detects the same [A, B, D, F] and [C, E] communities as Louvain for the weighted graph:

CALL gds.beta.modularityOptimization.stream('myGraph', {
   relationshipWeightProperty: 'weight'
})
YIELD nodeId, communityId
RETURN gds.util.asNode(nodeId).name AS name, communityId;

3. Triangle Count Algorithm

The Triangle Count algorithm counts the number of triangles for each node in the graph. A triangle is a set of three nodes where each node has a relationship to the other two. In graph theory terminology, this is sometimes referred to as a 3-clique. The Triangle Count algorithm in the GDS library only finds triangles in undirected graphs.

We’ll count the triangles for each node on the previously in-memory saved projection, ignoring the weight property:

CALL gds.triangleCount.stream('myGraph')
YIELD nodeId, triangleCount AS count
RETURN gds.util.asNode(nodeId).name AS name, count;

Previous image shows all existing triangles. Based on this image, here is the triangle count for each node:

  • A, C, E –> 2 triangles
  • B, F –> 1 triangle
  • D –> 4 triangles

4. Local Clustering Coefficient Algorithm

The Local Clustering Coefficient algorithm computes the local clustering coefficient for each node in the graph, using the Triangle Count algorithm.

CALL gds.localClusteringCoefficient.stream('myGraph')
YIELD nodeId, localClusteringCoefficient AS coef
RETURN gds.util.asNode(nodeId).name AS name, coef;

Instead of triangles, a value between 0 and 1 is returned for each node:

  • A, C, E –> 0.67 (2 triangles)
  • B, F –> 1.0 (1 triangle)
  • D –> 0.4 (4 triangles)

5. K-1 Coloring Algorithm

The K-1 Coloring algorithm assigns a color to every node in the graph, using as few colors as possible, and making sure that every neighbor of a given node has a different color than the node itself.

We’ll continue coloring on the in-memory projection, used as a unweighted graph:

CALL gds.beta.k1coloring.stream('myGraph')
YIELD nodeId, color
RETURN gds.util.asNode(nodeId).name AS name, color;

The result return one color number for each node. Translated into a visual representation, here is what we get (remark there are no two directly connected nodes using the same color):

6. Label Propagation Algorithm (LPA)

The Label Propagation algorithm (LPA) is a fast algorithm for finding communities in a graph. It detects these communities using network structure alone as its guide, and doesn’t require a pre-defined objective function or prior information about the communities.

6.1. Preparing New Graph

For the rest of this article, we’ll need a different graph, to be able to group the nodes into different communities. Run the following three queries one by one:

MATCH (n) DETACH DELETE n;

CREATE (alice:User {name: 'Alice'}),
  (bridget:User {name: 'Bridget'}),
  (charles:User {name: 'Charles'}),
  (doug:User {name: 'Doug'}),
  (mark:User {name: 'Mark'}),
  (michael:User {name: 'Michael'}),
  (alice)-[:FOLLOW {weight: 1}]->(bridget),
  (alice)-[:FOLLOW {weight: 10}]->(charles),
  (mark)-[:FOLLOW {weight: 1}]->(doug),
  (bridget)-[:FOLLOW {weight: 1}]->(michael),
  (doug)-[:FOLLOW {weight: 1}]->(mark),
  (michael)-[:FOLLOW {weight: 1}]->(alice),
  (alice)-[:FOLLOW {weight: 1}]->(michael),
  (bridget)-[:FOLLOW {weight: 1}]->(alice),
  (michael)-[:FOLLOW {weight: 1}]->(bridget),
  (charles)-[:FOLLOW {weight: 1}]->(doug);

MATCH (n) RETURN n;

First query removed all previous nodes and relationships. Second query populated your database with a new graph, displayed by the last query similar to the view below (show weight property values instead of the generic FOLLOW relationship type):

6.2. Weighted Graph

It’s time to run the LPA now. Discard the previous myGraph from memory and create a new native projection, based on the new node and relationship types. Then call the Label Propagation algorithm on this new projection:

CALL gds.graph.drop('myGraph');

CALL gds.graph.create('myGraph', 'User', 'FOLLOW',
    { relationshipProperties: 'weight' }
);

CALL gds.labelPropagation.stream('myGraph')
YIELD nodeId, communityId
RETURN gds.util.asNode(nodeId).name AS name, communityId;

The call discovered two communities: [Doug, Mark, Charles] and [Michael, Alice, Bridget]:

7. Speaker-Listener LPA (SLLPA)

The Speaker-Listener Label Propagation Algorithm (SLLPA) is a variation of the Label Propagation algorithm that is able to detect multiple communities per node.

CALL gds.alpha.sllpa.stream('myGraph', {
   maxIterations: 100,
   minAssociationStrength: 0.1
})
YIELD nodeId, values
RETURN gds.util.asNode(nodeId).name AS name,
   values.communityIds AS communityIds;

Previous SLLPA-based query discovered several nodes in more than one community, as in the image below:

8. Weakly Connected Components (WCC) Algorithm

The WCC algorithm finds sets of connected nodes in an undirected graph, where all nodes in the same set form a connected component. WCC is often used early in an analysis to understand the structure of a graph. Using WCC to understand the graph structure enables running other algorithms independently on an identified cluster. As a preprocessing step for directed graphs, it helps quickly identify disconnected groups.

CALL gds.wcc.stream('myGraph', {
  relationshipWeightProperty: 'weight',
  threshold: 10
}) YIELD nodeId, componentId
RETURN gds.util.asNode(nodeId).name AS Name, componentId;

Remark that only Alice and Charles are connected by a relationship with weight 10 (all other relationships have weight 1). This generates one community for both:

9. Strongly Connected Components (SCC) Algorithm

The Strongly Connected Components (SCC) algorithm finds maximal sets of connected nodes in a directed graph. A set is considered a strongly connected component if there is a directed path between each pair of nodes within the set. It is often used early in a graph analysis process to help us get an idea of how our graph is structured.

CALL gds.alpha.scc.stream({
  nodeProjection: 'User',
  relationshipProjection: 'FOLLOW'
})
YIELD nodeId, componentId
RETURN gds.util.asNode(nodeId).name AS Name, componentId;

The graph has been considered unweighted (weight property didn’t matter), but directed. Mark and Doug are strongly connected through reciprocal links. Just like Alice, Bridget and Michael at the other end. Charles is on his own:

Conclusions

Read all articles from the same Neo4j Graph Algorithms series:

  1. Path Finding Algorithms
  2. Centrality Algorithms
  3. Similarity Algorithms
  4. Community Detection Algorithms
  5. Link Prediction Algorithms
Need a NoSQL Expert?

Certified Solutions Architect in Azure and AWS
Certified Professional in Cassandra, Couchbase, Redis, Neo4j
Experienced in DynamoDB, Cosmos DB, MongoDB

Categories: Graphs

Cristian Scutaru

I designed and implemented the Data Xtractor suite, with Model Xtractor, Query Xtractor, and Visual Xtractor as separate modules. I am a software architect and developer with over 30 years professional experience. I’ve been working with relational databases for almost three decades and I was constantly unhappy with the relative limitation of those tools used to connect directly to a platform, and instantly extract and display data in flexible ways.