Leiden Algorithm

The Leiden algorithm, developed at Leiden University, is an advanced community detection method that addresses key limitations of the Louvain Algorithm. Its primary innovation is guaranteeing that all detected communities are well-connected, eliminating the disconnected community problem that can occur with Louvain.1


Motivation: Louvain’s Limitations

The Louvain algorithm, while fast and effective, can produce poorly connected or even disconnected communities. This occurs because:

  1. Nodes may be “trapped” in communities after their connecting nodes have moved
  2. The greedy local moves don’t consider global community structure
  3. No mechanism ensures internal community connectivity

Example of disconnected community in Louvain:

Before:                      After Louvain:
    A───B───C───D           ┌─────────────────┐
    │       │   │           │ Community 1     │
    E───F───G───H           │ A B   C D       │  ← A,B disconnected
        │                   │   │   │ │       │    from C,D!
        I                   │   E F G H I     │
                            └─────────────────┘

Algorithm Overview

Leiden introduces a three-phase iterative process:

Phase 1: Local Moving

Similar to Louvain’s first phase:

  1. Each node starts in its own community
  2. Nodes move to neighboring communities that maximize modularity gain
  3. Continue until no improvement is possible

Phase 2: Refinement (Leiden’s Innovation)

After initial assignment, refine community assignments:

  1. For each community, create a refined partition
  2. Nodes can only move to communities reachable via edges
  3. Merge only well-connected node subsets
  4. Ensures all resulting communities are internally connected

Phase 3: Aggregation

Create the aggregate network:

  1. Each refined community becomes a super-node
  2. Aggregate edges between communities
  3. Maintain the refined partition structure

Iteration

Repeat all three phases on the coarsened network until convergence.


Visual Comparison

LOUVAIN (2 phases):
┌──────────┐    ┌──────────┐
│  Local   │───▶│Aggregate │───▶ Repeat
│  Moving  │    │          │
└──────────┘    └──────────┘

LEIDEN (3 phases):
┌──────────┐    ┌──────────┐    ┌──────────┐
│  Local   │───▶│Refinement│───▶│Aggregate │───▶ Repeat
│  Moving  │    │          │    │          │
└──────────┘    └──────────┘    └──────────┘
              Guarantees connectivity

The Refinement Phase in Detail

The refinement phase is what makes Leiden superior. It works as follows:

  1. Consider each community from Phase 1 as a subgraph
  2. Initialize each node in its own sub-community within the community
  3. Merge nodes only if they are:
    • Directly connected by an edge
    • The merge improves the partition quality
  4. Build connected components within each original community

Refinement Pseudocode

function REFINE(G, partition):
    refined_partition = {}
    
    for community in partition.communities:
        subgraph = G.subgraph(community.nodes)
        
        // Initialize each node in its own sub-community
        sub_partition = {node: {node} for node in community.nodes}
        
        // Only merge connected nodes
        for node in random_order(community.nodes):
            for neighbor in subgraph.neighbors(node):
                if merge_improves_quality(node, neighbor, sub_partition):
                    merge(node, neighbor, sub_partition)
        
        // Add refined sub-communities to result
        refined_partition.add(sub_partition)
    
    return refined_partition

Quality Function: CPM vs Modularity

While Leiden can optimize modularity, it also supports the Constant Potts Model (CPM), which avoids the resolution limit:

Modularity

$$ Q = \sum_c \left[ \frac{e_c}{m} - \gamma \left( \frac{n_c}{2m} \right)^2 \right] $$

CPM (Constant Potts Model)

$$ H = \sum_c \left[ e_c - \gamma \binom{n_c}{2} \right] $$

Where:

  • $e_c$ = number of edges inside community $c$
  • $n_c$ = number of nodes in community $c$
  • $\gamma$ = resolution parameter

CPM advantage: The resolution parameter has a direct interpretation as the expected edge density within communities.


Guaranteed Properties

The Leiden algorithm guarantees:

PropertyDescription
$\gamma$-connectedAll communities are locally optimal w.r.t. resolution $\gamma$
Well-connectedNo community can be disconnected by removing a single node
Subset optimalityCommunities cannot be improved by moving subsets of nodes
ConvergenceAlgorithm converges to a stable partition

Complexity Analysis

AspectComplexity
Time (per iteration)$O(m)$
Total time (typical)$O(m \cdot \text{iterations})$
Space$O(n + m)$
IterationsTypically 5-10

Leiden is slightly slower per iteration than Louvain due to the refinement phase, but often converges faster and produces better results.


Comparison: Louvain vs Leiden

AspectLouvainLeiden
Phases23
Connected communitiesNot guaranteedGuaranteed
Partition qualityGoodBetter
SpeedFaster per iterationFaster convergence
DeterminismNon-deterministicNon-deterministic
Implementation complexitySimplerMore complex
Research applicationsEstablishedRecommended

Implementation

Python with leidenalg

import igraph as ig
import leidenalg

# Create graph
G = ig.Graph.Famous("Zachary")

# Run Leiden with modularity optimization
partition = leidenalg.find_partition(
    G, 
    leidenalg.ModularityVertexPartition
)

print(f"Number of communities: {len(partition)}")
print(f"Modularity: {partition.modularity:.4f}")

# With CPM and custom resolution
partition_cpm = leidenalg.find_partition(
    G,
    leidenalg.CPMVertexPartition,
    resolution_parameter=0.1
)

Python with cdlib

from cdlib import algorithms
import networkx as nx

# Create graph
G = nx.karate_club_graph()

# Run Leiden
communities = algorithms.leiden(G)

# Get community membership
for i, com in enumerate(communities.communities):
    print(f"Community {i}: {com}")

Resolution Parameter Exploration

import leidenalg
import igraph as ig
import matplotlib.pyplot as plt

G = ig.Graph.Famous("Zachary")
resolutions = [0.01, 0.05, 0.1, 0.5, 1.0, 2.0]

for res in resolutions:
    partition = leidenalg.find_partition(
        G,
        leidenalg.CPMVertexPartition,
        resolution_parameter=res
    )
    print(f"Resolution {res}: {len(partition)} communities")

Practical Guidelines

When to Use Leiden Over Louvain

  1. Quality is critical: Research publications, final analyses
  2. Connectivity matters: Applications where disconnected communities are problematic
  3. Large networks: Leiden often finds better partitions faster
  4. Resolution exploration: CPM provides interpretable resolution parameter

Parameter Tuning

ParameterRecommended ValuesNotes
Resolution (Modularity)0.5 - 2.0Higher = smaller communities
Resolution (CPM)0.001 - 0.1Depends on network density
Number of iterationsUntil convergenceUsually 5-10
Number of runs10-100For stability analysis

Validation

  1. Check connectivity: Verify all communities are connected
  2. Size distribution: Look for reasonable community sizes
  3. Modularity: Compare with Louvain baseline
  4. Stability: Multiple runs should give similar results

Applications

DomainApplication
Single-cell analysisCell type clustering in scRNA-seq
Social networksCommunity structure analysis
Citation networksResearch field identification
Protein networksFunctional module detection
Financial networksMarket segment analysis

  • Community Detection - Overview of community detection methods
  • Louvain Algorithm - Predecessor algorithm
  • Modularity - Quality function
  • Graph Neural Networks - Deep learning on graphs

References


  1. Traag, V. A., Waltman, L., & van Eck, N. J. (2019). From Louvain to Leiden: guaranteeing well-connected communities. Scientific Reports, 9(1), 5233. https://www.nature.com/articles/s41598-019-41695-z ↩︎