Leiden Algorithm
The Leiden algorithm, developed at Leiden University, is an advanced community detection method that addresses key limitations of the Louvain Algorithm. Its primary innovation is guaranteeing that all detected communities are well-connected, eliminating the disconnected community problem that can occur with Louvain.1
Motivation: Louvain’s Limitations
The Louvain algorithm, while fast and effective, can produce poorly connected or even disconnected communities. This occurs because:
- Nodes may be “trapped” in communities after their connecting nodes have moved
- The greedy local moves don’t consider global community structure
- No mechanism ensures internal community connectivity
Example of disconnected community in Louvain:
Before: After Louvain:
A───B───C───D ┌─────────────────┐
│ │ │ │ Community 1 │
E───F───G───H │ A B C D │ ← A,B disconnected
│ │ │ │ │ │ from C,D!
I │ E F G H I │
└─────────────────┘
Algorithm Overview
Leiden introduces a three-phase iterative process:
Phase 1: Local Moving
Similar to Louvain’s first phase:
- Each node starts in its own community
- Nodes move to neighboring communities that maximize modularity gain
- Continue until no improvement is possible
Phase 2: Refinement (Leiden’s Innovation)
After initial assignment, refine community assignments:
- For each community, create a refined partition
- Nodes can only move to communities reachable via edges
- Merge only well-connected node subsets
- Ensures all resulting communities are internally connected
Phase 3: Aggregation
Create the aggregate network:
- Each refined community becomes a super-node
- Aggregate edges between communities
- Maintain the refined partition structure
Iteration
Repeat all three phases on the coarsened network until convergence.
Visual Comparison
LOUVAIN (2 phases):
┌──────────┐ ┌──────────┐
│ Local │───▶│Aggregate │───▶ Repeat
│ Moving │ │ │
└──────────┘ └──────────┘
LEIDEN (3 phases):
┌──────────┐ ┌──────────┐ ┌──────────┐
│ Local │───▶│Refinement│───▶│Aggregate │───▶ Repeat
│ Moving │ │ │ │ │
└──────────┘ └──────────┘ └──────────┘
▲
Guarantees connectivity
The Refinement Phase in Detail
The refinement phase is what makes Leiden superior. It works as follows:
- Consider each community from Phase 1 as a subgraph
- Initialize each node in its own sub-community within the community
- Merge nodes only if they are:
- Directly connected by an edge
- The merge improves the partition quality
- Build connected components within each original community
Refinement Pseudocode
function REFINE(G, partition):
refined_partition = {}
for community in partition.communities:
subgraph = G.subgraph(community.nodes)
// Initialize each node in its own sub-community
sub_partition = {node: {node} for node in community.nodes}
// Only merge connected nodes
for node in random_order(community.nodes):
for neighbor in subgraph.neighbors(node):
if merge_improves_quality(node, neighbor, sub_partition):
merge(node, neighbor, sub_partition)
// Add refined sub-communities to result
refined_partition.add(sub_partition)
return refined_partition
Quality Function: CPM vs Modularity
While Leiden can optimize modularity, it also supports the Constant Potts Model (CPM), which avoids the resolution limit:
Modularity
$$ Q = \sum_c \left[ \frac{e_c}{m} - \gamma \left( \frac{n_c}{2m} \right)^2 \right] $$
CPM (Constant Potts Model)
$$ H = \sum_c \left[ e_c - \gamma \binom{n_c}{2} \right] $$
Where:
- $e_c$ = number of edges inside community $c$
- $n_c$ = number of nodes in community $c$
- $\gamma$ = resolution parameter
CPM advantage: The resolution parameter has a direct interpretation as the expected edge density within communities.
Guaranteed Properties
The Leiden algorithm guarantees:
| Property | Description |
|---|---|
| $\gamma$-connected | All communities are locally optimal w.r.t. resolution $\gamma$ |
| Well-connected | No community can be disconnected by removing a single node |
| Subset optimality | Communities cannot be improved by moving subsets of nodes |
| Convergence | Algorithm converges to a stable partition |
Complexity Analysis
| Aspect | Complexity |
|---|---|
| Time (per iteration) | $O(m)$ |
| Total time (typical) | $O(m \cdot \text{iterations})$ |
| Space | $O(n + m)$ |
| Iterations | Typically 5-10 |
Leiden is slightly slower per iteration than Louvain due to the refinement phase, but often converges faster and produces better results.
Comparison: Louvain vs Leiden
| Aspect | Louvain | Leiden |
|---|---|---|
| Phases | 2 | 3 |
| Connected communities | Not guaranteed | Guaranteed |
| Partition quality | Good | Better |
| Speed | Faster per iteration | Faster convergence |
| Determinism | Non-deterministic | Non-deterministic |
| Implementation complexity | Simpler | More complex |
| Research applications | Established | Recommended |
Implementation
Python with leidenalg
import igraph as ig
import leidenalg
# Create graph
G = ig.Graph.Famous("Zachary")
# Run Leiden with modularity optimization
partition = leidenalg.find_partition(
G,
leidenalg.ModularityVertexPartition
)
print(f"Number of communities: {len(partition)}")
print(f"Modularity: {partition.modularity:.4f}")
# With CPM and custom resolution
partition_cpm = leidenalg.find_partition(
G,
leidenalg.CPMVertexPartition,
resolution_parameter=0.1
)
Python with cdlib
from cdlib import algorithms
import networkx as nx
# Create graph
G = nx.karate_club_graph()
# Run Leiden
communities = algorithms.leiden(G)
# Get community membership
for i, com in enumerate(communities.communities):
print(f"Community {i}: {com}")
Resolution Parameter Exploration
import leidenalg
import igraph as ig
import matplotlib.pyplot as plt
G = ig.Graph.Famous("Zachary")
resolutions = [0.01, 0.05, 0.1, 0.5, 1.0, 2.0]
for res in resolutions:
partition = leidenalg.find_partition(
G,
leidenalg.CPMVertexPartition,
resolution_parameter=res
)
print(f"Resolution {res}: {len(partition)} communities")
Practical Guidelines
When to Use Leiden Over Louvain
- Quality is critical: Research publications, final analyses
- Connectivity matters: Applications where disconnected communities are problematic
- Large networks: Leiden often finds better partitions faster
- Resolution exploration: CPM provides interpretable resolution parameter
Parameter Tuning
| Parameter | Recommended Values | Notes |
|---|---|---|
| Resolution (Modularity) | 0.5 - 2.0 | Higher = smaller communities |
| Resolution (CPM) | 0.001 - 0.1 | Depends on network density |
| Number of iterations | Until convergence | Usually 5-10 |
| Number of runs | 10-100 | For stability analysis |
Validation
- Check connectivity: Verify all communities are connected
- Size distribution: Look for reasonable community sizes
- Modularity: Compare with Louvain baseline
- Stability: Multiple runs should give similar results
Applications
| Domain | Application |
|---|---|
| Single-cell analysis | Cell type clustering in scRNA-seq |
| Social networks | Community structure analysis |
| Citation networks | Research field identification |
| Protein networks | Functional module detection |
| Financial networks | Market segment analysis |
Related Topics
- Community Detection - Overview of community detection methods
- Louvain Algorithm - Predecessor algorithm
- Modularity - Quality function
- Graph Neural Networks - Deep learning on graphs
References
Traag, V. A., Waltman, L., & van Eck, N. J. (2019). From Louvain to Leiden: guaranteeing well-connected communities. Scientific Reports, 9(1), 5233. https://www.nature.com/articles/s41598-019-41695-z ↩︎