Skip to contents

Applies network diffusion techniques to propagate influence from a set of seed nodes across a graph. Supports Laplacian smoothing, heat diffusion, and random walk with restart (RWR).

Usage

network_diffusion_with_pvalues(
  graph,
  seed_nodes,
  method = c("laplacian", "heat", "rwr"),
  alpha = 0.7,
  t = 1,
  restart_prob = 0.3,
  normalize = TRUE,
  n_permutations = 1000,
  seed = NULL,
  verbose = TRUE
)

Arguments

graph

An igraph object representing the network to analyze or a data frame containing a symbolic edge list in the first two columns. Additional columns are considered as edge attributes. Must have named vertices.

seed_nodes

Character vector of seed node names (must match V(graph)$name).

method

Character. Diffusion method to use:

  • "laplacian": Solves the linear system \((I + \alpha L)^{-1} f_0\), where \(L\) is the (normalized) graph Laplacian and \(\alpha\) is a smoothing parameter. Internally, a sparse Cholesky decomposition is used for efficiency.

  • "heat": Applies the heat diffusion model \(e^{-tL} f_0\), where \(t\) controls diffusion time. If available, a sparse approximation method is used to avoid dense matrix exponential.

  • "rwr": Random Walk with Restart. Iteratively solves \(f = (1 - r)Pf + r f_0\), where \(P\) is the transition matrix and \(r\) is the restart probability.

alpha

Damping factor for the Laplacian method. Default is 0.7.

t

Time parameter for the heat diffusion method. Default is 1.

restart_prob

Restart probability (usually between 0.3 and 0.7) for the RWR method. Default is 0.3.

normalize

Logical. Whether to normalize the adjacency matrix (symmetric normalization for undirected graphs). Default is TRUE.

n_permutations

Integer. Number of permutations to run for empirical p-value estimation (default 1000).

seed

Optional integer for reproducible random number generation. If NULL (default), seed is not set.

verbose

Logical. If TRUE (default), displays a progress bar during permutations.

Value

A data frame with two columns:

node

Node name

score

Diffusion score representing influence from the seed nodes

Details

This function allows flexible application of network diffusion strategies, useful in systems biology (e.g., gene prioritization, pathway propagation), network analysis, and disease gene discovery. The underlying matrix operations are based on well-established diffusion models from graph theory.

For "rwr" (random walk with restart), the algorithm iteratively propagates scores until convergence based on a row-normalized transition matrix. Recommended method for large networks.

For "laplacian" and "heat", the graph Laplacian is computed from the (optionally normalized) adjacency matrix.

References

Köhler S, Bauer S, Horn D, Robinson PN. Walking the interactome for prioritization of candidate disease genes. Am J Hum Genet. 2008;82(4):949–958. doi:10.1016/j.ajhg.2008.02.013

Vanunu O, Magger O, Ruppin E, Shlomi T, Sharan R. Associating genes and protein complexes with disease via network propagation. PLoS Comput Biol. 2010;6(1):e1000641. doi:10.1371/journal.pcbi.1000641

Examples

if (FALSE) { # \dontrun{
g <- sample_gnp(100, 0.05, directed = F)
V(g)$name <- as.character(seq_len(vcount(g)))
seed_nodes <- sample(V(g)$name, 5)
network_diffusion_with_pvalues(g, seed_npodes, method = "laplacian")
} # }