Greedy Seed Node Selection to Maximize Diffusion Toward Target Nodes

This function implements a greedy algorithm to select a set of k seed nodes from a candidate list such that the resulting diffusion signal on a specified set of target nodes is maximized. It supports multiple diffusion models (Laplacian, Heat Kernel, Random Walk with Restart).

Usage

greedy_seed_selection(
  graph,
  target_nodes,
  candidate_nodes = NULL,
  method = c("laplacian", "heat", "rwr"),
  alpha = 0.7,
  t = 1,
  restart_prob = 0.3,
  k = 5,
  normalize = TRUE,
  plot = TRUE
)

Arguments

graph: An igraph object representing the network to analyze or a data frame containing a symbolic edge list in the first two columns. Additional columns are considered as edge attributes.
target_nodes: A character vector of node names to prioritize for receiving the diffusion signal.
candidate_nodes: Optional character vector of eligible nodes to consider as seeds. If NULL (default), all non-target nodes are used.
method: Diffusion method to use. One of "laplacian", "heat", or "rwr".
alpha: Scaling parameter for the Laplacian diffusion method (default = 0.7).
t: Time parameter for the Heat Kernel diffusion (default = 1).
restart_prob: Restart probability for the Random Walk with Restart (default = 0.3).
k: Number of seed nodes to select (default = 5).
normalize: Logical; whether to normalize the cumulative diffusion score over the target nodes (default = TRUE).
plot: Logical; whether to plot the progression of target diffusion score as seeds are selected (default = TRUE).

Value

A list with the following elements:

selected: Character vector of selected seed node names.
final_target_score: Final cumulative (or normalized) diffusion score over the target nodes.
scores_at_each_step: Vector of target scores at each greedy selection step.

Details

This function uses a stepwise greedy heuristic. At each step, it evaluates the marginal gain in target score from adding each candidate node to the current seed set, and selects the one with the highest gain. This is repeated for k steps.

Note: The score of each candidate at every step is recomputed via a full diffusion run, making the function computationally intensive for large graphs or large k.

The target score can be interpreted as either the total or average diffusion signal received by the target nodes. Normalization helps scale results across networks of different sizes.

Examples

if (FALSE) { # \dontrun{
g <- sample_gnp(50, 0.05, directed = F)
target <- c("1", "2", "3")
greedy_seed_selection(g, target_nodes = target, k = 10)
} # }