Greedy Seed Node Selection to Maximize Diffusion Toward Target Nodes
greedy_seed_selection.Rd
This function implements a greedy algorithm to select a set of k
seed nodes from a candidate list
such that the resulting diffusion signal on a specified set of target nodes is maximized.
It supports multiple diffusion models (Laplacian, Heat Kernel, Random Walk with Restart).
Usage
greedy_seed_selection(
graph,
target_nodes,
candidate_nodes = NULL,
method = c("laplacian", "heat", "rwr"),
alpha = 0.7,
t = 1,
restart_prob = 0.3,
k = 5,
normalize = TRUE,
plot = TRUE
)
Arguments
- graph
An
igraph
object representing the network to analyze or a data frame containing a symbolic edge list in the first two columns. Additional columns are considered as edge attributes.- target_nodes
A character vector of node names to prioritize for receiving the diffusion signal.
- candidate_nodes
Optional character vector of eligible nodes to consider as seeds. If NULL (default), all non-target nodes are used.
- method
Diffusion method to use. One of
"laplacian"
,"heat"
, or"rwr"
.- alpha
Scaling parameter for the Laplacian diffusion method (default = 0.7).
- t
Time parameter for the Heat Kernel diffusion (default = 1).
- restart_prob
Restart probability for the Random Walk with Restart (default = 0.3).
- k
Number of seed nodes to select (default = 5).
- normalize
Logical; whether to normalize the cumulative diffusion score over the target nodes (default = TRUE).
- plot
Logical; whether to plot the progression of target diffusion score as seeds are selected (default = TRUE).
Value
A list with the following elements:
- selected
Character vector of selected seed node names.
- final_target_score
Final cumulative (or normalized) diffusion score over the target nodes.
- scores_at_each_step
Vector of target scores at each greedy selection step.
Details
This function uses a stepwise greedy heuristic. At each step, it evaluates the marginal gain in target score from
adding each candidate node to the current seed set, and selects the one with the highest gain. This is repeated for k
steps.
Note: The score of each candidate at every step is recomputed via a full diffusion run, making the function computationally
intensive for large graphs or large k
.
The target score can be interpreted as either the total or average diffusion signal received by the target nodes. Normalization helps scale results across networks of different sizes.
Examples
if (FALSE) { # \dontrun{
g <- sample_gnp(50, 0.05, directed = F)
target <- c("1", "2", "3")
greedy_seed_selection(g, target_nodes = target, k = 10)
} # }