Iterative Correcting Weighted Gene Co-expression Network Analysis

Iterative Correcting Weighted Gene Co-expression Network Analysis function for constructing a network from an expression matrix.

Usage

icwgcna(
  ex,
  expo = 6,
  Method = c("pearson", "spearman"),
  q = 0.3,
  maxIt = 10,
  maxComm = 100,
  corCut = 0.8,
  covCut = 0.33,
  mat_mult_method = c("Rfast", "RcppEigen")
)

Arguments

ex: matrix of bulk RNA-seq or microarray gene expression data. This should be in log space and greater than 0.
expo: exponent to use for soft thresholding. If NULL, angular distance is used instead.
Method: correlation to use for distance measure, "pearson" (default) or "spearman"
q: quantile (0-1) for first round filtering based on mean expression and standard deviation
maxIt: maximum number of iterations; must be 25 or less
maxComm: maximum number of communities to be found
corCut: correlation threshold used for dropping communities
covCut: coefficient of variation (CoV) quantile threshold to use at each iteration for selecting genes to build the network. For example, covCut = .667 would use the top third of genes based on CoV after regressing out the largest community.
mat_mult_method: method for large matrix multiplication, "Rfast" (default) or "RcppEigen" (see Details)
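
The effect of a covCut quantile can be illustrated with base R. This is only a minimal sketch on simulated data, assuming genes in rows and samples in columns; the helper select_by_cov() is hypothetical and not part of the package, whose internal filtering may differ in detail.

```r
# Hypothetical helper (not part of the package): keep genes whose
# coefficient of variation (sd / mean) lies above the covCut quantile.
select_by_cov <- function(ex, covCut = 0.33) {
  cov_gene <- apply(ex, 1, sd) / rowMeans(ex)
  ex[cov_gene > quantile(cov_gene, covCut), , drop = FALSE]
}

set.seed(1)
# Simulated log-space expression: 300 genes x 20 samples, all values > 0
ex_sim <- matrix(rexp(300 * 20) + 1, nrow = 300,
                 dimnames = list(paste0("g", 1:300), paste0("s", 1:20)))

kept <- select_by_cov(ex_sim, covCut = 0.667)
nrow(kept)  # about one third of the 300 genes
```

With covCut = 0.667 roughly the top third of genes by CoV is retained; lower values keep more genes per iteration.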

Value

Returns a list with the following items:

community_membership - community membership score (kME). Analogous to loadings in PCA.
community_signature - community eigengene, the first principal component of the expression of genes in this community (with proper direction). This can be thought of as the average of the scaled expression of top community genes.
.community_membership - full community membership score (for exploratory purposes)
.community_signature - full community eigengene (for exploratory purposes)
controlled_for - The communities whose signatures were regressed out at each iteration.
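
To make the relationship between a community signature and the membership score more concrete, here is a minimal base R sketch on simulated data. It assumes genes in rows, takes the first principal component of the row-scaled expression, and orients it to correlate positively with the mean scaled expression; the helper eigengene_sketch() is hypothetical and may not match the package's own computation.

```r
# Hypothetical helper: first principal component of the scaled expression of
# a set of genes, oriented to correlate positively with their mean.
eigengene_sketch <- function(ex_comm) {
  z <- t(scale(t(ex_comm)))               # scale each gene (row) across samples
  pc1 <- svd(z, nu = 0, nv = 1)$v[, 1]    # sample-level scores of the first PC
  if (cor(pc1, colMeans(z)) < 0) pc1 <- -pc1
  pc1
}

set.seed(2)
shared <- rnorm(30)  # signal shared by all genes across 30 samples
# 15 simulated community genes (rows) x 30 samples (columns)
ex_comm <- t(sapply(1:15, function(i) 8 + shared + rnorm(30, sd = 0.5)))

eg  <- eigengene_sketch(ex_comm)
# Membership score in the kME sense: correlation of each gene with the eigengene
kme <- as.vector(cor(t(ex_comm), eg))
```

In this simulation every gene has a high positive kme because all genes carry the same underlying signal.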

Details

Iterative Correcting Weighted Gene Co-expression Network Analysis function for constructing a gene network from a gene expression matrix. The algorithm:

1. Constructs a signed WGCNA network.
2. Drops correlated modules based on kurtosis.
3. Regresses out the largest community from the expression data.
4. Repeats steps 1-3 until a maximum number of communities or iterations is reached.
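
The regression step can be sketched in base R as follows. This is an illustration under stated assumptions, not the package's implementation: regress_out() is a hypothetical helper that fits one linear model per gene against a single signature and returns the residuals, re-centred on each gene's mean. Whether the package re-centres, or uses a different regression, is not stated here.

```r
# Hypothetical helper: remove the linear effect of one signature from every gene.
# ex: genes x samples matrix; signature: numeric vector, one value per sample.
regress_out <- function(ex, signature) {
  fit <- lm(t(ex) ~ signature)        # one regression per gene (columns of t(ex))
  t(residuals(fit)) + rowMeans(ex)    # residuals, shifted back to each gene's mean
}

set.seed(3)
sig <- rnorm(25)  # simulated community signature across 25 samples
ex_sim <- rbind(
  10 + 2 * outer(rep(1, 5), sig) + matrix(rnorm(5 * 25, sd = 0.3), 5),  # driven by sig
  10 + matrix(rnorm(5 * 25), 5)                                         # unrelated genes
)

ex_corrected <- regress_out(ex_sim, sig)
max(abs(cor(t(ex_corrected), sig)))  # effectively zero after correction
```

After correction no gene is linearly associated with the removed signature, so a subsequent network build can surface weaker communities.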

Some differences from standard WGCNA (Horvath/Langfelder):

Makes heavy use of Rfast to compute adjacencies and TOM to enable iterative network creation on > 20K features.
Uses signed adjacency in order to avoid possible distortions of community signatures (eigengenes).
Iteratively regresses out the strongest community in order to facilitate discovery of communities possibly obscured by larger module(s).
Clustering does not focus on merging communities but on dropping them to identify the strongest module(s).
Enables Spearman correlation for constructing the adjacency matrix instead of Pearson to enable robust application in RNA-seq and microarray data. Future updates may include mutual information.
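
The difference between signed and unsigned adjacency can be seen in a small base R sketch. It uses the standard signed WGCNA form ((1 + cor) / 2)^expo and base cor() rather than Rfast; signed_adjacency() is a hypothetical helper for illustration, and the package's internal code may differ.

```r
# Hypothetical helper: signed soft-thresholded adjacency between genes (rows).
signed_adjacency <- function(ex, expo = 6, method = c("pearson", "spearman")) {
  method <- match.arg(method)
  r <- cor(t(ex), method = method)  # gene-by-gene correlation
  ((1 + r) / 2)^expo                # negative correlations map close to 0
}

set.seed(4)
base <- rnorm(40)
ex_sim <- rbind(
  g1 = 8 + base,
  g2 = 8 + base + rnorm(40, sd = 0.2),  # positively correlated with g1
  g3 = 8 - base + rnorm(40, sd = 0.2)   # negatively correlated with g1
)

adj_signed   <- signed_adjacency(ex_sim, expo = 6, method = "spearman")
adj_unsigned <- abs(cor(t(ex_sim), method = "spearman"))^6
```

Here g1 and g3 are strongly connected in the unsigned network but nearly disconnected in the signed one, so anti-correlated genes do not end up in the same community and distort its eigengene.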

For matrix multiplication, the option "Rfast" will use Rfast::mat.mult(), which takes advantage of parallel processing across multiple cores. The option "RcppEigen" will use the RcppEigen engine for C++ code, which tends to be faster when using a single core but does not take advantage of parallel processing across multiple cores. If running this on a cluster with access to many compute cores, there is a significant performance advantage to using Rfast::mat.mult().

Note that the uncorrected_community_signature matrix is useful when comparing to signature matrices from new datasets that were computed with compute_eigengene_matrix(). The community signatures in the uncorrected_community_signature matrix may show a high level of collinearity, and we strongly recommend the use of tree-based learners for any analysis based on them.

References

Langfelder P, Horvath S (2008). “WGCNA: an R package for weighted correlation network analysis.” BMC Bioinformatics, 9, 559. https://bmcbioinformatics.biomedcentral.com/articles/10.1186/1471-2105-9-559.

Langfelder P, Horvath S (2012). “Fast R Functions for Robust Correlations and Hierarchical Clustering.” Journal of Statistical Software, 46(11), 1–17. https://www.jstatsoft.org/v46/i11/.

Zhang, Bin and Horvath, Steve. "A General Framework for Weighted Gene Co-Expression Network Analysis" Statistical Applications in Genetics and Molecular Biology, vol. 4, no. 1, 2005. https://doi.org/10.2202/1544-6115.1128

Mason, M.J., Fan, G., Plath, K. et al. Signed weighted gene co-expression network analysis of transcriptional regulation in murine embryonic stem cells. BMC Genomics 10, 327 (2009). https://doi.org/10.1186/1471-2164-10-327

Examples

if (FALSE) { # \dontrun{
library("UCSCXenaTools")
luad <- getTCGAdata(
  project = "LUAD", mRNASeq = TRUE, mRNASeqType = "normalized",
  clinical = FALSE, download = TRUE
)
ex <- as.matrix(data.table::fread(luad$destfiles), rownames = 1)

results <- icwgcna(ex)
} # }