Compute weighted single-sample logFCs from normalised logCPM

Compute weighted single-sample logFCs for each treated sample using normalised logCPM values. A lowess curve is fitted to the gene-wise variances of the single-sample logFCs against the gene-wise mean logCPM values and is then used to predict gene-wise weights. The weighted single-sample logFCs are ready to be used for computing perturbation scores.
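The single-sample logFC step can be sketched in base R. This sketch assumes that each single-sample logFC is the treated sample's logCPM minus the logCPM of the control sample from the same group (e.g. patient), where the control is the reference level of the treatment factor. The helper `single_sample_fc()` and the toy data are hypothetical and are not the package's implementation.

```r
# Toy logCPM matrix (hypothetical): 3 genes (Entrez IDs) x 4 samples from 2 patients
logCPM <- matrix(
  c(5, 6, 5.5, 6.5,
    2, 1, 2.5, 3.5,
    8, 8, 7.0, 9.0),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("7157", "672", "1956"),
                  c("P1_ctrl", "P1_trt", "P2_ctrl", "P2_trt"))
)
meta <- data.frame(
  sample = colnames(logCPM),
  patient = c("P1", "P1", "P2", "P2"),
  treatment = factor(c("Vehicle", "Drug", "Vehicle", "Drug"),
                     levels = c("Vehicle", "Drug")),
  stringsAsFactors = FALSE
)

# Hypothetical helper: for each treated sample, subtract the logCPM of the
# control sample (reference level of `treatment`) from the same patient
single_sample_fc <- function(logCPM, meta) {
  ctrl <- levels(meta$treatment)[1]
  treated <- meta[meta$treatment != ctrl, ]
  fc <- vapply(seq_len(nrow(treated)), function(i) {
    ctrl_sample <- meta$sample[meta$patient == treated$patient[i] &
                                 meta$treatment == ctrl]
    stopifnot(length(ctrl_sample) == 1)  # exactly one matched control
    logCPM[, treated$sample[i]] - logCPM[, ctrl_sample]
  }, numeric(nrow(logCPM)))
  dimnames(fc) <- list(rownames(logCPM), treated$sample)
  fc
}

fc <- single_sample_fc(logCPM, meta)
# Column P1_trt holds 1, -1, 0 and column P2_trt holds 1, 1, 2
```

Each treated sample becomes one column of the logFC matrix, so the output has one column per treated sample rather than one per sample.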

weight_ss_fc(expreMatrix, metadata = NULL, sampleColumn, treatColumn, groupBy)

# S4 method for matrix
weight_ss_fc(expreMatrix, metadata = NULL, sampleColumn, treatColumn, groupBy)

# S4 method for data.frame
weight_ss_fc(expreMatrix, metadata = NULL, sampleColumn, treatColumn, groupBy)

# S4 method for DGEList
weight_ss_fc(expreMatrix, metadata = NULL, sampleColumn, treatColumn, groupBy)

# S4 method for SummarizedExperiment
weight_ss_fc(expreMatrix, metadata = NULL, sampleColumn, treatColumn, groupBy)

Arguments

expreMatrix: matrix or data.frame of logCPM values, or a DGEList/SummarizedExperiment object storing gene expression counts and sample metadata. Row (feature) names must be Entrez gene IDs, and column names must be sample names.
metadata: Sample metadata data.frame as described in the Details section. If NULL (the default) and expreMatrix is a DGEList or SummarizedExperiment, the sample metadata stored in that object is used.
sampleColumn: Name of the column in the metadata containing the column names of expreMatrix (i.e. the sample names).
treatColumn: Name of the column in the metadata containing treatment information. The column must be a factor whose reference level is the control treatment.
groupBy: Name of the column in the metadata describing how samples are matched into pairs (e.g. patient).
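If the treatment column is stored as character, base R's `relevel()` offers an alternative to listing every level explicitly when making the control the reference level. The metadata below is hypothetical.

```r
# Hypothetical metadata with a character treatment column
meta <- data.frame(
  sample = c("s1", "s2", "s3"),
  treatment = c("R5020", "Vehicle", "E2+R5020"),
  stringsAsFactors = FALSE
)
# Convert treatment to a factor, then move the control ("Vehicle") to the
# first (reference) level; the remaining levels keep their sorted order
meta$treatment <- relevel(factor(meta$treatment), ref = "Vehicle")
levels(meta$treatment)
# levels: Vehicle, E2+R5020, R5020
```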

Value

A list with two elements:
$weight: gene-wise weights.
$logFC: matrix of weighted single-sample logFCs.

Details

This function computes weighted single-sample logFCs from normalised logCPM values, used for computing single-sample perturbation scores.

Genes with smaller logCPM tend to have larger variances among their single-sample logFCs. To account for this, a lowess curve is fitted to the relationship between the variance of the single-sample logFCs and the mean logCPM, and the fitted curve is used to predict each gene's variance from its mean logCPM. Gene-wise weights, defined as the inverse of these predicted variances, are then multiplied with the single-sample logFCs to down-weight genes with low counts.
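The weighting described above can be sketched with base R's `lowess()` on simulated data. Several details here are assumptions rather than the package's documented behaviour: the variance is taken per gene across all single-sample logFC columns, the lowess fit is evaluated at each gene's mean logCPM by linear interpolation with `approx()`, and predicted variances are floored at a small positive value so that the inverse stays finite.

```r
set.seed(1)
# Hypothetical inputs: mean logCPM per gene and single-sample logFCs
# (4 treated samples), simulated so that lowly expressed genes are noisier
n_genes <- 200
mean_logCPM <- runif(n_genes, min = 0, max = 10)
fc <- matrix(rnorm(n_genes * 4, sd = 2 / (1 + mean_logCPM)), nrow = n_genes)

# 1. Gene-wise variance of the single-sample logFCs
fc_var <- apply(fc, 1, var)
# 2. Lowess fit of variance against mean logCPM
fit <- lowess(mean_logCPM, fc_var)
# 3. Predict each gene's variance by interpolating the fitted curve;
#    floor at a small positive value (assumption) in case the smoother
#    dips to or below zero
pred_var <- approx(fit$x, fit$y, xout = mean_logCPM, rule = 2)$y
pred_var <- pmax(pred_var, 1e-6)
# 4. Weights are inverse predicted variances; multiplying by the vector
#    scales each gene's row of logFCs by that gene's weight
weight <- 1 / pred_var
weighted_fc <- fc * weight
```

Because `weight` has one entry per row, R's column-wise recycling in `fc * weight` applies the same weight to every sample of a gene, which is the down-weighting of low-count genes described above.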

It is assumed that genes with extremely low counts have been removed and that the count matrix was normalised before the logCPM matrix was derived. Row names of the matrix must be Entrez gene IDs.
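A minimal base-R sketch of that preprocessing is shown below. The CPM filtering thresholds are illustrative assumptions, and normalisation here is library-size scaling only; in practice a method such as TMM (edgeR's `calcNormFactors()`) is commonly applied before computing logCPM. The offsets 0.5 and 1 are those used by limma's `voom()`.

```r
# Hypothetical raw counts: 4 genes (Entrez IDs) x 4 samples
counts <- matrix(
  c(200000, 180000, 210000, 190000,
     50000,  52000,  48000,  51000,
         1,      0,      2,      1,
         0,      0,      0,      0),
  nrow = 4, byrow = TRUE,
  dimnames = list(c("7157", "672", "1956", "5290"), paste0("s", 1:4))
)

# 1. Remove genes with extremely low counts: keep genes with CPM >= 10 in at
#    least 2 samples (both thresholds are illustrative assumptions)
lib_size <- colSums(counts)
cpm <- sweep(counts, 2, lib_size, "/") * 1e6
keep <- rowSums(cpm >= 10) >= 2
counts <- counts[keep, , drop = FALSE]

# 2. Library sizes are recomputed after filtering (no TMM factors in this sketch)
lib_size <- colSums(counts)

# 3. logCPM with voom-style offsets so that zero counts stay finite
logCPM <- log2(sweep(counts + 0.5, 2, lib_size + 1, "/") * 1e6)
```

Here genes 1956 and 5290 never reach 10 CPM, so only the two well-expressed genes remain in `logCPM`.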

If a DGEList or SummarizedExperiment S4 object is provided as expreMatrix, the gene expression matrix will be extracted from it and converted to a logCPM matrix. Sample metadata will also be extracted from the same object unless it is supplied through metadata.

Provided sample metadata should have the same number of rows as the number of columns in the logCPM matrix and must contain a column for treatment, a column for sample names, and a column describing how samples should be matched into pairs.
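These requirements can be checked before calling weight_ss_fc. The helper `check_metadata()` below is hypothetical (not part of the package) and simply tests the conditions listed above, plus the factor requirement on the treatment column.

```r
# Hypothetical helper: check that a metadata data.frame matches a logCPM matrix
check_metadata <- function(logCPM, metadata, sampleColumn, treatColumn, groupBy) {
  needed <- c(sampleColumn, treatColumn, groupBy)
  missing_cols <- setdiff(needed, colnames(metadata))
  if (length(missing_cols) > 0)
    stop("metadata is missing column(s): ", paste(missing_cols, collapse = ", "))
  if (nrow(metadata) != ncol(logCPM))
    stop("metadata must have one row per column of the logCPM matrix")
  if (!setequal(metadata[[sampleColumn]], colnames(logCPM)))
    stop("sample names in metadata do not match colnames(logCPM)")
  if (!is.factor(metadata[[treatColumn]]))
    stop("the treatment column must be a factor with the control as reference level")
  invisible(TRUE)
}

# Usage with hypothetical data: 2 genes x 4 samples from 2 patients
logCPM <- matrix(0, nrow = 2, ncol = 4,
                 dimnames = list(c("7157", "672"), paste0("s", 1:4)))
meta <- data.frame(
  sample = paste0("s", 1:4),
  patient = rep(c("P1", "P2"), each = 2),
  treatment = factor(rep(c("Vehicle", "Drug"), 2), levels = c("Vehicle", "Drug")),
  stringsAsFactors = FALSE
)
check_metadata(logCPM, meta, "sample", "treatment", "patient")
```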

Examples

data(metadata_example)
data(logCPM_example)
# Inspect the metadata data frame to make sure it has treatment, sample
# and patient columns
head(metadata_example)
# Set the treatment column to be a factor whose reference level is the
# control treatment
metadata_example <- dplyr::mutate(metadata_example, treatment = factor(
    treatment, levels = c("Vehicle", "E2+R5020", "R5020")))
ls <- weight_ss_fc(logCPM_example, metadata = metadata_example,
    sampleColumn = "sample", groupBy = "patient", treatColumn = "treatment")