R/compute_ssFC.R
weight_ss_fc.Rd
Compute weighted single-sample logFCs for each treated sample using normalized logCPM values. A lowess curve is fitted on variances ~ mean of logCPM values and used to predict gene-wise weights. The weighted single-sample logFCs are ready to be used for computing perturbation scores.
weight_ss_fc(expreMatrix, metadata = NULL, sampleColumn, treatColumn, groupBy)
# S4 method for matrix
weight_ss_fc(expreMatrix, metadata = NULL, sampleColumn, treatColumn, groupBy)
# S4 method for data.frame
weight_ss_fc(expreMatrix, metadata = NULL, sampleColumn, treatColumn, groupBy)
# S4 method for DGEList
weight_ss_fc(expreMatrix, metadata = NULL, sampleColumn, treatColumn, groupBy)
# S4 method for SummarizedExperiment
weight_ss_fc(expreMatrix, metadata = NULL, sampleColumn, treatColumn, groupBy)
expreMatrix: matrix or data.frame of logCPM, or DGEList/SummarizedExperiment storing gene expression counts and sample metadata. Feature names need to be Entrez IDs, and column names need to be sample names.
metadata: Sample metadata data.frame as described in the details section.
sampleColumn: Name of the column in the metadata containing column names of the expreMatrix.
treatColumn: Name of the column in the metadata containing treatment information. The column must be a factor with the reference level set to be the control treatment.
groupBy: Name of the column in the metadata containing information for how samples are matched in pairs (e.g. patient).
A list with two elements: $weight, the gene-wise weights; and $logFC, the weighted single-sample logFC matrix.
This function computes weighted single-sample logFCs from normalised logCPM values, used for computing single-sample perturbation scores.
Genes with smaller logCPM tend to have larger variances among single-sample logFCs. A lowess curve is therefore fitted to estimate the relationship between the variance of the single-sample logFCs and the mean logCPM, and this relationship is used to estimate the variance at each gene's mean logCPM value. Gene-wise weights, defined as the inverse of these estimated variances, are then multiplied with the single-sample logFCs to down-weight genes with low counts.
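The weighting idea described above can be sketched in a few lines of base R. This is a minimal illustration on simulated data, not necessarily how weight_ss_fc is implemented internally: the toy matrix, the sample names, the use of stats::approx to read fitted values off the lowess curve, and the guard against non-positive fitted variances are all assumptions made for the example.

```r
# Hypothetical toy data: 200 genes x 6 samples (3 patients, control + treated).
set.seed(1)
n_genes <- 200
mean_expr <- runif(n_genes, min = 0, max = 10)
# Noise shrinks as expression grows, mimicking the trend described above.
noise_sd <- 1 / (1 + mean_expr)
logCPM <- sapply(1:6, function(i) rnorm(n_genes, mean = mean_expr, sd = noise_sd))
rownames(logCPM) <- paste0("gene", seq_len(n_genes))
colnames(logCPM) <- c("p1_ctrl", "p1_trt", "p2_ctrl", "p2_trt", "p3_ctrl", "p3_trt")

# Single-sample logFC: treated minus matched control within each patient.
ssFC <- logCPM[, c(2, 4, 6)] - logCPM[, c(1, 3, 5)]

# Gene-wise variance of the single-sample logFCs and mean logCPM per gene.
fc_var <- apply(ssFC, 1, var)
ave_logCPM <- rowMeans(logCPM)

# Fit a lowess curve of variance against mean logCPM.
fit <- stats::lowess(ave_logCPM, fc_var)

# Read the fitted variance off the curve at each gene's mean logCPM
# (assumption: linear interpolation between the lowess points).
pred_var <- stats::approx(fit$x, fit$y, xout = ave_logCPM,
                          ties = mean, rule = 2)$y
# Assumption: guard against zero or negative fitted variances.
pred_var <- pmax(pred_var, .Machine$double.eps)

# Gene-wise weights are the inverse of the fitted variances.
weight <- 1 / pred_var

# Multiply each gene (row) of the logFC matrix by its weight.
weighted_fc <- ssFC * weight
```

Multiplying a matrix by a vector whose length equals the number of rows recycles the vector down each column, so every row (gene) is scaled by its own weight.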
It is assumed that genes with extremely low counts have been removed and that the count matrix was normalised before the logCPM matrix was derived. Row names of the matrix must be Entrez gene IDs.
If an S4 object of DGEList or SummarizedExperiment is provided as input to expreMatrix, the gene expression matrix will be extracted from it and converted to a logCPM matrix. Sample metadata will also be extracted from the same S4 object unless otherwise specified.
The provided sample metadata should have as many rows as the logCPM matrix has columns, and must contain a column for treatment, a column for sample names and a column describing how samples should be matched into pairs.
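Before calling the function, the metadata requirements above can be checked with a small helper. The sketch below is illustrative only: check_metadata is a hypothetical function that is not part of the package, and the toy metadata, column names and row IDs are made up for the example.

```r
# Hypothetical toy inputs: 4 samples from 2 patients, 10 genes.
set.seed(1)
metadata <- data.frame(
  sample = c("s1", "s2", "s3", "s4"),
  treatment = factor(c("Vehicle", "R5020", "Vehicle", "R5020"),
                     levels = c("Vehicle", "R5020")),  # control is the reference
  patient = c("p1", "p1", "p2", "p2")
)
logCPM <- matrix(rnorm(40), nrow = 10,
                 dimnames = list(as.character(1:10), metadata$sample))

# Hypothetical helper (not part of the package): stops with an error if the
# metadata does not meet the requirements described above.
check_metadata <- function(logCPM, metadata, sampleColumn, treatColumn, groupBy) {
  stopifnot(
    # all three columns must exist
    all(c(sampleColumn, treatColumn, groupBy) %in% colnames(metadata)),
    # one metadata row per column of the logCPM matrix
    nrow(metadata) == ncol(logCPM),
    # sample names must match the matrix column names
    setequal(metadata[[sampleColumn]], colnames(logCPM)),
    # treatment must be a factor (reference level = control)
    is.factor(metadata[[treatColumn]])
  )
  invisible(TRUE)
}

check_metadata(logCPM, metadata,
               sampleColumn = "sample", treatColumn = "treatment",
               groupBy = "patient")
```

The helper only checks structure; whether the reference level of the factor really is the control treatment still needs to be confirmed by eye, for example with levels(metadata$treatment).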
# Inspect metadata data frame to make sure it has treatment, sample
# and patient columns
data(metadata_example)
data(logCPM_example)
# Set the treatment column to be a factor where the reference is the control
# treatment
metadata_example <- dplyr::mutate(metadata_example, treatment = factor(
    treatment, levels = c("Vehicle", "E2+R5020", "R5020")))
ls <- weight_ss_fc(logCPM_example, metadata = metadata_example,
    sampleColumn = "sample", groupBy = "patient", treatColumn = "treatment")