Plot pathways and genes contained in them as a network

plot_gs2gene(
  normalisedScores,
  gsTopology,
  geneFC = NULL,
  mapEntrezID = NULL,
  colorGsBy = NULL,
  foldGSname = TRUE,
  foldafter = 2,
  filterGeneBy = 2,
  layout = c("fr", "dh", "gem", "graphopt", "kk", "lgl", "mds", "sugiyama"),
  edgeColor = "darkgrey",
  edgeAlpha = 0.8,
  edgeArc = 0.5,
  geneNodeSize = 3,
  geneNodeShape = 17,
  geneNameFace = c("italic", "plain", "bold", "bold-italic"),
  geneNameColor = "grey30",
  geneNameSize = 3,
  labelGene = TRUE,
  gsNodeSize = 2,
  gsNodeShape = 21,
  gsNodeStroke = 0.5,
  gsNodeOutline = "white",
  gsNameSize = 6,
  gsNameColor = "black",
  geneLegTitle = "Mean logFC",
  gsLegTitle = colorGsBy,
  maxOverlaps = 10,
  ...
)

Arguments

normalisedScores: A data.frame derived from the normalise_by_permu() function. Only gene-sets of interest should be included.
gsTopology: List of pathway topology matrices generated using retrieve_topology()
geneFC: An optional named vector of genes' fold changes
mapEntrezID: Optional. A data.frame matching genes' entrez IDs to another identifier with preferred labels. Must contain the columns: "entrezid" and "mapTo"
colorGsBy: Column within normalisedScores to color gene-set/pathway nodes by
foldGSname: logical. Should long gene-set names be folded into two lines?
foldafter: The number of words after which gene-set names should be folded.
filterGeneBy: Filtration cut-off applied to genes' connectivity (i.e. the number of pathways a gene is involved in).
layout: The layout algorithm to apply. Accepts all layouts supported by igraph.
edgeColor, edgeAlpha: Color and transparency of edges
edgeArc: The bend of edges. 1 approximates a semi-circle whilst 0 will give a straight line.
geneNodeSize, geneNodeShape: Size and shape for gene nodes
geneNameSize, geneNameColor, geneNameFace: Size, color and fontface to use for gene labels
labelGene: logical(1). Should gene names be displayed?
gsNodeSize: Size for gene-set/pathway nodes
gsNodeShape: Shape for gene-set/pathway nodes. Should be a shape with a fill parameter, such as 21:25
gsNodeStroke, gsNodeOutline: Border thickness and color for gene-set/pathway nodes
gsNameSize, gsNameColor: Size and color of gene-set/pathway labels
geneLegTitle: character(1). Legend title for gene nodes
gsLegTitle: character(1). Legend title for gene-set/pathway nodes
maxOverlaps: Passed to geom_node_text
...: Not used
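
As a rough illustration of what the filterGeneBy cut-off measures, the sketch below counts how many pathways each gene appears in, using a toy list of matrices. The pathway names and gene IDs are invented, and the assumption that each topology matrix carries "ENTREZID:XXXX" row names is taken from the Details section; this is not the package's internal implementation.

```r
# Toy stand-in for the list returned by retrieve_topology(): each element is a
# square matrix whose row and column names are genes (assumed format).
# All pathway names and gene IDs here are made up for illustration.
makeToy <- function(ids) {
  genes <- paste0("ENTREZID:", ids)
  matrix(0, length(genes), length(genes), dimnames = list(genes, genes))
}
toyTopology <- list(
  pathwayA = makeToy(c(1, 2, 3)),
  pathwayB = makeToy(c(2, 3, 4)),
  pathwayC = makeToy(c(3, 5))
)

# Count how many pathways each gene appears in
connectivity <- table(unlist(lapply(toyTopology, rownames)))

# Genes that would pass a connectivity cut-off of 2
keptGenes <- names(connectivity)[as.vector(connectivity) >= 2]
keptGenes
```

Raising the cut-off keeps only the genes shared by more pathways, which is usually the quickest way to thin out a crowded network.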

Value

A ggplot2 object

Details

Taking the perturbation scores of a list of gene-sets derived from normalise_by_permu() as input, this function matches gene-sets to their associated genes by utilizing information from pathway topology matrices.

If providing logFC values as a named vector, the names must be entrezgene IDs in the format "ENTREZID:XXXX" for compatibility with the values returned by retrieve_topology(). If this vector is not provided, only genes associated with two or more pathways will be added to the plot. Note, however, that network plots can easily become unmanageable when this vector is omitted.
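
A minimal sketch of preparing such a vector, assuming the fold changes start out keyed by bare Entrez IDs. The numeric values are invented for illustration, and `fc` is a hypothetical object name.

```r
# Hypothetical logFC values keyed by bare Entrez IDs (values are made up)
fc <- c("7157" = 1.8, "672" = -0.9, "2099" = 2.4)

# Prefix the names so they follow the "ENTREZID:XXXX" format
names(fc) <- paste0("ENTREZID:", names(fc))
fc
```

The resulting vector is in the shape expected for the geneFC argument.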

Users can also choose to provide a mapEntrezID data.frame to match entrezgene IDs to their chosen identifiers. The data.frame should contain the columns: "entrezid" and "mapTo".
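
A small sketch of such a mapping table is shown below. The gene symbols correspond to the listed Entrez IDs, but `myMap` is a hypothetical object, and writing the "entrezid" column in the "ENTREZID:XXXX" format is an assumption made here to match the names used in the topology matrices.

```r
# Hypothetical mapping table with the two required columns.
# Assumption: "entrezid" uses the same "ENTREZID:XXXX" format as the topology.
myMap <- data.frame(
  entrezid = paste0("ENTREZID:", c("7157", "672", "2099")),
  mapTo = c("TP53", "BRCA1", "ESR1"),
  stringsAsFactors = FALSE
)

# Quick check that the required column names are present
hasRequiredCols <- all(c("entrezid", "mapTo") %in% colnames(myMap))
hasRequiredCols
```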

If geneFC is provided, gene nodes will be colored by the values provided; otherwise, all gene nodes will be drawn in grey.

Since some gene-sets can contain hundreds of genes, it is not recommended to plot all genes. If a mapEntrezID data.frame is provided, only genes included in that data.frame will be used in the plot.

It is strongly recommended to filter genes using some criteria, such as those with the largest magnitude of change. If all pathway genes are desired, please consider setting labelGene to FALSE to remove gene names.
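
One way to apply such a filter is to keep only the genes with the largest absolute fold changes before passing the vector to geneFC. The sketch below uses invented values and hypothetical object names (`fc`, `topN`, `fcTop`); the cut-off of 3 is arbitrary.

```r
# Hypothetical logFC vector named in the "ENTREZID:XXXX" format (values made up)
fc <- c(0.1, -2.3, 1.7, -0.4, 3.1, 0.05)
names(fc) <- paste0("ENTREZID:", seq_along(fc))

# Keep the topN genes ranked by absolute fold change
topN <- 3
keep <- order(abs(fc), decreasing = TRUE)[seq_len(topN)]
fcTop <- fc[keep]
fcTop
```

Ranking on the absolute value retains strongly down-regulated genes as well as strongly up-regulated ones.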

Examples


library(sSNAPPY)
library(ggplot2)  # provides the scale_*() functions used below
load(system.file("extdata", "gsTopology.rda", package = "sSNAPPY"))
load(system.file("extdata", "normalisedScores.rda", package = "sSNAPPY"))

# Subset pathways significantly perturbed in sample R5020_N2_48
subset <- dplyr::filter(normalisedScores, adjPvalue < 0.05, sample == "R5020_N2_48")
subset$response <- ifelse(subset$robustZ > 0, "Activated", "Inhibited")

# Color gene-set nodes by robust z-scores.
plot_gs2gene(
  subset, gsTopology, colorGsBy = "robustZ", labelGene = FALSE, geneNodeSize = 1,
  gsNodeSize = 4
) + scale_fill_gradient2()
#> Gene fold-changes were not provided as a named vector. All genes will be colored identically.

# When fold-changes are not provided, gene nodes are colored grey.

# To color genes by their direction of change, first compute single-sample logFC
data(logCPM_example)
data(metadata_example)
metadata_example <- dplyr::mutate(metadata_example, treatment = factor(
   treatment, levels = c("Vehicle", "E2+R5020", "R5020")))
ls <- weight_ss_fc(
  logCPM_example, metadata = metadata_example,
  groupBy = "patient", treatColumn = "treatment",
  sampleColumn = "sample"
)
# Provide fold-changes of sample R5020_N2_48 as a named vector
plot_gs2gene(
  subset, gsTopology, geneFC = ls$logFC[,"R5020_N2_48"],
  colorGsBy = "response", labelGene = FALSE
) + scale_colour_gradient2()


# By default, the function only includes genes involved in at least 2 pathways,
# which can be overridden with the `filterGeneBy` parameter. Even so, a large
# number of genes remain, making the plot cumbersome. Instead, only include
# fold-changes of genes within the top 500 absolute values for fold-change
top500 <- rank(1/abs(ls$logFC[,"R5020_N2_48"])) <= 500
fcByDir <- ifelse(ls$logFC[top500,"R5020_N2_48"] > 0, "Up-Regulated", "Down-Regulated")
plot_gs2gene(subset, gsTopology, geneFC = fcByDir, colorGsBy = "response") +
  scale_fill_manual(values = c("darkred", "lightskyblue")) +
  # Named values make the direction-to-color mapping explicit
  scale_colour_manual(values = c("Down-Regulated" = "blue", "Up-Regulated" = "red"))


# To make the gene labels more informative, map genes' Entrez IDs to chosen identifiers.
load(system.file("extdata", "entrez2name.rda", package = "sSNAPPY"))
plot_gs2gene(
  subset, gsTopology, geneFC = fcByDir, mapEntrezID = entrez2name,
  colorGsBy = "response", gsNodeSize = 4
) +
  scale_fill_manual(values = c("darkred", "lightskyblue"), name = "Pathway") +
  scale_colour_manual(values = c("blue", "red"), name = "Gene\nDirection")