Imagine a bustling city (your tissue sample). Each building is a cell with a unique function. scRNA-seq lets us eavesdrop on the conversations (gene expression) happening in each building.
Normalization is like adjusting the volume in each building.
object <- NormalizeData(
object,
normalization.method = "LogNormalize",
scale.factor = 10000
)
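To make the "volume adjustment" concrete, here is a minimal base R sketch of the calculation that the Seurat documentation describes for `LogNormalize`: each cell's counts are divided by that cell's total counts, multiplied by `scale.factor`, and transformed with `log1p`. The toy matrix `counts` and the helper `log_normalize` are invented for illustration and are not Seurat's implementation.

```r
# Toy count matrix: rows are genes, columns are cells (illustrative values only).
# cell2 has the same expression profile as cell1 but 10x the sequencing depth.
counts <- matrix(
  c(10, 0, 90,
    100, 0, 900),
  nrow = 3,
  dimnames = list(c("geneA", "geneB", "geneC"), c("cell1", "cell2"))
)

# Hypothetical helper mirroring the LogNormalize formula:
# log1p(count / total counts in the cell * scale.factor)
log_normalize <- function(counts, scale.factor = 10000) {
  totals <- colSums(counts)                      # total counts per cell
  scaled <- sweep(counts, 2, totals, "/") * scale.factor
  log1p(scaled)                                  # natural log of (1 + x)
}

normalized <- log_normalize(counts)
print(normalized)
```

After normalization the two columns are identical, because the cells differ only in sequencing depth and not in their relative expression.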
The RunPCA function runs principal component analysis (PCA), a dimension reduction method, on the object. Choose the input features carefully: in most cases we can use only the most variable features (why?), which reduces the computational burden significantly. It is also good practice to overwrite the object with the PCA result so that all the information stays in one place.
object <- RunPCA(
object,
features = VariableFeatures(object) # assumes FindVariableFeatures() was run beforehand
)
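As a hint for the "why?" above: genes that barely vary across cells carry little information for telling cells apart, so dropping them loses little signal. The base R sketch below ranks simulated genes by plain variance and keeps the top ones. This is a deliberate simplification for illustration, not the selection method Seurat uses, and the data and names (`expr`, `top_genes`) are made up.

```r
set.seed(42)  # make the simulated data reproducible

# Simulated normalized expression: 6 genes (rows) x 100 cells (columns).
# The first two genes vary strongly across cells; the other four are nearly flat.
expr <- rbind(
  matrix(rnorm(200, sd = 3), nrow = 2),
  matrix(rnorm(400, sd = 0.1), nrow = 4)
)
rownames(expr) <- paste0("gene", 1:6)

# Rank genes by their variance across cells and keep the top 2.
gene_variance <- apply(expr, 1, var)
top_genes <- names(sort(gene_variance, decreasing = TRUE))[1:2]
print(top_genes)
```

Only the two high-variance genes are selected, so a PCA restricted to `top_genes` would work on a third of the rows while keeping almost all of the variation.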
We can use the DimPlot function to plot one principal component against another. On this plot, we look for clusters of cells. Remember that the PCs are sorted in decreasing order of explained variance (PC1 explains the most variance, PC2 the next most, and so on).
DimPlot(
object,
reduction = "pca"
)
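The ordering of the PCs by variance can be checked on a small example. The sketch below uses base R's `prcomp` on simulated data rather than a Seurat object, so the matrix `expr` and the variable names are assumptions made for illustration.

```r
set.seed(1)  # make the simulated data reproducible

# Simulated expression matrix: 50 cells (rows) x 5 genes (columns).
# Each gene gets a larger spread than the previous one.
expr <- sapply(1:5, function(j) rnorm(50, sd = j))
colnames(expr) <- paste0("gene", 1:5)

# prcomp() is base R's PCA; it centers the columns by default.
pca <- prcomp(expr)

# Variance captured by each PC, and its share of the total variance.
pc_variance <- pca$sdev^2
prop_variance <- pc_variance / sum(pc_variance)
print(round(prop_variance, 3))

# The cell coordinates on PC1 and PC2 are the values a PC1 vs. PC2 plot displays.
head(pca$x[, 1:2])
```

The printed proportions decrease from PC1 onward and sum to 1, which is why the first few components are usually the ones worth plotting.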
Another option is to visualize each principal component through the genes that contribute to it. The heatmap shows how each gene contributes to the variance that the component explains. A heatmap dominated by a single color indicates that all genes contribute in the same direction, meaning the component only captures an overall average. What we hope to see are blocks of cells and genes in contrasting colors.
DimHeatmap(
object,
dims
)
As with RunPCA, we use the RunUMAP function to compute the UMAP projections for dimension reduction. The tuning parameter dims sets which dimensions of the input reduction (the principal components) are used, and it needs to be adjusted for each dataset. Too few dimensions tend to produce strange patterns; including more usually makes cell clusters easier to identify.
object <- RunUMAP(
object,
dims = 1:20 # example value; tune for your data
)
We can use the DimPlot function again to visualize the UMAP projections (one dimension vs. another). On this plot, we search for the existence of clusters.
DimPlot(
object,
reduction = "umap" # can be omitted
)
Aspect | PCA | UMAP |
---|---|---|
Type | Linear | Non-linear |
Objective | Maximizes explained variance | Preserves local/global structure |
Interpretation | Easier to interpret PCs | Better for complex structures |
Speed | Faster | Slower |
Clustering is like identifying distinct neighborhoods within the city.
object <- FindNeighbors(
object,
dims=1:20
)
Computes the k.param nearest neighbors of each cell in the dataset. Optionally (via compute.SNN), it also constructs a shared nearest neighbor (SNN) graph by calculating the neighborhood overlap (Jaccard index) between every cell and its k.param nearest neighbors.
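The neighborhood overlap is easy to compute by hand. The base R sketch below calculates the Jaccard index (size of the intersection divided by size of the union) for made-up neighbor lists. The `neighbors` list and the `jaccard` helper are hypothetical, and the sketch ignores details of how Seurat builds its neighbor sets.

```r
# Hypothetical neighbor lists: for each cell, the names of its nearest neighbors.
neighbors <- list(
  cell1 = c("cell2", "cell3", "cell4"),
  cell2 = c("cell1", "cell3", "cell4"),
  cell5 = c("cell6", "cell7", "cell4")
)

# Jaccard index: shared neighbors divided by all distinct neighbors.
jaccard <- function(a, b) {
  length(intersect(a, b)) / length(union(a, b))
}

overlap_12 <- jaccard(neighbors$cell1, neighbors$cell2)
overlap_15 <- jaccard(neighbors$cell1, neighbors$cell5)
print(overlap_12)  # 0.5: two shared neighbors out of four distinct ones
print(overlap_15)  # 0.2: one shared neighbor out of five distinct ones
```

Cells with a high overlap get a strongly weighted edge in the SNN graph, which is what lets the clustering step group them together.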
object <- FindClusters(
object
)
Identifies clusters of cells using a clustering algorithm based on modularity optimization of the shared nearest neighbor (SNN) graph.
Concept | Analogy | Goal |
---|---|---|
scRNA-seq | City | Understand conversations (gene expression) in each building (cell). |
Noise | Background chatter | Makes it hard to hear individual conversations. |
Normalization | Volume control | Adjust volume to make conversations comparable. |
Clustering | Neighborhoods | Group buildings with similar conversations together. |
Normalization and Clustering of scRNA-Seq Data