Corrplot Big Data

Authors

[Editor] Hu Zheng;

[Contributors]

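A correlation heat map displays the pairwise correlation coefficients among a set of variables as a color-coded matrix, which makes it easy to spot groups of variables that rise and fall together.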

Setup

  • System Requirements: Cross-platform (Linux/macOS/Windows)

  • Programming language: R

  • Dependent packages: data.table; jsonlite; ComplexHeatmap

# Install packages
if (!requireNamespace("data.table", quietly = TRUE)) {
  install.packages("data.table")
}
if (!requireNamespace("jsonlite", quietly = TRUE)) {
  install.packages("jsonlite")
}
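# ComplexHeatmap is distributed through Bioconductor, not CRAN
if (!requireNamespace("ComplexHeatmap", quietly = TRUE)) {
  if (!requireNamespace("BiocManager", quietly = TRUE)) {
    install.packages("BiocManager")
  }
  BiocManager::install("ComplexHeatmap")
}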

# Load packages
library(data.table)
library(jsonlite)
library(ComplexHeatmap)

Data Preparation

The loaded data contain gene names in the first column and the expression value of each gene in each sample in the remaining columns.

# Load data
data <- data.table::fread(jsonlite::read_json("https://hiplot.cn/ui/basic/big-corrplot/data.json")$exampleData$textarea[[1]])
data <- as.data.frame(data)

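# Convert data structure
# Drop rows whose gene name (first column) is missing
data <- data[!is.na(data[, 1]), ]
# Make duplicated gene names unique so they can be used as row names
idx <- duplicated(data[, 1])
data[idx, 1] <- paste0(data[idx, 1], "--dup-", cumsum(idx)[idx])
rownames(data) <- data[, 1]
data <- data[, -1]
# Coerce every column to numeric while keeping the data frame structure
str2num_df <- function(x) {
  x[] <- lapply(x, as.numeric)
  x
}
# Transpose so that genes become columns, then compute the pairwise
# Pearson correlation between genes, rounded to three decimals
tmp <- t(str2num_df(data))
corr <- round(cor(tmp, use = "na.or.complete", method = "pearson"), 3)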

# View data
head(corr[,1:5])
         RGL4   MPP7   UGCG CYSTM1  ANXA2
RGL4    1.000  0.914  0.929  0.936 -0.592
MPP7    0.914  1.000  0.852  0.907 -0.543
UGCG    0.929  0.852  1.000  0.956 -0.440
CYSTM1  0.936  0.907  0.956  1.000 -0.358
ANXA2  -0.592 -0.543 -0.440 -0.358  1.000
ENDOD1 -0.908 -0.862 -0.791 -0.762  0.826
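
The preparation steps above can be easier to follow on a small table. The following is a minimal sketch, assuming a made-up data frame named toy with one missing gene name and one repeated gene name; the values and the names toy and toy_corr are hypothetical and are not part of the Hiplot example data.

```r
# Toy expression table: the first column holds gene names, the rest are samples.
# All values are made up for illustration.
toy <- data.frame(
  gene = c("A", "B", "A", NA, "C"),
  s1 = c("1", "2", "3", "9", "4"),
  s2 = c("2", "4", "1", "9", "3"),
  s3 = c("3", "6", "2", "9", "1"),
  stringsAsFactors = FALSE
)

# Drop rows with a missing gene name
toy <- toy[!is.na(toy[, 1]), ]

# Rename repeated gene names so they can serve as unique row names
idx <- duplicated(toy[, 1])
toy[idx, 1] <- paste0(toy[idx, 1], "--dup-", cumsum(idx)[idx])
rownames(toy) <- toy[, 1]
toy <- toy[, -1]

# Coerce every column to numeric, then transpose so that genes become columns;
# cor() correlates columns, so the result is a gene-by-gene matrix
toy[] <- lapply(toy, as.numeric)
toy_corr <- round(cor(t(toy), method = "pearson"), 3)

rownames(toy_corr)
toy_corr
```

The second occurrence of gene A is kept under a suffixed name rather than dropped, and the resulting matrix has one row and one column per gene, with ones on the diagonal.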

Visualization

# Corrplot Big Data
p <- ComplexHeatmap::Heatmap(
  corr, col = colorRampPalette(c("#4477AA","#FFFFFF","#BB4444"))(50),
  clustering_distance_rows = "euclidean",
  clustering_method_rows = "ward.D2",
  clustering_distance_columns = "euclidean",
  clustering_method_columns = "ward.D2",
  show_column_dend = FALSE, show_row_dend = FALSE,
  column_names_gp = gpar(fontsize = 8),
  row_names_gp = gpar(fontsize = 8)
)

p
Figure 1: Corrplot Big Data
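
The order of rows and columns in the figure comes from hierarchical clustering with Euclidean distance and the ward.D2 linkage, as set in the Heatmap() call. A rough sketch of the same idea with base R follows, using a made-up matrix named toy. ComplexHeatmap may additionally reorder the branches of the dendrogram, so the leaf order it shows need not match hclust() exactly; treat this as an illustration of the grouping, not a reproduction of the plot.

```r
# Toy expression values (made up): g1/g2 rise together, g3/g4 fall together
toy <- rbind(
  g1 = c(1, 2, 3, 4, 5),
  g2 = c(2, 4, 5, 8, 10),
  g3 = c(5, 4, 3, 2, 1),
  g4 = c(10, 8, 7, 3, 1)
)
toy_corr <- cor(t(toy), method = "pearson")

# Euclidean distance between rows of the correlation matrix, followed by
# Ward clustering, mirroring the arguments passed to Heatmap()
hc <- hclust(dist(toy_corr, method = "euclidean"), method = "ward.D2")

# Cut the tree into two groups to see which genes cluster together
groups <- cutree(hc, k = 2)
groups
```

Genes with similar correlation profiles end up in the same group, which is why correlated genes appear as contiguous blocks in the heat map.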

Red indicates a positive correlation between two genes, blue indicates a negative correlation, and the depth of the color in each cell reflects the magnitude of the correlation coefficient.
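
When a plain vector of colors is passed to col, the colors are, as far as I understand, spread across the range of values observed in the matrix, so white may not sit exactly at a correlation of zero. One possible way to pin the scale is a color function built with circlize::colorRamp2(); circlize is installed as a dependency of ComplexHeatmap. The sketch below uses a made-up 3 x 3 matrix named m, and the names col_fun, m, and p_fixed are hypothetical.

```r
library(ComplexHeatmap)

# Fix the breaks at -1, 0 and 1 so that white always corresponds to r = 0
col_fun <- circlize::colorRamp2(
  breaks = c(-1, 0, 1),
  colors = c("#4477AA", "#FFFFFF", "#BB4444")
)

# Made-up correlation matrix, for illustration only
m <- matrix(
  c(1.0, 0.8, -0.3,
    0.8, 1.0, -0.5,
    -0.3, -0.5, 1.0),
  nrow = 3,
  dimnames = list(c("g1", "g2", "g3"), c("g1", "g2", "g3"))
)

# Pass the color function instead of a color vector
p_fixed <- ComplexHeatmap::Heatmap(m, col = col_fun, name = "r")
p_fixed
```

With this mapping the same color means the same coefficient in every plot, which helps when comparing heat maps drawn from different data sets.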