# install packages
if (!requireNamespace("dplyr", quietly = TRUE)) {
  install.packages("dplyr")
}
if (!requireNamespace("ggbiplot", quietly = TRUE)) {
  install.packages("ggbiplot")
}
# load packages
library("dplyr")
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
library("ggbiplot")
## Loading required package: ggplot2
Biplot
Example
A biplot is a visualization tool used to present the results of principal component analysis (PCA). It displays the positions of both the samples and the variables of a dataset in the same principal component space. In a two-dimensional biplot, samples are typically drawn as points and variables as arrows. The direction of an arrow reflects the variable's correlation with the principal components, and its length reflects the variable's contribution to them.
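As a rough illustration of the two ingredients described above, the following base R sketch extracts the sample coordinates (scores) and the variable directions (loadings) from a PCA of the built-in iris data. The object names `pca`, `scores`, and `loadings` are chosen here for illustration only and are not used elsewhere in this tutorial.

```r
# Minimal sketch using base R only (no ggbiplot required).
# PCA on the four numeric iris columns, standardized to unit variance.
pca <- prcomp(iris[, 1:4], scale. = TRUE)

# Scores: one row per sample; these become the points of a biplot.
scores <- pca$x[, 1:2]

# Loadings: one row per variable; these give the arrow directions.
loadings <- pca$rotation[, 1:2]

dim(scores)    # samples x components
dim(loadings)  # variables x components
```

A plotting function then draws both matrices on the same pair of axes, usually after rescaling one or both so that points and arrows are visually comparable.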
Setup
System Requirements: Cross-platform (Linux/MacOS/Windows)
Programming language: R
Dependent packages: dplyr and ggbiplot
Data Preparation
The data comes from the iris dataset that ships with R.
data("iris")
head(iris)
Sepal.Length Sepal.Width Petal.Length Petal.Width Species
1 5.1 3.5 1.4 0.2 setosa
2 4.9 3.0 1.4 0.2 setosa
3 4.7 3.2 1.3 0.2 setosa
4 4.6 3.1 1.5 0.2 setosa
5 5.0 3.6 1.4 0.2 setosa
6 5.4 3.9 1.7 0.4 setosa
Visualization
Principal component analysis is performed using the prcomp function.
iris.pca <- prcomp(~ Sepal.Length + Sepal.Width + Petal.Length + Petal.Width,
                   data = iris,
                   scale. = TRUE)
summary(iris.pca)
Importance of components:
PC1 PC2 PC3 PC4
Standard deviation 1.7084 0.9560 0.38309 0.14393
Proportion of Variance 0.7296 0.2285 0.03669 0.00518
Cumulative Proportion 0.7296 0.9581 0.99482 1.00000
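The "Proportion of Variance" row in a PCA summary can be reproduced by hand from the component standard deviations, which may help when reading the table. The sketch below recomputes it under the same settings as the prcomp call used in this section; `variances`, `prop.var`, and `cum.var` are illustrative names.

```r
# Refit the same PCA so this block runs on its own.
iris.pca <- prcomp(~ Sepal.Length + Sepal.Width + Petal.Length + Petal.Width,
                   data = iris,
                   scale. = TRUE)

# Each component's variance is its squared standard deviation.
variances <- iris.pca$sdev^2

# Share of the total variance explained by each component.
prop.var <- variances / sum(variances)

# Running total, matching the "Cumulative Proportion" row.
cum.var <- cumsum(prop.var)

round(rbind(prop.var, cum.var), 4)
```

Because the variables were standardized, the variances sum to the number of variables (four), so PC1 alone accounts for roughly 73% of the total.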
1. Basic Plot
Use ggbiplot to draw the loading plot of the PCA results.
iris.gg <- ggbiplot(iris.pca, obs.scale = 1, var.scale = 1,
                    groups = iris$Species, point.size = 2,
                    varname.size = 3,
                    varname.color = "black",
                    varname.adjust = 1.2,
                    ellipse = TRUE,
                    circle = TRUE) +
  labs(fill = "Species", color = "Species") +
  theme_minimal(base_size = 14) +
  theme(legend.direction = 'horizontal', legend.position = 'top')
iris.gg

Note: The horizontal axis is PC1 and the vertical axis is PC2. Each arrow represents one of the original variables: its length reflects the variable's contribution to the variance, and its direction reflects its correlation with the principal components. The colors represent the different iris species.
We can see that PC1 and PC2 clearly separate the three groups of samples. The vectors Petal.Length and Petal.Width are nearly parallel to the x-axis, indicating that the variance in PC1 is primarily contributed by these variables, while the variance in PC2 is primarily contributed by Sepal.Width. An angle greater than 90° between two vectors indicates a negative correlation between the corresponding variables, while an angle less than 90° indicates a positive correlation.
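The angle rule can be checked numerically. The sketch below builds two-dimensional arrow coordinates by scaling the loadings with the component standard deviations, which is one common biplot convention and is assumed here rather than taken from ggbiplot's internals. It then compares the cosines of the angles between arrows with the actual correlations. The names `arrows2d`, `unit`, and `cosines` are illustrative.

```r
# Base R sketch; PCA refit so the block runs on its own.
pca <- prcomp(iris[, 1:4], scale. = TRUE)

# Assumed convention: arrow = loading * component standard deviation,
# restricted to the PC1-PC2 plane.
arrows2d <- sweep(pca$rotation[, 1:2], 2, pca$sdev[1:2], FUN = "*")

# Normalize each arrow (row) to unit length.
unit <- arrows2d / sqrt(rowSums(arrows2d^2))

# Cosine of the angle between every pair of arrows.
# Positive: angle below 90 degrees; negative: angle above 90 degrees.
cosines <- unit %*% t(unit)

round(cosines, 2)           # what the 2D picture suggests
round(cor(iris[, 1:4]), 2)  # the actual correlations
```

The cosines only approximate the correlations because the plot keeps two of the four components; the approximation is good here since PC1 and PC2 together retain about 96% of the variance.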
2. Add cluster labels
We can calculate the cluster centers and add labels to more intuitively display the names of different categories.
group.labs <- iris.gg$data %>%
  dplyr::summarise(xvar = mean(xvar),
                   yvar = mean(yvar), .by = groups)
group.labs
groups xvar yvar
1 setosa -2.2099215 -0.2870013
2 versicolor 0.4931384 0.5465027
3 virginica 1.7167831 -0.2595014
iris.gg + geom_label(data = group.labs,
                     aes(x = xvar, y = yvar, label = groups),
                     size = 5) +
  theme(legend.position = "none")
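If the plot object is not at hand, group centers can also be computed directly from the PCA scores. The following base R sketch does this with aggregate; `scores` and `centroids` are illustrative names. The resulting coordinates are in raw score units, so they may differ from the xvar and yvar values that ggbiplot stores by a scale factor or a sign, depending on the obs.scale setting and the package version.

```r
# Base R sketch; PCA refit so the block runs on its own.
pca <- prcomp(iris[, 1:4], scale. = TRUE)

# Combine the first two score columns with the grouping variable.
scores <- data.frame(pca$x[, 1:2], Species = iris$Species)

# Mean PC1 and PC2 score per species: one center per group.
centroids <- aggregate(cbind(PC1, PC2) ~ Species, data = scores, FUN = mean)

centroids
```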

3. Use sample names instead of points
To show where individual samples fall in the PCA, we can draw sample names instead of points by setting the labels argument.
ggbiplot(iris.pca, obs.scale = 1,
var.scale = 1,
groups = iris$Species,
labels = rownames(iris),
point.size=2,
varname.size = 3,
varname.color = "black",
varname.adjust = 1.2,
ellipse = TRUE,
circle = TRUE) +
labs(fill = "Species", color = "Species") +
theme_minimal(base_size = 14) +
theme(legend.direction = 'horizontal', legend.position = 'top')

Application
1. Metagenomic principal component analysis loading plot

Arrows represent bacterial species, and letters represent samples. The closer the distance between the bacterial species and the sample, the higher the correlation. In the figure, the bacterial species closest to H1, H2, H3, H4, and H7 is Bifidobacteria, indicating that this species is the dominant species in the control group. [1]
2. Transcriptome principal component analysis loading plot

There were significant differences among the four groups, with the contribution values of PC1 and PC2 being 40.92% and 24.26%, respectively. Along PC2, the control group (PBS) and the asthma + mutton group (OVA + Lamb) were positively distributed, while the asthma + fish group (OVA + Fish) and the asthma group (OVA) were mainly negatively distributed. There were significant differences between the control group and the other three groups, and the asthma + mutton group (OVA + Lamb) and the asthma + fish group (OVA + Fish) were clearly separated. [2]
Reference
[1] Balasubramaniam C, Mallappa RH, Singh DK, et al. Gut bacterial profile in Indian children of varying nutritional status: a comparative pilot study. Eur J Nutr. 2021;60(7):3971-3985. doi:10.1007/s00394-021-02571-7.
[2] Zheng HC, Wang YA, Liu ZR, et al. Consumption of Lamb Meat or Basa Fish Shapes the Gut Microbiota and Aggravates Pulmonary Inflammation in Asthmatic Mice. J Asthma Allergy. 2020;13:509-520. Published 2020 Oct 19. doi:10.2147/JAA.S266584.