# Install packages
if (!requireNamespace("dplyr", quietly = TRUE)) {
  install.packages("dplyr")
}
if (!requireNamespace("ggplot2", quietly = TRUE)) {
  install.packages("ggplot2")
}
if (!requireNamespace("gapminder", quietly = TRUE)) {
  install.packages("gapminder")
}
if (!requireNamespace("hrbrthemes", quietly = TRUE)) {
  install.packages("hrbrthemes")
}
if (!requireNamespace("viridis", quietly = TRUE)) {
  install.packages("viridis")
}
# Load packages
library(dplyr)
library(ggplot2)
library(gapminder) # Data from Gapminder.org: life expectancy, GDP per capita, and population for 142 countries, every five years from 1952 to 2007
library(hrbrthemes)
library(viridis)
Bubble Plot
A bubble plot is a scatter plot in which a third numeric variable is mapped to the size of the circles. This article shows several ways to build bubble charts using R.
Setup
System Requirements: Cross-platform (Linux/macOS/Windows)
Programming Language: R
Dependencies:
dplyr, ggplot2, gapminder, hrbrthemes, viridis
Data Preparation
This tutorial uses the built-in R iris dataset, a clinical dataset from the TCGA database, and the gapminder dataset from the gapminder package.
# R built-in data - iris
head(iris)
  Sepal.Length Sepal.Width Petal.Length Petal.Width Species
1          5.1         3.5          1.4         0.2  setosa
2          4.9         3.0          1.4         0.2  setosa
3          4.7         3.2          1.3         0.2  setosa
4          4.6         3.1          1.5         0.2  setosa
5          5.0         3.6          1.4         0.2  setosa
6          5.4         3.9          1.7         0.4  setosa
# TCGA database (using clinical data on lung cancer in 2020); reading the file requires the readr package
TCGA_clinic <- readr::read_tsv("https://bizard-1301043367.cos.ap-guangzhou.myqcloud.com/raponi2006_public_raponi2006_public_clinicalMatrix.gz")
TCGA_clinic$T <- as.factor(TCGA_clinic$T)
# gapminder package: keep only the 2007 records (year is stored as a number)
data <- gapminder %>%
  filter(year == 2007) %>%
  dplyr::select(-year)
Visualization
1. Basic Plot
# Taking iris data as an example
p <- ggplot(iris, aes(x = Sepal.Length, y = Sepal.Width, size = Species)) +
  geom_point(alpha = 0.4)
p
This bubble plot shows Sepal.Length against Sepal.Width, with bubble size encoding Species. Because Species is a discrete variable, ggplot2 warns that using size for a discrete variable is not advised.
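Since a bubble plot is defined by a third numeric variable mapped to circle size, a variant of the basic plot that maps a numeric column may be closer to that definition. The following is a minimal sketch, assuming Petal.Length as the size variable and moving Species to color; both choices are illustrative and not part of the original example.

```r
library(ggplot2)

# Map the numeric Petal.Length column to bubble size (an illustrative choice);
# the discrete Species column is moved to the color aesthetic instead.
p_numeric <- ggplot(iris, aes(x = Sepal.Length, y = Sepal.Width,
                              size = Petal.Length, color = Species)) +
  geom_point(alpha = 0.4)
p_numeric
```

With a numeric size variable, ggplot2 no longer emits the discrete-size warning and the size legend shows a continuous scale.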
2. Change Color
# Change Color
p <- ggplot(iris, aes(x = Sepal.Length, y = Sepal.Width, size = Species, color = Species)) +
  geom_point(alpha = 0.3)
p
This is the same plot as the basic example, with Species additionally mapped to color so that the three species are easier to tell apart.
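Mapping color to Species uses ggplot2's default discrete palette. As a rough sketch, the palette could be swapped for a discrete viridis scale. This sketch assumes ggplot2's own scale_color_viridis_d() rather than the viridis package, and maps the numeric Petal.Length to size; neither choice comes from the original example.

```r
library(ggplot2)

# Same scatter as above, but with an explicit discrete viridis palette.
# `end = 0.9` trims the brightest yellow, which can be hard to see on white.
p_palette <- ggplot(iris, aes(x = Sepal.Length, y = Sepal.Width,
                              size = Petal.Length, color = Species)) +
  geom_point(alpha = 0.3) +
  scale_color_viridis_d(option = "D", end = 0.9)
p_palette
```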
3. Change the range of bubble size
# Taking TCGA data as an example
p <- TCGA_clinic %>%
  arrange(desc(Age)) %>%
  ggplot(aes(x = T, y = OS.time, size = Age, color = Age)) +
  geom_point(alpha = 0.4) +
  scale_size(range = c(.1, 12), name = "Age")
p
This bubble plot depicts OS.time across tumor T stages, with both bubble size and color encoding patient age. The range argument of scale_size() sets the smallest and largest bubble sizes.
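To see what the range argument of scale_size() does, the computed point sizes can be inspected directly. The sketch below uses iris with Petal.Length as the size variable (an assumption made so the example runs without downloading the TCGA file) and compares a narrow range with the wide range used above.

```r
library(ggplot2)

base_plot <- ggplot(iris, aes(x = Sepal.Length, y = Sepal.Width,
                              size = Petal.Length)) +
  geom_point(alpha = 0.4)

# Two versions that differ only in the size range
narrow <- base_plot + scale_size(range = c(1, 3))
wide   <- base_plot + scale_size(range = c(0.1, 12))

# layer_data() returns the aesthetics after scaling, so the smallest and
# largest drawn sizes can be compared with the requested range.
range(layer_data(narrow)$size)
range(layer_data(wide)$size)
```

The smallest data value is drawn at the lower end of the range and the largest at the upper end, so widening the range exaggerates size differences between bubbles.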
4. Refined Plot
# Take the data in the gapminder package as an example
p <- data %>%
  arrange(desc(pop)) %>%   # draw the largest bubbles first so smaller ones stay visible
  mutate(country = factor(country, country)) %>%
  ggplot(aes(x = gdpPercap, y = lifeExp, size = pop, fill = continent)) +
  geom_point(alpha = 0.5, shape = 21, color = "black") +
  scale_size(range = c(.1, 24), name = "Population") +
  scale_fill_viridis(discrete = TRUE, guide = "none", option = "A") +
  theme_ipsum() +
  ylab("Life Expectancy") +
  xlab("GDP per Capita") +
  theme(legend.position = "none")
p
This bubble plot depicts the relationship between life expectancy (y) and GDP per capita (x) for countries around the world in 2007. Bubble size encodes population and fill color encodes continent.
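The gapminder pop column holds raw head counts, and GDP per capita is strongly right-skewed. The following sketch, under those two observations, rescales population to millions so a legend labeled in millions is accurate and puts the x-axis on a log scale. The names data_m, pop_m, and p_log are introduced here for illustration only.

```r
library(dplyr)
library(ggplot2)
library(gapminder)

data_m <- gapminder %>%
  filter(year == 2007) %>%
  mutate(pop_m = pop / 1e6) %>%   # population in millions
  arrange(desc(pop_m))            # largest bubbles are drawn first

p_log <- ggplot(data_m, aes(x = gdpPercap, y = lifeExp,
                            size = pop_m, fill = continent)) +
  geom_point(alpha = 0.5, shape = 21, color = "black") +
  scale_x_log10() +               # spread out the low-GDP countries
  scale_size(range = c(0.1, 24), name = "Population (M)") +
  labs(x = "GDP per Capita (log scale)", y = "Life Expectancy")
p_log
```

This version keeps the legends visible, which is one possible design choice when the size and fill encodings need to be read off the plot.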
Applications
1. Bubble plots can display KEGG pathway enrichment results
This bubble plot shows 26 signaling pathways associated with the development and progression of COVID-19. [1]
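An enrichment bubble plot of this kind typically places pathways on one axis and encodes gene count and significance as size and color. The sketch below shows one way to draw it with ggplot2. The data frame is entirely hypothetical (made-up pathway names and values, not taken from the cited study), and the column names rich_factor, gene_count, and p_adjust are assumptions.

```r
library(ggplot2)

# Hypothetical enrichment results, invented purely for illustration
enrich <- data.frame(
  pathway     = c("Pathway A", "Pathway B", "Pathway C", "Pathway D"),
  rich_factor = c(0.12, 0.08, 0.20, 0.05),
  gene_count  = c(15, 9, 22, 6),
  p_adjust    = c(0.001, 0.020, 0.0005, 0.045)
)

p_enrich <- ggplot(enrich, aes(x = rich_factor,
                               y = reorder(pathway, rich_factor),
                               size = gene_count, color = p_adjust)) +
  geom_point() +
  scale_color_gradient(low = "red", high = "blue") +  # smaller p-values in red
  labs(x = "Rich factor", y = NULL,
       size = "Gene count", color = "Adjusted p-value")
p_enrich
```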
2. Bubble plots can display multivariate data
This bubble plot demonstrates the agreement between extreme temperature anomalies estimated from mooring observations and historical temperature profile measurements. a. Bubble plot of the 95th percentile EHTA estimated from a mooring site and the corresponding historical temperature distribution measurements within a 5×5° grid box centered at the mooring site. Colors indicate observations from different locations, and dot size indicates observation depths from 10 to 1000 meters. b. Same as a, but for the 5th percentile ELTA. Note that ELTAs below -3°C (at the KEO site) are not plotted. c, d. Same as a (c) and b (d), but with different percentages (p) of EHTA measurements observed within AE (c) and ELTA measurements observed within CE (d). The solid black line with gray shading represents the corresponding linear regression with a 95% confidence level. [2]
References
[1] Oh KK, Adnan M, Cho DH. Network pharmacology approach to decipher signaling pathways associated with target proteins of NSAIDs against COVID-19. Sci Rep. 2021 May 5;11(1):9606. doi: 10.1038/s41598-021-88313-5. PMID: 33953223; PMCID: PMC8100301.
[2] He Q, Zhan W, Feng M, Gong Y, Cai S, Zhan H. Common occurrences of subsurface heatwaves and cold spells in ocean eddies. Nature. 2024 Oct;634(8036):1111-1117. doi: 10.1038/s41586-024-08051-2. Epub 2024 Oct 16. PMID: 39415017; PMCID: PMC11525169.
[3] Wickham, H. (2016). ggplot2: Elegant Graphics for Data Analysis. Springer-Verlag New York. https://ggplot2.tidyverse.org
[4] Bryan, J. (2023). gapminder: Data from Gapminder (Version 1.0.0). Retrieved from https://CRAN.R-project.org/package=gapminder
[5] Wickham, H. (2017). dplyr: A Grammar of Data Manipulation (Version 0.7.4). Retrieved from https://CRAN.R-project.org/package=dplyr
[6] Rudis, B. (2017). hrbrthemes: Additional Themes, Theme Components and Utilities for 'ggplot2' (Version 0.8.7). Retrieved from https://github.com/hrbrmstr/hrbrthemes
[7] Ross, N., & Garnier, S. (2016). viridis: Default Color Maps for 'ggplot2'. Retrieved from https://cran.r-project.org/web/packages/viridis/index.html
