Motif Plot

Authors

[Editor] Zilun Jin;

[Contributors] Hu Zheng.

Modified

2026-01-17

ggseqlogo is an R package built on ggplot2 for drawing sequence logos from motifs. Compared with other motif visualization tools, it offers concise syntax, flexible output formats, and full compatibility with the ggplot2 ecosystem, so a logo can be themed, annotated, and combined with other plots like any ggplot object. It accepts several input formats, including aligned sequence vectors, position-frequency matrices (PFMs), and custom height matrices such as position-weight matrices (PWMs), and provides extensive options for customizing the appearance of the logo.

Example

MotifPlot DEMO1

A motif logo displays conserved patterns in DNA, RNA, or protein sequences. At each position, the total height of the letter stack shows the information content (how conserved that position is), and the height of each letter within the stack is proportional to its frequency at that position.
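To make the letter-height rule concrete, the base R sketch below computes the per-position information content of the MA0018.2 position-frequency matrix shown in the Data Preparation section (its counts are copied in by hand so the block runs without ggseqlogo). It uses the plain Shannon formula, information content = log2(4) minus the column entropy; ggseqlogo's "bits" method may additionally apply a small-sample correction, so its heights can be somewhat lower.

```r
# Position-frequency matrix of MA0018.2, copied from the pfms_dna output shown
# in the Data Preparation section (rows = bases, columns = positions)
pfm <- matrix(c(0,  0, 11, 0,  1, 0, 2, 8,
                1,  1,  0, 9,  0, 3, 7, 0,
                1, 10,  0, 2, 10, 0, 1, 1,
                9,  0,  0, 0,  0, 8, 1, 2),
              nrow = 4, byrow = TRUE,
              dimnames = list(c("A", "C", "G", "T"), NULL))

# Convert counts to probabilities: each column is divided by its total
ppm <- sweep(pfm, 2, colSums(pfm), "/")

# Shannon entropy (in bits) of each column; 0 * log2(0) is treated as 0
entropy <- apply(ppm, 2, function(p) -sum(ifelse(p > 0, p * log2(p), 0)))

# Information content = maximum entropy for a 4-letter alphabet (2 bits) minus entropy
ic <- log2(nrow(ppm)) - entropy

# Each letter's height is its probability scaled by the column's information content
heights <- sweep(ppm, 2, ic, "*")

round(ic, 2)
```

Position 3, where all 11 sequences carry an A, reaches the 2-bit maximum, while mixed positions such as 7 are much shorter.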

Setup

  • System Requirements: Cross-platform (Linux/macOS/Windows)

  • Programming language: R

  • Dependent packages: ggplot2, ggseqlogo, cowplot, gridExtra

# Installing packages
if (!requireNamespace("ggplot2", quietly = TRUE)) {
  install.packages("ggplot2")
}
if (!requireNamespace("ggseqlogo", quietly = TRUE)) {
  install.packages("ggseqlogo")
}
if (!requireNamespace("cowplot", quietly = TRUE)) {
  install.packages("cowplot")
}
if (!requireNamespace("gridExtra", quietly = TRUE)) {
  install.packages("gridExtra")
}

# Load packages
library(ggplot2)
library(ggseqlogo)
library(cowplot)
library(gridExtra)
sessioninfo::session_info("attached")
─ Session info ───────────────────────────────────────────────────────────────
 setting  value
 version  R version 4.5.2 (2025-10-31)
 os       Ubuntu 24.04.3 LTS
 system   x86_64, linux-gnu
 ui       X11
 language (EN)
 collate  C.UTF-8
 ctype    C.UTF-8
 tz       UTC
 date     2026-01-17
 pandoc   3.1.3 @ /usr/bin/ (via rmarkdown)
 quarto   1.8.27 @ /usr/local/bin/quarto

─ Packages ───────────────────────────────────────────────────────────────────
 package   * version date (UTC) lib source
 cowplot   * 1.2.0   2025-07-07 [1] RSPM
 ggplot2   * 4.0.1   2025-11-14 [1] RSPM
 ggseqlogo * 0.2.2   2025-12-22 [1] RSPM
 gridExtra * 2.3     2017-09-09 [1] RSPM

 [1] /home/runner/work/_temp/Library
 [2] /opt/R/4.5.2/lib/R/site-library
 [3] /opt/R/4.5.2/lib/R/library
 * ── Packages attached to the search path.

──────────────────────────────────────────────────────────────────────────────

Data Preparation

This section uses the built-in dataset ggseqlogo_sample, which loads three objects in two different input formats:

  • seqs_dna: Binding sites of 12 transcription factors, taken from JASPAR FASTA files. The format is a named list of character vectors, where each list name is a JASPAR ID.

  • seqs_aa: Phosphorylation sites of kinase substrates, taken from Wagih et al. The format is the same as seqs_dna, except that the list names are kinase names (for example, AKT1).

  • pfms_dna: Position-frequency matrices of four transcription factors from JASPAR, stored as a named list of matrices, where each list name is a JASPAR ID.

data(ggseqlogo_sample)

head(pfms_dna, n = 1)
$MA0018.2
  [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8]
A    0    0   11    0    1    0    2    8
C    1    1    0    9    0    3    7    0
G    1   10    0    2   10    0    1    1
T    9    0    0    0    0    8    1    2
head(seqs_aa, n = 1)[[1]][1:3]
[1] "VVGARRSSWRVVSSI" "GPRSRSRSRDRRRKE" "LLCLRRSSLKAYGNG"
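The two input formats carry the same information: a PFM is simply the per-position letter counts of a set of aligned sequences. The base R sketch below converts a short, made-up vector of aligned DNA sequences into a matrix with the same layout as the pfms_dna entries (bases as row names, positions as columns). The sequences are hypothetical; ggseqlogo performs a comparable counting step internally when given sequences, although its exact code may differ.

```r
# Hypothetical aligned DNA sequences of equal length (illustration only)
seqs <- c("TGAACA", "TGAACT", "AGAACA", "TGATCA")
bases <- c("A", "C", "G", "T")

# Split into a character matrix: one row per sequence, one column per position
char_mat <- do.call(rbind, strsplit(seqs, ""))

# Count each base at each position; factor levels keep absent bases as 0
pfm <- sapply(seq_len(ncol(char_mat)), function(j) {
  table(factor(char_mat[, j], levels = bases))
})
rownames(pfm) <- bases

pfm
```

A matrix of this shape can be passed to ggseqlogo() in the same way as pfms_dna$MA0018.2.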

Visualization

1. Basic Motif

Logos can be drawn in two ways: with geom_logo(), which adds a logo layer using ggplot2 syntax, or with the wrapper function ggseqlogo(). Both accept the same parameters.

# Using sequence vectors
ggseqlogo(seqs_dna$MA0001.1)
# Using PFM matrix
ggseqlogo(pfms_dna$MA0018.2)
# Plotting using ggplot syntax
ggplot() + geom_logo(seqs_dna$MA0001.1) + theme_logo()
Figure 1: Basic Motif
Figure 2: Basic Motif
Figure 3: Basic Motif

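Figures 1 and 3 show the MA0001.1 motif drawn with ggseqlogo() and with geom_logo(), respectively; Figure 2 shows MA0018.2 drawn from its position-frequency matrix.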

Tip

Key parameters:

  • data: Input data: a sequence vector, a matrix, or a list of either
  • method = "bits": Height calculation method: "bits", "probability", or "custom" (heights taken directly from a matrix)
  • seq_type = "auto": Sequence type: "auto", "dna", "rna", or "aa"
  • namespace = NULL: Custom character namespace (alphabet)
  • font = "roboto_medium": Font used to draw the letters
  • stack_width = 0.95: Width of each letter stack
  • rev_stack_order = FALSE: Whether to reverse the stacking order of letters
  • col_scheme = "auto": Color scheme
  • low_col = "black": Color for low values in a continuous color scheme
  • high_col = "yellow": Color for high values in a continuous color scheme
  • na_col = "grey20": Color used for NA values
  • plot = TRUE: Whether to plot immediately
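The rev_stack_order parameter controls how letters are ordered within each stack. As a rough base R sketch (not ggseqlogo's internal code), the function below computes the vertical extent of each letter in a single column, assuming the common logo convention that letters are stacked by increasing height so the tallest letter ends up on top. Its `reverse` argument is a hypothetical flag that mirrors the idea of rev_stack_order = TRUE, and the column heights are made up.

```r
# Compute the y-range of each letter in one logo column.
# heights: named numeric vector of letter heights (bits or probabilities)
# reverse: hypothetical flag mirroring the idea of rev_stack_order
stack_column <- function(heights, reverse = FALSE) {
  # Ascending order stacks the smallest letter first, so the tallest sits on top
  ord <- order(heights, decreasing = reverse)
  h <- heights[ord]
  top <- cumsum(h)
  data.frame(letter = names(h),
             ymin = c(0, head(top, -1)),  # each letter starts where the previous ended
             ymax = top,
             row.names = NULL)
}

# Made-up heights for one column
col_heights <- c(A = 0.10, C = 0.05, G = 1.20, T = 0.30)
stack_column(col_heights)                  # G occupies the top of the stack
stack_column(col_heights, reverse = TRUE)  # G now sits at the bottom
```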

2. Multi-motif plot

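Passing a named list to ggseqlogo() draws one logo per list element in a faceted layout. The same plot can be built by hand with geom_logo() and facet_wrap() (or facet_grid()) on the seq_group variable: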

# Draw multiple motifs
ggseqlogo(seqs_dna, ncol=4)

# Equivalent to
p <- ggplot() + geom_logo(seqs_dna) + theme_logo() + 
  facet_wrap(~seq_group, ncol=4, scales='free_x')
Figure 4: Multi-motif plot
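When a named list is supplied, each element becomes its own panel, and the list names (here the JASPAR IDs) serve as panel labels through the seq_group variable that facet_wrap() splits on. The base R sketch below illustrates that reshaping with a hypothetical two-motif list: it produces one long table of per-position letter counts tagged with a seq_group column. It is a simplified picture of the data layout, not ggseqlogo's own implementation.

```r
# Hypothetical named list of aligned sequences, standing in for seqs_dna
motifs <- list(
  motif_a = c("ACGT", "ACGA", "ACGT"),
  motif_b = c("TTGCA", "TAGCA")
)

# Count letters per position for one set of aligned sequences (long format)
count_letters <- function(seqs, bases = c("A", "C", "G", "T")) {
  chars <- do.call(rbind, strsplit(seqs, ""))
  counts <- sapply(seq_len(ncol(chars)), function(j)
    table(factor(chars[, j], levels = bases)))
  data.frame(position = rep(seq_len(ncol(chars)), each = length(bases)),
             letter = rep(bases, times = ncol(chars)),
             count = as.vector(counts))
}

# Stack all motifs into one table; the list name becomes the seq_group value
long <- do.call(rbind, lapply(names(motifs), function(nm)
  cbind(seq_group = nm, count_letters(motifs[[nm]]))))

head(long)
```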

3. Customizing the motif plot

3.1 Adjust color scheme

ggseqlogo offers a variety of preset and custom color schemes:

# Preset color scheme for DNA sequences
ggseqlogo(seqs_dna$MA0001.1, col_scheme='nucleotide')
Figure 5: Adjust color scheme 1
# Preset color scheme for amino acid sequences
ggseqlogo(seqs_aa$AKT1, col_scheme='chemistry')
Figure 6: Adjust color scheme 2
# Custom discrete color scheme
cs1 <- make_col_scheme(chars=c('A', 'T', 'C', 'G'), 
                      groups=c('gr1', 'gr1', 'gr2', 'gr2'), 
                      cols=c('purple', 'purple', 'blue', 'blue'))
ggseqlogo(seqs_dna$MA0001.1, col_scheme=cs1)
Figure 7: Adjust color scheme 3
# Custom continuous color scheme
cs2 <- make_col_scheme(chars=c('A', 'T', 'C', 'G'), values=1:4)
ggseqlogo(seqs_dna$MA0001.1, col_scheme=cs2)
Figure 8: Adjust color scheme 4
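Conceptually, a discrete color scheme is a lookup from each letter to a color, while a continuous scheme maps a numeric value per letter onto a gradient between low_col and high_col (black and yellow by default). The base R sketch below mimics both ideas with named vectors and grDevices::colorRampPalette(); it only illustrates the mapping, and ggseqlogo's own scale may interpolate colors differently.

```r
# Discrete scheme: letters map to groups, and groups map to colors (as in cs1)
group_of <- c(A = "gr1", T = "gr1", C = "gr2", G = "gr2")
group_col <- c(gr1 = "purple", gr2 = "blue")
letter_col <- group_col[group_of]
names(letter_col) <- names(group_of)
letter_col[c("A", "G")]

# Continuous scheme: a numeric value per letter (as in cs2) mapped onto a
# 100-step gradient from the default low_col ("black") to high_col ("yellow")
values <- c(A = 1, T = 2, C = 3, G = 4)
ramp <- grDevices::colorRampPalette(c("black", "yellow"))(100)
scaled <- (values - min(values)) / (max(values) - min(values))
letter_grad <- ramp[1 + round(scaled * 99)]
names(letter_grad) <- names(values)
letter_grad
```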

3.2 Adjust font and stacking

# View all available fonts
list_fonts(FALSE)
 [1] "helvetica_regular"   "helvetica_bold"      "helvetica_light"    
 [4] "roboto_medium"       "roboto_bold"         "roboto_regular"     
 [7] "akrobat_bold"        "akrobat_regular"     "roboto_slab_bold"   
[10] "roboto_slab_regular" "roboto_slab_light"   "xkcd_regular"       
# Use a specific font
ggseqlogo(seqs_dna$MA0001.1, font='helvetica_bold', stack_width=0.8)
Figure 9: Adjust font and stacking

3.3 Adjust the axes and themes

ggseqlogo(seqs_dna$MA0001.1) + 
  theme_classic() +
  theme(axis.text.x = element_text(angle=45, hjust=1)) +
  labs(x='Position', y='Bits', title='Transcription Factor Binding Motif')
Figure 10: Adjust the axes and themes

4. Advanced features

Drawing method selection:

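ggseqlogo offers two standard ways to compute letter heights: "bits" (information content, the default) and "probability" (letter frequency). The "custom" method, shown later, takes heights directly from a matrix.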

p1 <- ggseqlogo(seqs_dna$MA0001.1, method='bits')  # Information content
p2 <- ggseqlogo(seqs_dna$MA0001.1, method='prob')  # probability
gridExtra::grid.arrange(p1, p2, ncol=2)
Figure 11: Sequence logo calculation method
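The difference between the two methods can be seen on a single column. In the base R sketch below, which uses made-up counts for one position of 20 aligned DNA sequences, "probability" heights are the letter frequencies and always sum to 1, whereas "bits" heights are those frequencies multiplied by the column's information content (log2(4) minus the Shannon entropy), so a poorly conserved position shrinks toward zero. ggseqlogo may also apply a small-sample correction, which this sketch omits.

```r
# Hypothetical letter counts at one position of 20 aligned DNA sequences
counts <- c(A = 12, C = 2, G = 4, T = 2)

# "probability" method: heights are the letter frequencies (they sum to 1)
prob_heights <- counts / sum(counts)

# "bits" method (without small-sample correction): frequencies scaled by the
# information content, i.e. log2(4) minus the Shannon entropy of the column
p <- prob_heights[prob_heights > 0]
entropy <- -sum(p * log2(p))
info_content <- log2(length(counts)) - entropy
bits_heights <- prob_heights * info_content

round(rbind(probability = prob_heights, bits = bits_heights), 3)
```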

Custom sequence types and namespaces:

# Numerical sequence
seqs_numeric <- chartr('ATGC', '1234', seqs_dna$MA0001.1)
ggseqlogo(seqs_numeric, method='prob', namespace=1:4)
Figure 12: Sequence types and namespaces 1
# Greek alphabet sequence
seqs_greek <- chartr('ATGC', 'δΡψλ', seqs_dna$MA0001.1)
ggseqlogo(seqs_greek, namespace='δΡψλ', method='bits')
Figure 13: Sequence types and namespaces 2
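The namespace defines the alphabet, and its size sets the ceiling on information content: a fully conserved position can reach at most log2(n) bits for an n-letter namespace. The short base R sketch below shows what chartr() does to a sequence and computes that ceiling for a 4-letter namespace and for the 20 standard amino acids, which is why a fully conserved amino-acid position can stand taller than a DNA one. The input string is made up for illustration.

```r
# chartr() replaces characters one-for-one: A->1, T->2, G->3, C->4
recoded <- chartr("ATGC", "1234", "TGATAA")
recoded

# Maximum information content (bits) for an n-letter namespace is log2(n)
max_bits <- c(dna = log2(4),
              numeric_1to4 = log2(length(1:4)),
              amino_acids = log2(20))
round(max_bits, 2)
```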

Custom height logo:

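# Create a custom height matrix (4 letters x 5 positions); heights may be negative
set.seed(123)  # make the random heights reproducible
custom_mat <- matrix(rnorm(20), nrow=4, 
                    dimnames=list(c('A', 'T', 'G', 'C')))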
ggseqlogo(custom_mat, method='custom', seq_type='dna') + 
  ylab('my custom height')
Figure 14: Custom height logo

Adding annotations:

ggplot() + 
  annotate('rect', xmin=0.5, xmax=3.5, ymin=-0.05, ymax=1.9, 
           alpha=0.1, col='black', fill='yellow') +
  geom_logo(seqs_dna$MA0001.1, stack_width=0.90) + 
  annotate('segment', x=4, xend=8, y=1.2, yend=1.2, linewidth=2) +  # ggplot2 >= 3.4.0 uses linewidth (formerly size) for line thickness
  annotate('text', x=6, y=1.3, label='Text annotation') + 
  theme_logo()
Figure 15: Adding annotations

Combining multiple plots:

# Generate sequence logo
p1 <- ggseqlogo(seqs_dna$MA0008.1) + 
  theme(axis.text.x=element_blank())

# Create sequence alignment data
aln <- data.frame(
  letter=strsplit('AGATAAGATGATAAAAAGATAAGA', '')[[1]], 
  species=rep(c('a', 'b', 'c'), each=8),
  x=rep(1:8, 3)
)
aln$mut <- 'no'
aln$mut[c(2,15,20,23)] <- 'yes'

# Generate sequence alignment plot
p2 <- ggplot(aln, aes(x, species)) +
  geom_text(aes(label=letter, color=mut, size=mut)) + 
  scale_x_continuous(breaks=1:10, expand=c(0.105, 0)) + 
  xlab('') + 
  scale_color_manual(values=c('black', 'red')) + 
  scale_size_manual(values=c(5, 6)) + 
  theme_logo() + 
  theme(legend.position='none', axis.text.x=element_blank())

# Create a conservation bar chart (random placeholder scores)
set.seed(1)  # make the placeholder scores reproducible
bp_data <- data.frame(x=1:8, conservation=sample(1:100, 8))
p3 <- ggplot(bp_data, aes(x, conservation)) +
  geom_bar(stat='identity', fill='grey') + 
  theme_logo() + 
  scale_x_continuous(breaks=1:10, expand=c(0.105, 0)) + 
  xlab('')

# Composite plots
cowplot::plot_grid(p1, p2, p3, ncol=1, align='v')
Figure 16: Combining multiple plots

Integration with other tools:

ggseqlogo can be used with other bioinformatics packages. For example, the ggmotif package can directly extract motifs from MEME result files and visualize them using ggseqlogo. The universalmotif package also provides integration functionality with ggseqlogo.

Application

Motif logos are widely used in genomics and molecular biology research:

  1. Transcription factor binding site analysis: Displays conserved binding patterns of transcription factors in DNA sequences.

  2. Protein domain analysis: Shows conserved amino acids in functional domains of protein sequences.

  3. Multiple sequence alignment visualization: Displays conserved regions in multiple sequence alignments.

  4. ChIP-seq analysis: Visualizes enriched motifs identified by ChIP-seq experiments.

  5. Genomic feature analysis: Displays sequence features of specific regions of the genome.

Reference

[1] Wagih O. ggseqlogo: a versatile R package for drawing sequence logos. Bioinformatics. 2017;33(22):3645-3647. doi:10.1093/bioinformatics/btx469