Regression Analysis Table

The regression analysis table displays the results of a regression model. For each variable in the model it reports statistics such as the coefficient estimate, confidence interval, and p-value, which help explain the relationships between the variables.

Example

The tables below show regression analyses based on the pbc dataset built into the survival package. They use a Cox proportional hazards model and a generalized linear model to explore the effects of serum albumin, sex, and age on survival.

Setup

System Requirements: Cross-platform (Linux/macOS/Windows)
Programming language: R
Dependent packages: survival, gtsummary, dplyr, datawizard, broom.helpers

# Installing packages
if (!requireNamespace("survival", quietly = TRUE)) {
  install.packages("survival")
}
if (!requireNamespace("gtsummary", quietly = TRUE)) {
  install.packages("gtsummary")
}
if (!requireNamespace("dplyr", quietly = TRUE)) {
  install.packages("dplyr")
}
if (!requireNamespace("datawizard", quietly = TRUE)) {
  install.packages("datawizard")
}
if (!requireNamespace("broom.helpers", quietly = TRUE)) {
  # broom.helpers is available on CRAN, so the remotes package is not needed
  install.packages("broom.helpers")
}

# Load packages
library(survival)
library(gtsummary)
library(dplyr)
library(datawizard)
library(broom.helpers)

Data Preparation

The pbc dataset, built into the survival R package, contains data on 418 patients with primary biliary cirrhosis (PBC) from the Mayo Clinic trial of D-penicillamine versus placebo conducted between 1974 and 1984: 312 randomized participants plus 106 additional patients who were followed for survival. It records clinical information such as survival time, survival status, treatment, age, sex, albumin level, and hepatomegaly.

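df <- pbc %>%
  # Drop transplanted patients (status == 1); keep censored (0) and dead (2)
  filter(status != 1) %>%
  # Recode the event indicator: 1 = death, 0 = censored
  mutate(status = ifelse(status == 2, 1, 0)) %>%
  # Keep columns 2 to 13 (time through albumin)
  select(2:13) %>%
  # Remove rows with missing values
  na.omit() %>%
  # Divide `albumin` into 3 groups
  mutate(albumin3cat = categorize(albumin, split = "quantile", n_groups = 3))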

head(df[,1:6])

  time status trt      age sex ascites
1  400      1   1 58.76523   f       1
2 4500      0   1 56.44627   f       0
3 1012      1   1 70.07255   m       0
4 1925      1   1 54.74059   f       0
5 2503      1   2 66.25873   f       0
6 1832      0   2 55.53457   f       0
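
The three-level grouping can be cross-checked without datawizard. The sketch below is a base R approximation of the quantile split, not the categorize() implementation: it cuts albumin at its empirical terciles. The names d, cuts, and albumin_grp are illustrative, and values that fall exactly on a cut point may be assigned to a different group than by categorize().

```r
library(survival)  # provides the pbc dataset

# Reproduce the filtering and recoding steps in base R
d <- subset(pbc, status != 1)          # drop transplanted patients
d$status <- as.integer(d$status == 2)  # 1 = death, 0 = censored
d <- na.omit(d[, 2:13])                # columns time through albumin

# Cut albumin at its terciles (illustrative stand-in for categorize())
cuts <- quantile(d$albumin, probs = c(0, 1 / 3, 2 / 3, 1))
d$albumin_grp <- cut(d$albumin,
  breaks = cuts,
  include.lowest = TRUE, labels = c("1", "2", "3")
)

# Group sizes should be roughly equal
table(d$albumin_grp)
```

Comparing table(d$albumin_grp) with table(df$albumin3cat) is a quick way to see whether the two splits agree on this dataset.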

Visualization

1. Basic regression analysis table

# Basic regression analysis table
t1 <- coxph(Surv(time, status) ~ albumin + sex + age,
  data = df
) %>%
  tbl_regression()

t1

Characteristic	log(HR)	95% CI	p-value
albumin	-1.7	-2.1, -1.2	<0.001
sex
m	—	—
f	-0.56	-1.1, -0.06	0.027
age	0.03	0.01, 0.05	0.004
Abbreviations: CI = Confidence Interval, HR = Hazard Ratio

Figure 1: Basic regression analysis table

This is a basic regression analysis table. The Cox proportional hazards model is fitted with coxph(), and tbl_regression() tabulates the results.

2. tbl_regression() parameter settings

# tbl_regression() parameter settings
t1 <- coxph(Surv(time, status) ~ albumin + sex + age,
  data = df
) %>%
  # parameter settings
  tbl_regression(
    conf.level = 0.90,
    exponentiate = TRUE,
    include = c("sex", "albumin"),
    show_single_row = sex,
    label = list(sex ~ "sex as categorical")
  )

t1

Characteristic	HR	90% CI	p-value
sex as categorical	0.57	0.37, 0.87	0.027
albumin	0.19	0.13, 0.28	<0.001
Abbreviations: CI = Confidence Interval, HR = Hazard Ratio

Figure 2: tbl_regression() parameter settings

tbl_regression() accepts many arguments that change the confidence level, the displayed rows, the row labels, and other details of the table.

Tip

Important parameters of tbl_regression():

conf.level: Confidence level of the reported interval. The default, 0.95, gives a 95% confidence interval.
exponentiate: Whether to exponentiate the coefficient estimates. The default, FALSE, reports log(HR); TRUE reports HR.
include: Which independent variables (rows) appear in the table.
show_single_row: Displays a binary variable on a single row, omitting the reference level.
label: Changes the displayed name (row label) of an independent variable.
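
To make the effect of conf.level and exponentiate concrete, the following sketch works directly on the fitted model rather than through gtsummary. It assumes that the table values correspond to Wald intervals on the log scale that are then exponentiated; the names d, fit, and hr_table are illustrative, and small rounding differences from the table are possible.

```r
library(survival)

# Same preparation as in the Data Preparation section, in base R
d <- subset(pbc, status != 1)
d$status <- as.integer(d$status == 2)
d <- na.omit(d[, 2:13])

fit <- coxph(Surv(time, status) ~ albumin + sex + age, data = d)

# Coefficients are on the log(HR) scale
log_hr <- coef(fit)

# Wald confidence limits at the 90% level, also on the log scale
ci90 <- confint(fit, level = 0.90)

# Exponentiating both gives hazard ratios with their 90% limits
hr_table <- cbind(HR = exp(log_hr), exp(ci90))
round(hr_table, 2)
```

A hazard ratio below 1 corresponds to a negative log(HR), which is why the sign of an estimate in the default table matches the side of 1 on which the exponentiated value falls.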

3. Add global-p

# Add global-p
# Convert the grouping variable to a factor so each level gets its own row
df$albumin3cat <- as.factor(df$albumin3cat)

t1 <- coxph(Surv(time, status) ~ albumin3cat + sex + age,
  data = df
) %>%
  tbl_regression() %>%
  add_global_p()

t1

Characteristic	log(HR)	95% CI	p-value
albumin3cat			<0.001
1	—	—
2	-0.72	-1.2, -0.27
3	-1.3	-1.7, -0.77
sex			0.073
m	—	—
f	-0.48	-0.98, 0.02
age	0.03	0.01, 0.05	<0.001
Abbreviations: CI = Confidence Interval, HR = Hazard Ratio

Figure 3: Add global-p

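This table adds global p-values via add_global_p(): each variable gets a single p-value for its overall effect, so a multi-level factor such as albumin3cat is tested as a whole rather than level by level.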
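
The idea behind a global p-value can be illustrated with a likelihood-ratio test that compares the model with and without the factor. This is a sketch of the concept only: as far as the gtsummary documentation describes, add_global_p() relies on car::Anova() by default, so its p-values need not match the one computed here. The tercile grouping is rebuilt in base R, and all object names are illustrative.

```r
library(survival)

d <- subset(pbc, status != 1)
d$status <- as.integer(d$status == 2)
d <- na.omit(d[, 2:13])

# Three albumin groups from tercile cut points (stand-in for albumin3cat)
cuts <- quantile(d$albumin, probs = c(0, 1 / 3, 2 / 3, 1))
d$albumin_grp <- cut(d$albumin,
  breaks = cuts,
  include.lowest = TRUE, labels = c("1", "2", "3")
)

full <- coxph(Surv(time, status) ~ albumin_grp + sex + age, data = d)
reduced <- coxph(Surv(time, status) ~ sex + age, data = d)

# Likelihood-ratio statistic: twice the gain in the fitted log-likelihood
lr_stat <- 2 * (full$loglik[2] - reduced$loglik[2])
df_diff <- length(coef(full)) - length(coef(reduced))

# One p-value for the whole factor (two parameters tested jointly)
p_global <- pchisq(lr_stat, df = df_diff, lower.tail = FALSE)
p_global
```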

4. Add q-value

# Add q-value
t1 <- coxph(Surv(time, status) ~ albumin + sex + age,
  data = df
) %>%
  tbl_regression() %>%
  # Add a q-value column
  add_q()

t1

Characteristic	log(HR)	95% CI	p-value	q-value¹
albumin	-1.7	-2.1, -1.2	<0.001	<0.001
sex
m	—	—
f	-0.56	-1.1, -0.06	0.027	0.027
age	0.03	0.01, 0.05	0.004	0.006
¹ False discovery rate correction for multiple testing
Abbreviations: CI = Confidence Interval, HR = Hazard Ratio

Figure 4: Add q-value

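This table adds a q-value column via add_q(). Q-values are p-values adjusted for multiple testing; as the footnote indicates, the adjustment here is a false discovery rate correction.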
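
The false discovery rate correction can be reproduced with p.adjust() from base R. The sketch below assumes the default add_q() method ("fdr", the Benjamini-Hochberg procedure) and takes the per-coefficient Wald p-values from the model summary; the names d, fit, p, and q are illustrative.

```r
library(survival)

d <- subset(pbc, status != 1)
d$status <- as.integer(d$status == 2)
d <- na.omit(d[, 2:13])

fit <- coxph(Surv(time, status) ~ albumin + sex + age, data = d)

# Wald p-values, one per coefficient
p <- summary(fit)$coefficients[, "Pr(>|z|)"]

# Benjamini-Hochberg adjustment ("fdr" is an alias for "BH")
q <- p.adjust(p, method = "fdr")

data.frame(p = signif(p, 2), q = signif(q, 2))
```

An adjusted value is never smaller than the raw p-value, which is consistent with the age row in the table above (0.004 before adjustment, 0.006 after).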

5. Merge columns

# Merge columns
t1 <- coxph(Surv(time, status) ~ albumin3cat + sex + age,
            data = df) %>%
  tbl_regression() %>%
  # Modify the header names
  modify_header(p.value = "**P**", estimate = "**log(HR) (CI)**") %>%
  # Merge the estimate and CI columns in rows where the estimate is not NA
  modify_column_merge(
    pattern = "{estimate} ({conf.low}, {conf.high})",
    rows = !is.na(estimate)
  )

t1

Characteristic	log(HR) (CI)	P
albumin3cat
1	—
2	-0.72 (-1.2, -0.27)	0.002
3	-1.3 (-1.7, -0.77)	<0.001
sex
m	—
f	-0.48 (-0.98, 0.02)	0.062
age	0.03 (0.01, 0.05)	<0.001
Abbreviations: CI = Confidence Interval, HR = Hazard Ratio

Figure 5: Merge columns

This table merges the estimate column and the CI column into one, a common presentation format. The column header is modified to match the merged content.

Tip

Important functions: modify_header() / modify_column_merge()

modify_header(): Changes column header text. Columns are addressed by name: label is the variable column, p.value the P-value column, ci the confidence interval column, estimate the coefficient estimate column, conf.low the lower confidence limit, and conf.high the upper confidence limit.

modify_column_merge(): The pattern parameter gives the template used to merge columns, with column names in braces, and the rows parameter selects the rows the merge applies to.

6. Model integration

Model horizontal integration

# Model horizontal integration
t1 <- coxph(Surv(time, status) ~ albumin + sex + age,
  data = df
) %>%
  tbl_regression()
# glm() uses family = gaussian by default, so this is a linear model of time
t2 <- glm(time ~ albumin + sex + age,
  data = df
) %>%
  tbl_regression()
# t1, t2 model integration
t3 <- tbl_merge(
  tbls = list(t1, t2),
  tab_spanner = c("**Cox Model**", "**GLM Model**")
)

t3

Characteristic	Cox Model			GLM Model
Characteristic	log(HR)	95% CI	p-value	Beta	95% CI	p-value
albumin	-1.7	-2.1, -1.2	<0.001	1,128	813, 1,442	<0.001
sex
m	—	—		—	—
f	-0.56	-1.1, -0.06	0.027	34	-360, 428	0.9
age	0.03	0.01, 0.05	0.004	-7.7	-20, 4.9	0.2
Abbreviations: CI = Confidence Interval, HR = Hazard Ratio

Figure 6: Model horizontal integration

This table uses tbl_merge() to place the results of the Cox proportional hazards model and the generalized linear model for PBC patients side by side.
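
Because glm() is called without a family argument, the GLM column is an ordinary linear regression of follow-up time on the covariates. The sketch below checks this by comparing the fit with lm(); the object names are illustrative.

```r
library(survival)  # provides the pbc dataset

d <- subset(pbc, status != 1)
d$status <- as.integer(d$status == 2)
d <- na.omit(d[, 2:13])

glm_fit <- glm(time ~ albumin + sex + age, data = d)
lm_fit <- lm(time ~ albumin + sex + age, data = d)

# The default family is gaussian with an identity link
family(glm_fit)$family

# Coefficients agree with ordinary least squares
all.equal(coef(glm_fit), coef(lm_fit))
```

Such a model treats every follow-up time as an observed outcome, including censored ones, which is worth keeping in mind when reading the Beta column next to the Cox estimates.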

Model vertical integration

# Model vertical integration
t1 <- coxph(Surv(time, status) ~ albumin + sex + age,
  data = df[df$hepato==0,]
) %>%
  tbl_regression()

t2 <- coxph(Surv(time, status) ~ albumin + sex + age,
  data = df[df$hepato==1,]
) %>%
  tbl_regression()

tbl_stack(list(t1, t2), group_header = c("Patients without hepato", "Patients with hepato"))

Characteristic	log(HR)	95% CI	p-value
Patients without hepato
albumin	-1.4	-2.3, -0.47	0.003
sex
m	—	—
f	-1.1	-1.9, -0.36	0.004
age	0.06	0.03, 0.09	<0.001
Patients with hepato
albumin	-1.4	-2.0, -0.87	<0.001
sex
m	—	—
f	-0.29	-0.97, 0.38	0.4
age	0.01	-0.01, 0.04	0.3
Abbreviations: CI = Confidence Interval, HR = Hazard Ratio

Figure 7: Model vertical integration

This table uses tbl_stack() to stack vertically the Cox proportional hazards model results for the two groups of PBC patients, those without and those with hepatomegaly (hepato).
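
When there are more than two subgroups, fitting each model by hand becomes repetitive. A possible alternative, sketched here with base R only, is to split the data by the grouping variable and fit the same model to each piece; fits and the other object names are illustrative and not part of the original workflow.

```r
library(survival)

d <- subset(pbc, status != 1)
d$status <- as.integer(d$status == 2)
d <- na.omit(d[, 2:13])

# One Cox model per level of hepato (0 = absent, 1 = present)
fits <- lapply(split(d, d$hepato), function(g) {
  coxph(Surv(time, status) ~ albumin + sex + age, data = g)
})

# log(HR) estimates side by side, one column per subgroup
sapply(fits, coef)
```

Each element of fits could then be passed to tbl_regression(), and the resulting list to tbl_stack(), in the same way as t1 and t2 above.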

Application

Figure 8: Regression Analysis Table

Univariate and multivariate Cox regression analyses were performed to assess 3-year tumor recurrence in patients with intermediate-risk NMIBC. [1]

Figure 9: Regression Analysis Table

This regression analysis table shows the independent predictors of late total mortality in multivariable Cox regression analysis. [2]

References

[1] CHEN J X, HUANG W T, ZHANG Q Y, et al. The optimal intravesical maintenance chemotherapy scheme for the intermediate-risk group non-muscle-invasive bladder cancer[J]. BMC Cancer, 2023,23(1): 1018.

[2] BENKE K, ÁGG B, SZABÓ L, et al. Bentall procedure: quarter century of clinical experiences of a single surgeon[J]. J Cardiothorac Surg, 2016,11: 19.