Summary Tables with tab

‘tab’ package

The purpose of tab is to make it easier to create tables for papers, including Table 1’s showing characteristics of the sample and summary tables for fitted regression models. Currently, the following functions are included:

tabmeans compares means in two or more groups.
tabmedians compares medians in two or more groups.
tabfreq compares frequencies in two or more groups.
tabmulti compares multiple variables in two or more groups.
tabglm summarizes a generalized linear model (GLM) fit via glm.
tabgee summarizes a generalized estimating equation (GEE) fit via gee.
tabcox summarizes a Cox Proportional Hazards (Cox PH) model fit via coxph in survival.

Creating a Table 1

A toy dataset called tabdata is included in the tab package. It is a data frame with 15 variables and 300 observations. Let’s take a look:

library("tab")
data(tabdata)
dim(tabdata)
#> [1] 300  15
head(tabdata)
#>   ID     Group Age    Sex  Race  BMI time delta death_1yr  bp.1  bp.2
#> 1  1   Control  63 Female Black 28.6 1512     1         0 121.4 123.4
#> 2  2 Treatment  74 Female White 27.2 2987     1         0 145.5 168.9
#> 3  3 Treatment  70   Male Black 25.5 1468     0         0 138.4 134.6
#> 4  4 Treatment  78   Male Other 24.3  691     1         0 128.3 124.2
#> 5  5 Treatment  73   Male White 25.8  477     1         0 163.3 133.2
#> 6  6   Control  61 Female White 21.9 3380     1         0 137.6 130.3
#>    bp.3 highbp.1 highbp.2 highbp.3
#> 1 118.3        0        0        0
#> 2 149.5        1        1        1
#> 3 131.2        0        0        0
#> 4 131.1        0        0        0
#> 5 138.4        1        0        0
#> 6 141.5        0        0        1

Here is how you can use tabmulti to generate a Table 1 comparing characteristics of the treatment and control groups.

(table1 <- tabmulti(data = tabdata, 
                    xvarname = "Group", 
                    yvarnames = c("Age", "Sex", "Race")))
#>       Variable             Control      Treatment    P       
#>  [1,] "Age, M (SD)"        "70.5 (5.3)" "69.5 (5.9)" "0.15"  
#>  [2,] "Sex, n (%)"         ""           ""           "<0.001"
#>  [3,] "  Female"           "93 (68.4)"  "62 (38.5)"  ""      
#>  [4,] "  Male"             "43 (31.6)"  "99 (61.5)"  ""      
#>  [5,] "Race, n (%)"        ""           ""           "0.29"  
#>  [6,] "  White"            "46 (34.1)"  "65 (39.6)"  ""      
#>  [7,] "  Black"            "36 (26.7)"  "52 (31.7)"  ""      
#>  [8,] "  Mexican American" "21 (15.6)"  "19 (11.6)"  ""      
#>  [9,] "  Other"            "32 (23.7)"  "28 (17.1)"  ""

tabmulti created a character matrix, but it doesn’t look like a table yet. If you want to get the table into Word, here are two approaches:

Install/load Kmisc and run write.cb(table1) to copy the table to your clipboard. Paste the result into Word, then highlight the text and go to Insert -> Table -> Convert Text to Table... OK.
Add print.html = TRUE to the above tabmulti function call. That will result in a .html file being written to your working directory. It should appear as a neat table when you open it (e.g. in Google Chrome), and you can copy/paste it into Word.

If you want it to display it in a knitr document, you can add latex = TRUE and then use various approaches…

‘printr’ (‘kable’ for options)

I think the easiest approach is to simply load the printr package. Loading it results in R output printing more neatly, which includes character matrices showing up as neat tables.

library("printr")
(table1 <- tabmulti(data = tabdata, 
                    xvarname = "Group", 
                    yvarnames = c("Age", "Sex", "Race"), 
                    latex = TRUE))

Variable	Control	Treatment	P
Age, M (SD)	70.5 (5.3)	69.5 (5.9)	0.15
Sex, n (%)			<0.001
\(\hskip .4cm\)Female	93 (68.4)	62 (38.5)
\(\hskip .4cm\)Male	43 (31.6)	99 (61.5)
Race, n (%)			0.29
\(\hskip .4cm\)White	46 (34.1)	65 (39.6)
\(\hskip .4cm\)Black	36 (26.7)	52 (31.7)
\(\hskip .4cm\)Mexican American	21 (15.6)	19 (11.6)
\(\hskip .4cm\)Other	32 (23.7)	28 (17.1)

detach("package:printr", unload = TRUE)

(I detached printr so R reverts to its usual output display format for the rest of the vignette.)

If you want to add table options, e.g. a caption or non-default column alignment, you can use kable from the knitr package (e.g. try kable(table1, align = "lrrr", caption = "Table 1.").

‘kable’

knitr’s kable function works nicely:

library("knitr")
kable(table1,
      caption = "Table 1a. Characteristics (created by `tabmulti`/`kable`).", 
      align = 'lrrr')

Table 1a. Characteristics (created by `tabmulti`/`kable`).
Variable	Control	Treatment	P
Age, M (SD)	70.5 (5.3)	69.5 (5.9)	0.15
Sex, n (%)			<0.001
\(\hskip .4cm\)Female	93 (68.4)	62 (38.5)
\(\hskip .4cm\)Male	43 (31.6)	99 (61.5)
Race, n (%)			0.29
\(\hskip .4cm\)White	46 (34.1)	65 (39.6)
\(\hskip .4cm\)Black	36 (26.7)	52 (31.7)
\(\hskip .4cm\)Mexican American	21 (15.6)	19 (11.6)
\(\hskip .4cm\)Other	32 (23.7)	28 (17.1)

‘xtable’

Another option is the xtable package/function (requires adding results = "asis" as a chunk option!):

library("xtable")
print(xtable(table1, 
             caption = "Table 1b. Characteristics (created by `tabmulti`/`xtable`).", 
             align = 'llrrr',), 
      type = "html", 
      include.rownames = FALSE)

Table 1b. Characteristics (created by `tabmulti`/`xtable`).
Variable	Control	Treatment	P
Age, M (SD)	70.5 (5.3)	69.5 (5.9)	0.15
Sex, n (%)			<0.001
\(\hskip .4cm\)Female	93 (68.4)	62 (38.5)
\(\hskip .4cm\)Male	43 (31.6)	99 (61.5)
Race, n (%)			0.29
\(\hskip .4cm\)White	46 (34.1)	65 (39.6)
\(\hskip .4cm\)Black	36 (26.7)	52 (31.7)
\(\hskip .4cm\)Mexican American	21 (15.6)	19 (11.6)
\(\hskip .4cm\)Other	32 (23.7)	28 (17.1)

‘pandoc.table’

And finally the pandoc.table function in pander (also requires results = "asis"):

library("pander")
pandoc.table(table1, 
             caption = "Table 1c. Characteristics (created by `tabmulti`/`pandoc.table`).", 
             style = "rmarkdown", 
             justify = 'lrrr', 
             split.tables = Inf)

Table 1c. Characteristics (created by `tabmulti`/`pandoc.table`).
Variable	Control	Treatment	P
Age, M (SD)	70.5 (5.3)	69.5 (5.9)	0.15
Sex, n (%)			<0.001
\(\hskip .4cm\)Female	93 (68.4)	62 (38.5)
\(\hskip .4cm\)Male	43 (31.6)	99 (61.5)
Race, n (%)			0.29
\(\hskip .4cm\)White	46 (34.1)	65 (39.6)
\(\hskip .4cm\)Black	36 (26.7)	52 (31.7)
\(\hskip .4cm\)Mexican American	21 (15.6)	19 (11.6)
\(\hskip .4cm\)Other	32 (23.7)	28 (17.1)

More on ‘tabmulti’

Recall the tabmulti function call from above:

table1 <- tabmulti(data = tabdata, 
                   xvarname = "Group", 
                   yvarnames = c("Age", "Sex", "Race"), 
                   latex = TRUE)

I specified the data frame, the name of the group variable, and the names of the variables I wanted to compare. By default, tabmulti treats each Y variable as continuous if it is numeric and takes on 5 or more unique values, and categorical otherwise. It compares means for continuous variables and frequencies for categorical variables.

Internally, tabmulti called tabmeans for the first comparison and tabfreq for the second and third. We could have created the same table using these functions and rbind:

table1b <- rbind(tabmeans(x = tabdata$Group, y = tabdata$Age, latex = TRUE), 
                 tabfreq(x = tabdata$Group, y = tabdata$Sex, latex = TRUE), 
                 tabfreq(x = tabdata$Group, y = tabdata$Race, latex = TRUE))
all(table1 == table1b)
#> [1] TRUE

Let’s go through some more options. The columns input controls what columns are shown, with the default columns = c("xgroups", "p") requesting a column for each x level and the p-value (from t-test or ANOVA). Since we have missing values and tabmulti uses pairwise deletion by default, let’s add a sample size column, and why not also throw in a column for the overall sample statistics.

table1 <- tabmulti(data = tabdata, 
                   xvarname = "Group", 
                   yvarnames = c("Age", "Sex", "Race"),
                   columns = c("n", "overall", "xgroups", "p"),
                   latex = TRUE)
kable(table1, 
      caption = "Table 1d. Characteristics of sample.", 
      align = 'lrrrrr')

Table 1d. Characteristics of sample.
Variable	N	Overall	Control	Treatment	P
Age, M (SD)	296	69.9 (5.7)	70.5 (5.3)	69.5 (5.9)	0.15
Sex, n (%)	297				<0.001
\(\hskip .4cm\)Female		155 (52.2)	93 (68.4)	62 (38.5)
\(\hskip .4cm\)Male		142 (47.8)	43 (31.6)	99 (61.5)
Race, n (%)	299				0.29
\(\hskip .4cm\)White		111 (37.1)	46 (34.1)	65 (39.6)
\(\hskip .4cm\)Black		88 (29.4)	36 (26.7)	52 (31.7)
\(\hskip .4cm\)Mexican American		40 (13.4)	21 (15.6)	19 (11.6)
\(\hskip .4cm\)Other		60 (20.1)	32 (23.7)	28 (17.1)

For age, often the range is more informative than the SD. We can display M (min-max) rather than M (SD) but setting the tabmeans input parenth = "sd". To pass this argument through tabmulti, we use the means.list argument:

table1 <- tabmulti(data = tabdata, 
                   xvarname = "Group", 
                   yvarnames = c("Age", "Sex", "Race"),
                   columns = c("n", "overall", "xgroups", "p"),
                   means.list = list(parenth = "minmax"),
                   latex = TRUE)
kable(table1, 
      caption = "Table 1e. Characteristics of sample.", 
      align = 'lrrrrr')

Table 1e. Characteristics of sample.
Variable	N	Overall	Control	Treatment	P
Age, M (min, max)	296	69.9 (60, 80)	70.5 (60, 79)	69.5 (60, 80)	0.15
Sex, n (%)	297				<0.001
\(\hskip .4cm\)Female		155 (52.2)	93 (68.4)	62 (38.5)
\(\hskip .4cm\)Male		142 (47.8)	43 (31.6)	99 (61.5)
Race, n (%)	299				0.29
\(\hskip .4cm\)White		111 (37.1)	46 (34.1)	65 (39.6)
\(\hskip .4cm\)Black		88 (29.4)	36 (26.7)	52 (31.7)
\(\hskip .4cm\)Mexican American		40 (13.4)	21 (15.6)	19 (11.6)
\(\hskip .4cm\)Other		60 (20.1)	32 (23.7)	28 (17.1)

Technically the range is the difference between the min and the max, not the min and the max, but if you prefer the label "M (range)", you could specify the text.label input: means.list = list(parenth = "minmax", text.label = "M (range)").

These are some of the options you have access to with tabmulti and the underlying functions it calls. A complete list of options are described in the help files for tabmulti, tabmeans, tabmedians, and tabfreq. The help files also have some different examples.

Regression summaries

Linear regression

Suppose we want to summarize a linear regression of BMI on age, sex, race, and treatment group. You could use kable, xtable, or pandoc.table to print a summary table like this:

fit <- glm(BMI ~ Age + Sex + Race + Group, data = tabdata)
print(xtable(fit, 
             caption = "Table 2a. Linear regression fit (created by `xtable`)."), 
      type = "html")

Table 2a. Linear regression fit (created by `xtable`).
	Estimate	Std. Error	t value	Pr(>\|t\|)
(Intercept)	22.1009	1.6340	13.53	0.0000
Age	0.0102	0.0231	0.44	0.6591
SexMale	-0.1800	0.2723	-0.66	0.5090
RaceBlack	0.3362	0.3183	1.06	0.2918
RaceMexican American	0.6283	0.4204	1.49	0.1361
RaceOther	0.0204	0.3613	0.06	0.9550
GroupTreatment	3.0966	0.2755	11.24	0.0000

But this isn’t how a regression table in a paper typically looks. A few issues:

P-value column should have a simpler heading.
P-values should not print as 0.
For factor variables, levels should be printed more cleanly.

Let’s try tabglm:

table2 <- tabglm(fit = fit, 
                 latex = TRUE)
kable(table2, 
      caption = "Table 2b. Linear regression fit (created by `tabglm` and `kable`).", 
      align = 'lrr')

Table 2b. Linear regression fit (created by `tabglm` and `kable`).
Variable	Beta (SE)	P
Intercept	22.10 (1.63)	<0.001
Age	0.01 (0.02)	0.66
Sex
\(\hskip .4cm\)Female (ref)	–	–
\(\hskip .4cm\)Male	-0.18 (0.27)	0.51
Race
\(\hskip .4cm\)White (ref)	–	–
\(\hskip .4cm\)Black	0.34 (0.32)	0.29
\(\hskip .4cm\)Mexican American	0.63 (0.42)	0.14
\(\hskip .4cm\)Other	0.02 (0.36)	0.95
Group
\(\hskip .4cm\)Control (ref)	–	–
\(\hskip .4cm\)Treatment	3.10 (0.28)	<0.001

If you don’t like all the white space, you can set compress.factors = TRUE to omit rows with factor variable names (and left-align factor levels). By default, the input omit.refgroups has the same value as compress.factors, and omit.refgroups = TRUE omits reference group rows.

table2 <- tabglm(fit = fit, 
                 compress.factors = TRUE,
                 latex = TRUE)
kable(table2, 
      caption = "Table 2c. Linear regression fit (created by `tabglm` and `kable`).", 
      align = 'lrr')

Table 2c. Linear regression fit (created by `tabglm` and `kable`).
Variable	Beta (SE)	P
Intercept	22.10 (1.63)	<0.001
Age	0.01 (0.02)	0.66
Male	-0.18 (0.27)	0.51
Black	0.34 (0.32)	0.29
Mexican American	0.63 (0.42)	0.14
Other	0.02 (0.36)	0.95
Treatment	3.10 (0.28)	<0.001

Maybe you’re submitting to one of those enlightened journals that thinks comparing confidence intervals to 0 is totally different than comparing p-values to 0.05. tabglm can do confidence intervals.

table2 <- tabglm(fit = fit, 
                 columns = c("beta.se", "betaci"),
                 compress.factors = TRUE,
                 latex = TRUE)
#> Waiting for profiling to be done...
kable(table2, 
      caption = "Table 2d. Linear regression fit (created by `tabglm` and `kable`).", 
      align = 'lrr')

Table 2d. Linear regression fit (created by `tabglm` and `kable`).
Variable	Beta (SE)	95% CI for Beta
Intercept	22.10 (1.63)	18.90, 25.30
Age	0.01 (0.02)	-0.04, 0.06
Male	-0.18 (0.27)	-0.71, 0.35
Black	0.34 (0.32)	-0.29, 0.96
Mexican American	0.63 (0.42)	-0.20, 1.45
Other	0.02 (0.36)	-0.69, 0.73
Treatment	3.10 (0.28)	2.56, 3.64

Logistic regression

Summarizing a fitted logistic regression model with tabglm is very similar. For 1-year mortality vs. age, age squared, sex, race, and treatment group:

fit <- glm(death_1yr ~ poly(Age, 2, raw = TRUE) + Sex + Race + Group, 
           data = tabdata, family = "binomial")
table3 <- tabglm(fit = fit, 
                 compress.factors = "binary", 
                 latex = TRUE)
#> Waiting for profiling to be done...
kable(table3, 
      caption = "Table 3. Logistic regression fit (created by `tabglm` and `kable`).", 
      align = 'lrrr')

Table 3. Logistic regression fit (created by `tabglm` and `kable`).
Variable	Beta (SE)	OR (95% CI)	P
Intercept	-20.28 (25.57)	–	0.43
Age	0.54 (0.73)	1.72 (0.42, 7.43)	0.46
Age squared	-0.00 (0.01)	1.00 (0.99, 1.01)	0.48
Male	0.14 (0.29)	1.15 (0.64, 2.05)	0.64
Race		–
\(\hskip .4cm\)White (ref)	–	–	–
\(\hskip .4cm\)Black	-0.92 (0.38)	0.40 (0.19, 0.83)	0.02
\(\hskip .4cm\)Mexican American	0.16 (0.42)	1.17 (0.50, 2.68)	0.70
\(\hskip .4cm\)Other	0.04 (0.37)	1.04 (0.50, 2.16)	0.91
Treatment	0.04 (0.30)	1.04 (0.58, 1.89)	0.89

Notice that the second-order term was labeled appropriately, and tabglm recognized fit as a logistic regression and thus by default added a OR (95% CI) column. Additionally, the binary Sex and Group variables was displayed as single rows, while the other factor variable, Race, was shown in a more expanded format. This was the result of setting compress.factors = "binary" (compress.factors can be TRUE, FALSE, or "binary").

GEEs

To summarize a fitted GEE, we can convert tabdata from wide to long format, fit a GEE, and then call tabgee. Here’s a table for blood pressure vs. age, sex, race, BMI, and treatment group, with columns for Beta, SE, Z, and P:

tabdata2 <- reshape(data = tabdata,
                    varying = c("bp.1", "bp.2", "bp.3", "highbp.1",
                                "highbp.2", "highbp.3"),
                    timevar = "bp.visit", 
                    direction = "long")
tabdata2 <- tabdata2[order(tabdata2$id), ]
fit <- gee(bp ~ Age + Sex + Race + BMI + Group, 
           id = id, 
           data = tabdata2,
           corstr = "unstructured")
#>          (Intercept)                  Age              SexMale 
#>         111.89775246           0.04056654           4.16389689 
#>            RaceBlack RaceMexican American            RaceOther 
#>           0.15475299           1.16832912           0.01975425 
#>                  BMI       GroupTreatment 
#>           0.63487889           3.65538664
table4 <- tabgee(fit = fit, 
                 columns = c("beta", "se", "z", "p"),
                 compress.factors = "binary", 
                 data = tabdata2, 
                 latex = TRUE)
kable(table4, 
      caption = "Table 4. Cox PH fit (created by `tabgee` and `kable`).", 
      align = 'lrrr')

Table 4. Cox PH fit (created by `tabgee` and `kable`).
Variable	Beta	SE	Z	P
Intercept	111.71	5.45	20.50	<0.001
Age	0.04	0.07	0.65	0.51
Male	4.16	0.75	5.54	<0.001
Race		-	-
\(\hskip .4cm\)White (ref)	-	-	-	-
\(\hskip .4cm\)Black	0.19	0.87	0.22	0.83
\(\hskip .4cm\)Mexican American	1.13	1.27	0.89	0.37
\(\hskip .4cm\)Other	0.03	1.09	0.03	0.98
BMI	0.64	0.16	3.87	<0.001
Treatment	3.65	0.98	3.71	<0.001

Cox PH

And finally, to summarize a fitted Cox PH model for survival vs. covariates, with default settings:

tabdata <- tabdata[complete.cases(tabdata), ]
fit <- coxph(Surv(time = tabdata$time, event = tabdata$delta) ~ 
               Age + Sex + Race + Group, 
             data = tabdata)
table5 <- tabcox(fit = fit, 
                 latex = TRUE)
kable(table5, caption = "Table 5. Cox PH fit (created by `tabcox` and `kable`).", align = 'lrrr')

Table 5. Cox PH fit (created by `tabcox` and `kable`).
Variable	Beta (SE)	HR (95% CI)	P
Age	0.05 (0.01)	1.05 (1.02, 1.08)	0.001
Sex		-
\(\hskip .4cm\)Female (ref)	-	-	-
\(\hskip .4cm\)Male	0.11 (0.18)	1.12 (0.79, 1.58)	0.54
Race		-
\(\hskip .4cm\)White (ref)	-	-	-
\(\hskip .4cm\)Black	-0.98 (0.22)	0.38 (0.24, 0.58)	<0.001
\(\hskip .4cm\)Mexican American	0.07 (0.26)	1.07 (0.64, 1.78)	0.80
\(\hskip .4cm\)Other	-0.08 (0.22)	0.92 (0.59, 1.43)	0.71
Group		-
\(\hskip .4cm\)Control (ref)	-	-	-
\(\hskip .4cm\)Treatment	0.13 (0.18)	1.14 (0.80, 1.61)	0.48

Closing comments

I suggest printr with kable for alignment, captions, etc.
Working on functions for complex survey data.
Feel free to collaborate on GitHub!

References

Dahl, David B. 2016. Xtable: Export Tables to Latex or Html. https://CRAN.R-project.org/package=xtable.

Daróczi, Gergely, and Roman Tsegelskyi. 2017. Pander: An R ’Pandoc’ Writer. http://rapporter.github.io/pander.

Terry M. Therneau, and Patricia M. Grambsch. 2000. Modeling Survival Data: Extending the Cox Model. New York: Springer.

Therneau, Terry M. 2015. A Package for Survival Analysis in S. https://CRAN.R-project.org/package=survival.

Xie, Yihui. 2014. “Knitr: A Comprehensive Tool for Reproducible Research in R.” In Implementing Reproducible Computational Research, edited by Victoria Stodden, Friedrich Leisch, and Roger D. Peng. Chapman; Hall/CRC. http://www.crcpress.com/product/isbn/9781466561595.

———. 2015. Dynamic Documents with R and Knitr. 2nd ed. Boca Raton, Florida: Chapman; Hall/CRC. https://yihui.name/knitr/.

———. 2017. Printr: Automatically Print R Objects to Appropriate Formats According to the ’Knitr’ Output Format. https://CRAN.R-project.org/package=printr.

———. 2018. Knitr: A General-Purpose Package for Dynamic Report Generation in R. https://yihui.name/knitr/.

Summary Tables with ‘tab’

Dane Van Domelen

2018-02-19