I don’t mind for loops in R—they’re easy to write, easy to read, and can be very fast (e.g. if the output vector or matrix is pre-allocated). But they aren’t very “elegant” (no pipes?!?!) and they require a lot of keystrokes. So I recently decided to look for a go-to alternative.
The apply family is great for replacing for loops, but you often end up with a bunch of lapply’s, sapply’s, rbind’s, and so on, which gets ugly pretty fast.
The pmap function in purrr (Henry and Wickham 2017) is almost exactly what I was looking for. You specify a function and a “list of lists” of parameter values to loop over, and it does it. But I wanted slightly different functionality, namely:
So I wrote a function called iterate and added it to my dvmisc package (Van Domelen 2018). You can install dvmisc from GitHub by loading devtools and running install_github("vandomed/dvmisc")
.
Suppose you have a function called f.reg that fits a (possibly log-transformed) linear regression relating user-specified \(Y\) and \(X\) variables in a particular dataset (default mtcars):
f.reg <- function(data = mtcars, y, x, log = FALSE) {
if (log) {
formula <- paste("log(", y, ") ~ ", x, sep = "")
} else {
formula <- paste(y, " ~ ", x, sep = "")
}
summary.fit <- summary(lm(formula, data = data))
return(list(r2 = summary.fit$r.squared, p = summary.fit$coef[2, 4]))
}
And suppose you want to regress gas mileage (mpg
) on 3 separate predictors (wt
= weight, hp
= horsepower, qsec
= 1/4 mile time), with and without log-transforming mpg
in each case. To fit these 6 models via iterate, you can run:
library("dvmisc")
iterate(f.reg, y = "mpg", x = c("wt", "hp", "qsec"), log = c(TRUE, FALSE))
## y x log r2 p
## 1: mpg wt TRUE 0.7976 6.31e-12
## 2: mpg wt FALSE 0.7528 1.294e-10
## 3: mpg hp TRUE 0.6233 7.853e-08
## 4: mpg hp FALSE 0.6024 1.788e-07
## 5: mpg qsec TRUE 0.1728 0.01796
## 6: mpg qsec FALSE 0.1753 0.01708
Or with pipes:
f.reg %>% iterate(y = "mpg", x = c("wt", "hp", "qsec"), log = c(TRUE, FALSE))
## y x log r2 p
## 1: mpg wt TRUE 0.7976 6.31e-12
## 2: mpg wt FALSE 0.7528 1.294e-10
## 3: mpg hp TRUE 0.6233 7.853e-08
## 4: mpg hp FALSE 0.6024 1.788e-07
## 5: mpg qsec TRUE 0.1728 0.01796
## 6: mpg qsec FALSE 0.1753 0.01708
You could get the same thing with pmap, but it takes a little more work, and this particular solution doesn’t make it obvious what each set of inputs is:
library("purrr")
do.call(rbind, pmap(.l = list(y = rep("mpg", 6),
x = rep(c("wt", "hp", "qsec"), each = 2),
log = rep(c(TRUE, FALSE), 3)),
.f = f.reg))
## r2 p
## [1,] 0.7976 6.31e-12
## [2,] 0.7528 1.294e-10
## [3,] 0.6233 7.853e-08
## [4,] 0.6024 1.788e-07
## [5,] 0.1728 0.01796
## [6,] 0.1753 0.01708
You can also pass non-scalar inputs that you want iterate to treat as fixed through the fix
argument. For example, to fit Sepal.Length
vs. Sepal.Width
and Petal.Length
in the iris
dataset:
f.reg %>% iterate(y = "Sepal.Length", x = c("Sepal.Width", "Petal.Length"),
fix = list(data = iris))
## y x r2 p
## 1: Sepal.Length Sepal.Width 0.01382 0.1519
## 2: Sepal.Length Petal.Length 0.76 1.039e-47
For simulations, you typically want to run multiple trials of several different scenarios. You can do that with iterate pretty easily. Suppose you want to calculate empirical power of the one-sample t-test in various scenarios. Your “one-trial” function might be something like:
f.power <- function(n = 50, mu = 0, sigsq = 1, mu0 = 0, alpha = 0.05) {
x <- rnorm(n = n, mean = mu, sd = sqrt(sigsq))
fit <- t.test(x, mu = mu0)
return(list(t = fit$statistic, p = fit$p.value))
}
To run 2 trials for three different true means and two different sample sizes:
f.power %>% iterate(mu = c(0.1, 0.25, 0.5), n = c(25, 50), trials = 2)
## mu n t p
## 1: 0.10 25 -1.025 0.3158
## 2: 0.10 25 0.5205 0.6075
## 3: 0.10 50 1.314 0.1951
## 4: 0.10 50 0.7288 0.4696
## 5: 0.25 25 0.6227 0.5394
## 6: 0.25 25 3.049 0.005525
## 7: 0.25 50 1.716 0.09254
## 8: 0.25 50 1.427 0.1601
## 9: 0.50 25 0.3716 0.7134
## 10: 0.50 25 2.414 0.02378
## 11: 0.50 50 3.22 0.00228
## 12: 0.50 50 4.774 1.673e-05
Ramping it up to 1,000 trials each, and actually calculate the empirical power afterwards:
f.power %>% iterate(mu = c(0.1, 0.25, 0.5), n = c(25, 50), trials = 1000) %>%
dplyr::group_by(mu, n) %>%
dplyr::summarise(empirical.power = mean(p < 0.05))
## # A tibble: 6 x 3
## # Groups: mu [?]
## mu n empirical.power
## <dbl> <dbl> <dbl>
## 1 0.100 25.0 0.0700
## 2 0.100 50.0 0.111
## 3 0.250 25.0 0.225
## 4 0.250 50.0 0.390
## 5 0.500 25.0 0.660
## 6 0.500 50.0 0.929
Miraculously, we find that power increases with sample size and effect size. Biometrics here we come!
Henry, Lionel, and Hadley Wickham. 2017. Purrr: Functional Programming Tools. https://CRAN.R-project.org/package=purrr.
Van Domelen, Dane R. 2018. Dvmisc: Convenience Functions, Moving Window Statistics, and Graphics.