There are quite a few R functions/packages for calculating moving averages. The purpose of this article is to compare a bunch of them and see which is fastest.
Here are the 10 functions I’ll be looking at, in alphabetical order (Disclaimer: the accelerometry package is mine).
First, here is the syntax for each function:
# Sample 1,000 values from Poisson(3)
set.seed(1)
x <- rpois(n = 1000, lambda = 3)
# Calculate 5-unit moving average using each of the functions
width <- 5
y.filter <- filter(x = x, filter = rep(1, width)) / width
y.filter <- y.filter[! is.na(y.filter)]
library("forecast")
y.ma <- ma(x = x, order = width)
y.ma <- y.ma[! is.na(y.ma)]
library("pracma")
y.movavg <- movavg(x = x, n = width, type = "s")
y.movavg <- y.movavg[-c(1: (width - 1))]
library("accelerometry")
y.movingaves <- movingaves(x = x, window = width)
library("RcppRoll")
y.roll_mean <- roll_mean(x = x, n = width)
library("zoo")
y.rollapply <- rollapply(data = x, width = width, FUN = mean)
y.rollmean <- rollmean(x = x, k = width)
library("caTools")
y.runmean <- runmean(x = x, k = width, endrule = "trim")
library("TTR")
y.runMean <- runMean(x = x, n = width)
y.runMean <- y.runMean[! is.na(y.runMean)]
y.SMA <- SMA(x = x, n = width)
y.SMA <- y.SMA[! is.na(y.SMA)]
There were some minor differences among functions, i.e. whether they have extra NA’s in front or back, but I took care of that in the above code. All methods indeed give the same results:
# Combine all of the moving average vectors into a matrix
y.all <- cbind(y.filter, y.ma, y.movavg, y.movingaves, y.roll_mean,
y.rollapply, y.rollmean, y.runmean, y.runMean, y.SMA)
# Other than a small precision issue, all moving averages are the same
max(apply(y.all, 1, function(x) diff(range(x))))
## [1] 8.881784e-16
Let’s see which gives the fastest performance over 500 trials:
library("rbenchmark")
(test1 <- benchmark(filter(x = x, filter = rep(1, width)) / width,
ma(x = x, order = width),
movavg(x = x, n = width, type = "s"),
movingaves(x = x, window = width),
roll_mean(x = x, n = width),
rollapply(data = x, width = width, FUN = mean),
rollmean(x = x, k = width),
runmean(x = x, k = width, endrule = "trim"),
runMean(x = x, n = width),
SMA(x = x, n = width),
replications = 500, order = "elapsed",
columns = c("test", "replications", "elapsed")))
## test replications elapsed
## 4 movingaves(x = x, window = width) 500 0.02
## 5 roll_mean(x = x, n = width) 500 0.02
## 2 ma(x = x, order = width) 500 0.04
## 8 runmean(x = x, k = width, endrule = "trim") 500 0.05
## 1 filter(x = x, filter = rep(1, width))/width 500 0.06
## 9 runMean(x = x, n = width) 500 0.20
## 10 SMA(x = x, n = width) 500 0.20
## 7 rollmean(x = x, k = width) 500 0.37
## 6 rollapply(data = x, width = width, FUN = mean) 500 0.56
## 3 movavg(x = x, n = width, type = "s") 500 3.53
I’ll do another quick test on a vector of non-integer values:
# Sample 1,000 values from N(0, 1)
set.seed(1)
x <- rnorm(n = 1000)
(test2 <- benchmark(filter(x = x, filter = rep(1, width)) / width,
ma(x = x, order = width),
movavg(x = x, n = width, type = "s"),
movingaves(x = x, window = width),
roll_mean(x = x, n = width),
rollapply(data = x, width = width, FUN = mean),
rollmean(x = x, k = width),
runmean(x = x, k = width, endrule = "trim"),
runMean(x = x, n = width),
SMA(x = x, n = width),
replications = 500, order = "elapsed",
columns = c("test", "replications", "elapsed")))
## test replications elapsed
## 4 movingaves(x = x, window = width) 500 0.01
## 5 roll_mean(x = x, n = width) 500 0.02
## 2 ma(x = x, order = width) 500 0.04
## 1 filter(x = x, filter = rep(1, width))/width 500 0.05
## 8 runmean(x = x, k = width, endrule = "trim") 500 0.07
## 10 SMA(x = x, n = width) 500 0.19
## 9 runMean(x = x, n = width) 500 0.21
## 7 rollmean(x = x, k = width) 500 0.42
## 6 rollapply(data = x, width = width, FUN = mean) 500 0.50
## 3 movavg(x = x, n = width, type = "s") 500 3.50
It looks like movavg, rollapply, and rollmean are slower than the others, so let’s disqualify those three and continue to another scenario: a longer vector and a longer window length.
# Sample 50,000 values from N(0, 1)
set.seed(1)
x <- rnorm(n = 50000)
width <- 500
(test3 <- benchmark(filter(x = x, filter = rep(1, width)) / width,
ma(x = x, order = width),
movingaves(x = x, window = width),
roll_mean(x = x, n = width),
runmean(x = x, k = width, endrule = "trim"),
runMean(x = x, n = width),
SMA(x = x, n = width),
replications = 500, order = "elapsed",
columns = c("test", "replications", "elapsed")))
## test replications elapsed
## 3 movingaves(x = x, window = width) 500 0.22
## 5 runmean(x = x, k = width, endrule = "trim") 500 1.09
## 6 runMean(x = x, n = width) 500 2.11
## 7 SMA(x = x, n = width) 500 2.15
## 4 roll_mean(x = x, n = width) 500 11.87
## 2 ma(x = x, order = width) 500 45.40
## 1 filter(x = x, filter = rep(1, width))/width 500 45.71
Here movingaves looks like the clear winner. It’s interesting that roll_mean fell so far in this test. It seems to become progressively less efficient as the window size grows. Actually, let’s look directly at how window length affects speed for a fixed vector length.
# Look at effect of window size on speed of various functions
x <- rpois(n = 10000, lambda = 5)
width.vals <- c(2, 5, 10, 100, 1000, 5000)
benchmark.results <- matrix(NA, ncol = 7, nrow = length(width.vals))
for (ii in 1: length(width.vals)) {
width.ii <- width.vals[ii]
benchmark.results[ii, ] <-
benchmark(filter(x = x, filter = rep(1, width.ii)) / width.ii,
ma(x = x, order = width.ii),
movingaves(x = x, window = width.ii),
roll_mean(x = x, n = width.ii),
runmean(x = x, k = width.ii, endrule = "trim"),
runMean(x = x, n = width.ii),
SMA(x = x, n = width.ii),
replications = 250)$elapsed
}
# Create figure
cols <- c("blue", "red", "orange", "darkred", "purple", "green", "black")
par(mfrow = c(1, 2))
plot(benchmark.results[, 1],
main = "Benchmarking: Time vs. Window Width",
ylab = "Elapsed time, 250 trials (s)",
xlab = "Window width (units)",
ylim = c(0, max(benchmark.results)),
type = "n",
xaxt = "n")
axis(side = 1, at = 1: 6, labels = width.vals)
for (ii in 1: 7) {
points(x = benchmark.results[, ii], type = "b", col = cols[ii])
}
legend("topleft", lty = 1, col = cols, cex = 0.8,
legend = c("filter", "ma", "movingaves", "roll_mean", "runmean",
"runMean", "SMA"))
# Create zoomed-in version
plot(benchmark.results[, 1],
main = "Zoomed in",
ylab = "Elapsed time, 250 trials (s)",
xlab = "Window width (units)",
ylim = c(0, 0.8),
type = "n",
xaxt = "n")
axis(side = 1, at = 1: 6, labels = width.vals)
for (ii in 1: 7) {
points(x = benchmark.results[, ii], type = "b", col = cols[ii])
}
legend("topleft", lty = 1, col = cols, cex = 0.8,
legend = c("filter", "ma", "movingaves", "roll_mean", "runmean",
"runMean", "SMA"))
Note that the x-axis is not to scale; the results were easier to visualize this way. Overall, movingaves and roll_mean are similarly fast for relatively small window widths, but movingaves is easily the fastest when the window width gets larger.
For one last analysis, let’s see how the length of the vector affects the results, holding the window width fixed at 5 units. I’ll move back to 500 trials for these simulations, as they’re somewhat faster.
# Look at effect of vector length on speed of various functions
length.vals <- c(10, 100, 1000, 10000, 100000)
width <- 5
benchmark.results <- matrix(NA, ncol = 7, nrow = length(length.vals))
for (ii in 1: length(length.vals)) {
length.ii <- length.vals[ii]
x <- rpois(n = length.ii, lambda = 5)
benchmark.results[ii, ] <- benchmark(filter(x = x, filter = rep(1, width)) / width,
ma(x = x, order = width),
movingaves(x = x, window = width),
roll_mean(x = x, n = width),
runmean(x = x, k = width, endrule = "trim"),
runMean(x = x, n = width),
SMA(x = x, n = width),
replications = 500)$elapsed
}
# Create figure
cols <- c("blue", "red", "orange", "darkred", "purple", "green", "black")
par(mfrow = c(1, 2))
plot(benchmark.results[, 1],
main = "Benchmarking: Time vs. Vector Length",
ylab = "Elapsed time, 500 trials (s)",
xlab = "Vector length (units)",
ylim = c(0, max(benchmark.results)),
type = "n",
xaxt = "n")
axis(side = 1, at = 1: 5, labels = length.vals, cex.axis = 0.8)
for (ii in 1: 7) {
points(x = benchmark.results[, ii], type = "b", col = cols[ii])
}
legend("topleft", lty = 1, col = cols, cex = 0.8,
legend = c("filter", "ma", "movingaves", "roll_mean", "runmean",
"runMean", "SMA"))
# Create zoomed-in version
plot(benchmark.results[, 1],
main = "Zoomed in",
ylab = "Elapsed time, 500 trials (s)",
xlab = "Vector length (units)",
ylim = c(0, 1),
type = "n",
xaxt = "n")
axis(side = 1, at = 1: 5, labels = length.vals, cex.axis = 0.8)
for (ii in 1: 7) {
points(x = benchmark.results[, ii], type = "b", col = cols[ii])
}
legend("topleft", lty = 1, col = cols, cex = 0.8,
legend = c("filter", "ma", "movingaves", "roll_mean", "runmean",
"runMean", "SMA"))
Here, movingaves and roll_mean are again the two fastest, with movingaves achieving a small speed advantage as the vector length gets larger. It’s interesting to see that the other 5 functions really drop off in performance for large vectors.
In summary, surprise surprise, I’m going to recommend movingaves in my accelerometry package for calculating moving averages efficiently in R.