Map over data frame columns, apply function to data if column meets condition

Question

I'm pulling data from the Google Analytics API, processing it locally, then knitting an .Rmd file into text, tables, and visualisations. As part of the knitting/tabling process, I'm doing some basic formatting (e.g. rounding off percentages and adding % signs).

For this question, I have toPercent(), which works fine if used like this:

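toPercent <- function(percentData){
    # Round to two decimals, convert each value to a string, then append "%"
    percentData <- round(percentData, 2)
    percentData <- mapply(toString, percentData)
    percentData <- paste(percentData, "%", sep="")
    return(percentData)
}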

devices$bounceRate <- toPercent(devices$bounceRate)

However, calling the function manually for every table is time-consuming, so I wrote percentCheck() to find the columns whose names match my criteria:

percentCheck <- function(data){
    data[,grep("rate|percent", names(data), ignore.case=TRUE)] <- toPercent(data[,grep("rate|percent", names(data), ignore.case=TRUE)])
}

devices <- percentCheck(devices)

But I know this doesn't work on a dataset with multiple matches (e.g. a column for exitRate and a column for bounceRate).

Q1: Have I written toPercent() in a way that won't return multiple values to one entry?

Q2: How can I structure percentCheck() to map over the dataset and only apply toPercent() if the column name includes a given string?

Version/Packages:

R version 3.1.1 (2014-07-10) -- "Sock it to Me"
library(rga)
library(knitr)
library(stargazer)

Data:

> dput(devices)
structure(list(deviceCategory = c("desktop", "mobile", "tablet"
), sessions = c(817, 38, 1540), avgSessionDuration = c(153.424888853179, 
101.942758538617, 110.270988142292), bounceRate = c(39.0192297391397, 
50.2915625371891, 50.1343873517787), exitRate = c(25.3257456030279, 
32.0236280487805, 29.0991902834008)), .Names = c("deviceCategory", 
"sessions", "avgSessionDuration", "bounceRate", "exitRate"), row.names = c(NA, 
-3L), class = "data.frame")

talat · Accepted Answer · 2014-08-20T15:00:02.943


How about this modification:

percentCheck <- function(data){
  idx  <- grepl("rate|percent", names(data), ignore.case=TRUE)
  data[idx] <- lapply(data[idx], function(x) paste0(sprintf("%.2f", round(x,2)), "%"))
  return(data)
}

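Here, I first use grepl to create a logical index of the columns whose names match the pattern. That index selects the columns passed to lapply, which applies the formatting function to each matching column separately. The function does the same job as your toPercent, only a bit more compactly. Unlike the original percentCheck, this version ends with return(data), so the caller gets the whole data frame back rather than the value of the assignment.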
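To make the indexing step concrete, here is a small sketch that rebuilds the question's devices data and inspects the index on its own. It also touches on Q1: sprintf is vectorised, so each input value yields exactly one formatted string. Writing the percent sign as "%%" inside the format string is a variation on the paste0 call above, not the code from the answer.

```r
# Data copied from the dput() output in the question
devices <- data.frame(
  deviceCategory = c("desktop", "mobile", "tablet"),
  sessions = c(817, 38, 1540),
  avgSessionDuration = c(153.424888853179, 101.942758538617, 110.270988142292),
  bounceRate = c(39.0192297391397, 50.2915625371891, 50.1343873517787),
  exitRate = c(25.3257456030279, 32.0236280487805, 29.0991902834008),
  stringsAsFactors = FALSE
)

# One TRUE/FALSE per column name; ignore.case lets "rate" match "Rate"
idx <- grepl("rate|percent", names(devices), ignore.case = TRUE)
print(idx)
# FALSE FALSE FALSE  TRUE  TRUE

# Single-bracket indexing selects columns and keeps a data frame,
# so lapply() sees each matching column as a separate vector
print(names(devices[idx]))
# "bounceRate" "exitRate"

# Vectorised formatting: one string per element, "%%" is a literal percent sign
bouncePct <- sprintf("%.2f%%", devices$bounceRate)
print(bouncePct)
# "39.02%" "50.29%" "50.13%"
```

Note that avgSessionDuration is not matched: "Duration" contains "rati", not "rate".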

Now you can apply it to your whole data set in one go:

percentCheck(devices)
#  deviceCategory sessions avgSessionDuration bounceRate exitRate
#1        desktop      817           153.4249     39.02%   25.33%
#2         mobile       38           101.9428     50.29%   32.02%
#3         tablet     1540           110.2710     50.13%   29.10%
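If more formatters are needed for other column types, the same pattern can be parameterised. The following is only a sketch under assumptions: formatColumns, toPercent (redefined here as a one-liner), and toSeconds are hypothetical names chosen for illustration, and the "s" suffix for durations is an invented formatting choice, not something from the question.

```r
# Hypothetical generalisation of percentCheck(): the name pattern and the
# formatting function are both passed in as arguments
formatColumns <- function(data, pattern, formatter) {
  idx <- grepl(pattern, names(data), ignore.case = TRUE)
  data[idx] <- lapply(data[idx], formatter)
  data
}

# Illustrative formatters (assumed, not from the original post)
toPercent <- function(x) sprintf("%.2f%%", x)
toSeconds <- function(x) paste0(round(x), "s")

# Data copied from the dput() output in the question
devices <- data.frame(
  deviceCategory = c("desktop", "mobile", "tablet"),
  sessions = c(817, 38, 1540),
  avgSessionDuration = c(153.424888853179, 101.942758538617, 110.270988142292),
  bounceRate = c(39.0192297391397, 50.2915625371891, 50.1343873517787),
  exitRate = c(25.3257456030279, 32.0236280487805, 29.0991902834008),
  stringsAsFactors = FALSE
)

# Each call formats only the columns whose names match its pattern
devices <- formatColumns(devices, "rate|percent", toPercent)
devices <- formatColumns(devices, "duration", toSeconds)
print(devices)
```

The formatters turn numeric columns into character columns, so this is best done as the last step before printing or tabling; running a numeric formatter a second time on an already formatted column would fail.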

edited Aug 20 '14 at 15:00

answered Aug 20 '14 at 14:53


Thank you! I'm building more formatting functions for different GA data types, and this seems to accept those other functions without a hitch. Speaking of which, do you know of any good beginner's tutorials for *apply() in R? – mattpolicastro Aug 20 '14 at 15:02
You can definitely have a look here: http://stackoverflow.com/questions/3505701/r-grouping-functions-sapply-vs-lapply-vs-apply-vs-tapply-vs-by-vs-aggrega/7141669#7141669 and for more, I would just google and/or check out the basic R tutorials like http://cran.r-project.org/doc/manuals/R-intro.pdf. Personally, I would advise learning `apply`, `sapply` and `lapply` first, and only after that the others like `mapply`, `vapply`, `tapply`, etc. – talat Aug 20 '14 at 15:07
