1

I've been beating my head against a wall on this one all night. I've got multiple data frames, each with the same general column structure. For each data frame, the columns carry a unique "_suffix". There are matching rows across the data frames. Here is an example:

cat_list <- c("mpg","spd","wt","price")

car <- c("bmw","mbz","lexus","bmw","mbz","lexus")
src <- c("usa","usa","usa","gb","gb","gb")
mpg_usa <- c(5,11,34,0,0,0)
mpg_gb <- c(0,0,0,44,12,9)
spd_usa <- c(55,13,7,0,0,0)
spd_gb <- c(0,0,0,15,43,67)

df <- data.frame(car, src, mpg_usa, mpg_gb, spd_usa, spd_gb)

I'd like to calculate the average for the matched rows based on a list of columns I specify. For example, my list of candidate categories is cat_list. The data frame contains mpg and spd. I'd like to calculate the mean speed and mpg individually for the three car types across GB and the USA.

I've tried a few different version of melt with no success. I figured out how to search for my column names using this:

avail.cats <- names(df)[grepl(paste(cat_list, collapse = "|"), names(df))]

However, that obviously doesn't help me collapse for calculation purposes.

dra
  • 47
  • 5

1 Answers1

2

We can use melt from the devel version of data.table. We subset the column names that are found in the 'cat_list' using intersect after we removing the substring in column names that starts with _. We can specify multiple patterns in the measure argument of melt after converting the 'data.frame' to 'data.table' (setDT(df))

library(data.table)#v1.9.5+
nm1 <- intersect(sub('_.*', '', names(df)), cat_list)
dM <- melt(setDT(df), measure=patterns(paste0('^', nm1)), value.name=nm1)

Using the melted dataset, we group by 'car' and 'src', specify the columns in 'dM' that are common with 'cat_list' in .SDcols and loop with lapply to get the mean.

dM[,lapply(.SD, mean) , .(car, src), .SDcols= nm1]
#     car src  mpg  spd
#1:   bmw usa  2.5 27.5
#2:   mbz usa  5.5  6.5
#3: lexus usa 17.0  3.5
#4:   bmw  gb 22.0  7.5
#5:   mbz  gb  6.0 21.5
#6: lexus  gb  4.5 33.5

NOTE: Instructions to install the devel version of data.table are here

akrun
  • 874,273
  • 37
  • 540
  • 662
  • Dumb follow-up -- but how do I install the devel version of `data.table`? I have 1.9.4. – dra Aug 27 '15 at 05:23
  • I think this is pretty close. The code does work for the sample data, but I don't think it quite fits my need. In the `melt` statement, 'mpg' and 'spd' are hard-coded in. What I'd like to do is have the function look in the 'cat_list' to find the matching categories (in this case, it should find mpg and spd, but not wt or price). I have a long list of category variables, and they won't all be present at any given time. The fields in the dataset will always be a permutation of (a subset of) 'cat_list' and (all of) 'src'. – dra Aug 27 '15 at 13:44
  • @dra I updated the post based on your comments. – akrun Aug 27 '15 at 13:51