Multiple data.frame subgroups processing

Question

I need to process three data frames containing the same subgroups indexed by name. That is, the first data frame df1 looks like this:

Name      col1        col2
Car       94.56       1
Car       52.67       2
Bike      421.5       2
Bike      34.56       4

df2 and df3 have the same Name column with the same values, but different other columns. I need to process the rows of all three data frames for each distinct name. So far I've been using this approach:

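# dummy first row fixes the column layout; it has to be removed afterwards
results = data.frame(name = factor("dummy"), col1 = 1, col2 = 2)
# unique() so that each name is processed only once
for( name in unique(df1$Name) ) {
  new.results = process(name, df1[df1$Name == name, ], df2[df2$Name == name, ], df3[df3$Name == name, ])
  results = rbind(results, new.results)
}

results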

Here process() returns another data frame with the results of some calculation. The problem with this code is that process() must return the same layout as the 'results' data frame: if I change the columns returned by process(), I also have to change how 'results' is initialized. Also, the dummy first row of 'results' has to be removed afterwards.

Is there an easier way to do this? by() can group one data frame by name and invoke process() on each subgroup, but I can't pass in the matching df2 and df3 subgroups.
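One way around the by() limitation is to split each data frame by Name separately and walk the three lists in parallel with Map(). This is a minimal sketch, not the poster's code: df2, df3 and process() below are hypothetical stand-ins (only df1 comes from the question), and it assumes every name in df1 also appears in df2 and df3.

```r
df1 <- data.frame(Name = c("Car", "Car", "Bike", "Bike"),
                  col1 = c(94.56, 52.67, 421.5, 34.56),
                  col2 = c(1, 2, 2, 4),
                  stringsAsFactors = FALSE)
# Hypothetical df2/df3: same Name column, made-up extra columns
df2 <- data.frame(Name = df1$Name, col3 = c(10, 20, 30, 40),
                  stringsAsFactors = FALSE)
df3 <- data.frame(Name = df1$Name, col4 = c(5, 6, 7, 8),
                  stringsAsFactors = FALSE)

# Hypothetical process(): returns one summary row per name
process <- function(name, d1, d2, d3) {
  data.frame(Name = name,
             total1 = sum(d1$col1),
             total3 = sum(d2$col3),
             max4 = max(d3$col4),
             stringsAsFactors = FALSE)
}

# One list element per distinct Name in each data frame
s1 <- split(df1, df1$Name)
# Reorder df2/df3 groups so they line up with df1's groups by name
s2 <- split(df2, df2$Name)[names(s1)]
s3 <- split(df3, df3$Name)[names(s1)]

# Call process() once per name with the three matching subsets
pieces <- Map(process, names(s1), s1, s2, s3)

# Stack the per-name results; no dummy row or fixed layout needed
results <- do.call(rbind, pieces)
rownames(results) <- NULL
results
```

Because the result layout is whatever process() returns, changing process() needs no other edits. Note that split() orders the groups alphabetically by name.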

What is `process()`? Could you give us an example of `df2` as well? As there are duplicates in `Name`, I am not sure if you want to merge dfs based on only that variable. — daroczig, Dec 13 '11 at 13:21
Is this different than the question you asked on Cross Validated? If not, we should close one of the two and it looks like you found a suitable answer over there: http://stats.stackexchange.com/questions/19731/r-process-data-frame-subgroups-and-merge-results-together — Chase, Dec 13 '11 at 14:30
process() is my own function that returns a data frame. Each new.results will contain 1 row for a specific name, as returned by the df1, df2 and df3 name queries (df1[df1$Name == name, ]). — Robert Kubrick, Dec 13 '11 at 14:30
@Chase: yes, please close the question on Cross Validated. I don't think I can do that myself. — Robert Kubrick, Dec 13 '11 at 14:32
@Robert - I can't directly close it either. It would be helpful to know why the answer provided over there doesn't answer your question sufficiently. You'll get the best help if you provide a small, reproducible question so that others can simply copy/paste your code and see it run, or see where it fails to do what you want. This question has several good tips on providing good questions: http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example — Chase, Dec 13 '11 at 14:57

score 2 · Accepted Answer · edited Dec 13 '11 at 16:53

I would look at ddply from Hadley Wickham's plyr package. It sounds right up your alley.

http://svitsrv25.epfl.ch/R-doc/library/plyr/html/ddply-5k.html

The basic idea is to split a data frame on a criterion (unique name, in your case), apply a function to each group (either a canned R function like sum or a custom home-cooked one), and then stitch the results back together.
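To make the split-apply-combine idea concrete without depending on plyr, here is a rough base-R sketch on the question's df1. It uses split(), lapply() and do.call(rbind, ...) rather than ddply itself, and the per-group summary (total and n) is only an illustrative choice.

```r
df1 <- data.frame(Name = c("Car", "Car", "Bike", "Bike"),
                  col1 = c(94.56, 52.67, 421.5, 34.56),
                  col2 = c(1, 2, 2, 4),
                  stringsAsFactors = FALSE)

# Split: one data frame per distinct Name
groups <- split(df1, df1$Name)

# Apply: any function that returns a data frame for one group
per_group <- lapply(groups, function(g) {
  data.frame(Name = g$Name[1],
             total = sum(g$col1),
             n = nrow(g),
             stringsAsFactors = FALSE)
})

# Combine: stack the per-group results into one data frame
combined <- do.call(rbind, per_group)
rownames(combined) <- NULL
combined
```

ddply wraps these three steps in a single call, which is why it fits this kind of problem well.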

I don't fully understand how the different data frames relate, but you may have more luck with lapply. You can build a function that returns a data frame for each group and call it with output <- lapply(X = as.list(the_list_of_unique_groups), FUN = your_function_for_each_group)

and then stitch the pieces back together with results <- do.call("rbind", output)
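Applied to the question's three data frames, the lapply/do.call pattern could look roughly like this. It is a sketch under assumptions: df2, df3 and process() are made-up stand-ins, and process_name() is a hypothetical helper playing the role of your_function_for_each_group; it subsets all three frames for one name.

```r
df1 <- data.frame(Name = c("Car", "Car", "Bike", "Bike"),
                  col1 = c(94.56, 52.67, 421.5, 34.56),
                  col2 = c(1, 2, 2, 4),
                  stringsAsFactors = FALSE)
# Hypothetical df2/df3 sharing df1's Name column
df2 <- data.frame(Name = df1$Name, col3 = c(10, 20, 30, 40),
                  stringsAsFactors = FALSE)
df3 <- data.frame(Name = df1$Name, col4 = c(5, 6, 7, 8),
                  stringsAsFactors = FALSE)

# Hypothetical process(): one summary row per name
process <- function(name, d1, d2, d3) {
  data.frame(Name = name,
             total1 = sum(d1$col1),
             total3 = sum(d2$col3),
             max4 = max(d3$col4),
             stringsAsFactors = FALSE)
}

# Per-group function: pull the matching rows from all three frames
process_name <- function(name) {
  process(name,
          df1[df1$Name == name, ],
          df2[df2$Name == name, ],
          df3[df3$Name == name, ])
}

# One data frame per unique name, in order of first appearance
output <- lapply(X = as.list(unique(df1$Name)), FUN = process_name)

# Stitch them together in a single rbind call
results <- do.call("rbind", output)
results
```

Since results is built only from what process() returns, there is no dummy first row to drop and no layout to keep in sync by hand.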

This combination is incredibly helpful. Good luck.