0

I have two data frames in R. data, a frame with monthly sales per department in a store, looks like this:

dataframe named data

While averages, a frame with the average sales over all months per department, looks like this:

dataframe named averages

What I'd like to do is add a column to data containing the average sales (column 3 of averages) for each department. So whereas now I have an avg column with all zeroes, I'd like it to contain the overall average sales for whatever department is listed in that row. This is the code I have now:

for(j in 1:nrow(avgs)){
  for(i in 1:nrow(data)){
    if(identical(data[i,4], averages[j,1])){
      gd[i,10] <- avgs[j,3] } } }

After running the loop, the avg column in data is still all zeroes, which makes me think that if(identical(data[i,4], averages[j,1])) is always evaluating to FALSE... But why would this be? How can I troubleshoot this issue / is there a better way to do this?

boop
  • 59
  • 5
  • Do not post your data as an image, please learn how to give a [reproducible example](http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example/5963610) – Jaap Nov 10 '16 at 07:46
  • @ProcrastinatusMaximus Now that I have the answer I see how it's related to the other question. Since I didn't know the name of the function was merge the other thread didn't come up in my searches, and I didn't really have enough of a mathematical understanding at the time of what I was trying to accomplish to think of it in terms of the join. ^.^ – boop Dec 05 '16 at 14:49

2 Answers2

0

Are you looking for merge function?

merge(x = data, y = avgs, by = "departmentName", all.x=TRUE)
MFR
  • 2,049
  • 3
  • 29
  • 53
  • for some reason I'm getting a stack error when I try this method: `Error: C stack usage 19925264 is too close to the limit`. The dataframes are not huge so I don't know why that would be. Sorry, my fat fingers hit enter too soon – boop Nov 10 '16 at 06:26
  • I believe this error is because of the deep recursion of the previous code and if you try any other could, you would see exact same error. Isn't it true? if you, perhaps you want to restart r. and please post the data instead of the image to allow people reproduce your example – MFR Nov 10 '16 at 06:32
  • 1
    Right on the head about the restart. Ran perfect after that. Thank you! – boop Nov 10 '16 at 06:41
-2

I would use dplyr by doing:

dplyr::full_join(data, averages, by = "departmentName")

The great thing about dplyr (besides being fast) is that it has a very simple syntax. Moreover, if your two tables have variables with different names, that can also be specified. Imagine you have data_departmentName in table data and avgs_departmentName in the table averages:

dplyr::full_join(data, averages, by = c("data_departmentName" = "averages_departmentName"))

And then I would filter the dataset if you only want a specific column from the second dataset. If you know your data is ordered and has the same lenght, then you could just add it like:

data$avgs <- averages$avgs

But I'd rather join first then filter.