Create new dataframe column in R that conditions on row values without iterating?

Question

So let's say I have the following dataframe "df":

names <- c("Bob","Mary","Ben","Lauren")
number <- c(1:4)
age <- c(20,33,34,45)
df <- data.frame(names,number,age)

Let's say I have another dataframe ("df2") with thousands of people and I want to sum the income of people in that other dataframe that have the given name, number and age of each row in "df". That is, for each row "i" of "df", I want to create a fourth column "TotalIncome" that is the sum of the income of all the people with the given name, age and number in dataframe "df2". In other words, for each row "i":

df$TotalIncome[i] <- sum(
  df2$Income[df2$Name == df$names[i] &
             df2$Numbers == df$number[i] &
             df2$Age == df$age[i]],
  na.rm = TRUE)

Is there a way to do this without having to iterate in a for loop for each row "i" and perform the above code? Is there a way to use apply() to calculate this for the entire vector rather than only iterating each line individually? The actual dataset I am working with is huge and iterating takes quite a while and I am hoping there is a more efficient way to do this in R.

Thanks!

Yes. First you'll need to merge/join the second frame onto the first ([ref1](https://stackoverflow.com/q/1299871/3358272), [ref2](https://stackoverflow.com/q/5706437/3358272)), then summarize. No iteration required. If you had a sample of `df2` we might be able to help. (Even with `df2`, it'll likely be a dupe of those first two refs, plus [summarize by group](https://stackoverflow.com/q/11562656/3358272).) — r2evans, Dec 16 '21 at 21:21

score 0 · Accepted Answer · answered Dec 16 '21 at 23:43

Have you considered using the dplyr package? It provides an SQL-style grammar that makes this job quick and easy.

The code would look something like this:

library(dplyr)

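df %>%
    # the key columns are named differently in df and df2, so map them explicitly
    left_join(df2, by = c("names" = "Name", "number" = "Numbers", "age" = "Age")) %>%
    group_by(names, number, age) %>%
    # na.rm = TRUE mirrors the sum() call in the question
    summarize(TotalIncome = sum(Income, na.rm = TRUE), .groups = "drop")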

I suggest looking at the cheat sheets available on the dplyr site, or at the Wickham and Grolemund book (R for Data Science).
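If you would rather not add a dependency, the same join-then-summarize idea can be sketched in base R with aggregate() and merge(). This is only a sketch: the question does not show df2, so the df2 rows below are made up for illustration, and the column names (Name, Numbers, Age, Income) are taken from the loop in the question.

```r
# df as defined in the question
df <- data.frame(
  names  = c("Bob", "Mary", "Ben", "Lauren"),
  number = 1:4,
  age    = c(20, 33, 34, 45)
)

# Hypothetical df2: invented rows, only the column names come from the question
df2 <- data.frame(
  Name    = c("Bob", "Bob", "Mary", "Ben", "Ben", "Zed"),
  Numbers = c(1L, 1L, 2L, 3L, 9L, 4L),
  Age     = c(20, 20, 33, 34, 34, 45),
  Income  = c(100, 250, NA, 400, 999, 50)
)

# Step 1: sum Income once per (Name, Numbers, Age) group in df2.
# The formula interface drops rows whose Income is NA before summing.
totals <- aggregate(Income ~ Name + Numbers + Age, data = df2, FUN = sum)
names(totals)[names(totals) == "Income"] <- "TotalIncome"

# Step 2: left-join the totals onto df (all.x = TRUE keeps every row of df).
# Note that merge() sorts the result by the key columns by default.
result <- merge(
  df, totals,
  by.x = c("names", "number", "age"),
  by.y = c("Name", "Numbers", "Age"),
  all.x = TRUE
)

# Step 3: rows of df with no usable match get 0, as sum(..., na.rm = TRUE) would give
result$TotalIncome[is.na(result$TotalIncome)] <- 0

print(result)
```

Summing df2 first and joining afterwards keeps the merged table at one row per row of df, which tends to matter when df2 is large. With the made-up data above, Bob ends up with 350 and Ben with 400 (the Ben row with a different number is not counted), while Mary and Lauren get 0.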
