0

How do I multiply values in a larger dataframe based on values in another dataframe, according to the category, while working in R?

The smaller data frame is like so:

travel time
bike 2.5
car 1.0
walk 5.0

The larger dataframe is like so:

year travel rate
2010 bike 25
2010 car 70
2010 walk 9.2
2011 car 71
2011 walk 8.2
2010 bike 23.1

Then I want a new column in the second dataframe as the product (rate * time = distance in this example), according on the 'travel' category based on each one's specific rate from the first dataframe.

r2evans
  • 141,215
  • 6
  • 77
  • 149
SCE
  • 1
  • 1
  • 3
    Just do `library(data.table);setDT(df2)[df1, distance := rate * time, on = .(travel)]` or with `dplyr` `df2 %>% left_join(df1 %>% select(-year)) %>% mutate(distance = rate * time)` – akrun Sep 01 '21 at 00:36

1 Answers1

1

This is a merge/join operation. Two good references for those are How to join (merge) data frames (inner, outer, left, right), What's the difference between INNER JOIN, LEFT JOIN, RIGHT JOIN and FULL JOIN?. (As such, this question is mostly a duplicate of the first one.)

base R

both <- merge(larger, smaller, by = "travel", all.x = TRUE)
transform(both, distance = rate*time )
#   travel year rate time distance
# 1   bike 2010 25.0  2.5     62.5
# 2   bike 2010 23.1  2.5     57.8
# 3    car 2010 70.0  1.0     70.0
# 4    car 2011 71.0  1.0     71.0
# 5   walk 2010  9.2  5.0     46.0
# 6   walk 2011  8.2  5.0     41.0

dplyr

library(dplyr)
left_join(larger, smaller, by = "travel") %>%
  mutate(distance = rate*time)
#   year travel rate time distance
# 1 2010   bike 25.0  2.5     62.5
# 2 2010    car 70.0  1.0     70.0
# 3 2010   walk  9.2  5.0     46.0
# 4 2011    car 71.0  1.0     71.0
# 5 2011   walk  8.2  5.0     41.0
# 6 2010   bike 23.1  2.5     57.8

data.table

(as akrun commented)

library(data.table)
smallerDT <- as.data.table(smaller)
largerDT <- as.data.table(larger)
largerDT[smallerDT, distance := rate*time, on = .(travel)]
#     year travel  rate distance
#    <int> <char> <num>    <num>
# 1:  2010   bike  25.0     62.5
# 2:  2010    car  70.0     70.0
# 3:  2010   walk   9.2     46.0
# 4:  2011    car  71.0     71.0
# 5:  2011   walk   8.2     41.0
# 6:  2010   bike  23.1     57.8

Data

smaller <- structure(list(travel = c("bike", "car", "walk"), time = c(2.5, 1, 5)), class = "data.frame", row.names = c(NA, -3L))
larger <- structure(list(year = c(2010L, 2010L, 2010L, 2011L, 2011L, 2010L), travel = c("bike", "car", "walk", "car", "walk", "bike"), rate = c(25, 70, 9.2, 71, 8.2, 23.1)), class = "data.frame", row.names = c(NA, -6L))
r2evans
  • 141,215
  • 6
  • 77
  • 149