Multiplying columns of different size of 2 data frames

Question

I have two different data frames in R. The first one, df1, contains data for a bunch of cities, each associated with it's corresponding state. The second one contains data aggregated by state, but dividing this data into seral classes. Like this:

states1 <- c("a", "a", "a", "a", "a", "b", "b", "b", "b", "c", "c", "d", "d", "d")
cities <- c("A","B","C","D","E","F","G","H","I","J","K","L","M","N")
data1 <- c(123, 222, 444, 125, 687, 987, 556, 445, 444, 659, 798, 113, 325, 144) 
df1 <- data.frame(states1, cities, data1)
#
states2 <- c("a","a","b","b","c","c","d","d")
classes <- c(1,2,1,2,1,2,1,2)
data2 <- c(65,21,44,25,37,87,58,47)
df2 <- data.frame(states2, classes, data2)

I want to multiply the data columns from the two data frames based on a certain criteria creating a third data frame. I want, for each city, to multiply its data to the data of its corresponding state, creating two columns, one for each class.

For example:

For cities A and B, which belong to state a, and K, belonging to sate c, I need to multiply their data by the data of their respective states, and do that for each of the classes 1 and 2. That is, I want to multiply the city's data by the two classes' data using the corresponding states as my matching criteria. like that, for ex.:

multA <- c(123*65, 123*21)
multB <- c(222*65, 222*21)
multK <- c(798*37, 789*87)

df3 <- data.frame(rbind(multA, multB, multK))
colnames(df3) <- c("class 1", "class 2")

But of course, I want to do that automatically and for every city. I've tried to use the which function and the dplyr package, but so far I haven't come up with a solution. Is there any way I can do that without with a package or built in function, without needing writing an explicit loop? Thanks!!

`transform(merge(df1, df2, by.x = "states1", by.y = "states2"), v = data1 * data2)`? — Frank, Sep 24 '18 at 18:50

StatsStudent · Answer 1 · 2018-09-24T19:49:27.713

There are a number of ways to "skin this cat," but I'm partial to sql so I'll use the sqldf along with the reshape2 package to accomplish this:

library("sqldf")
#Execute query to merge df1 to df2 based on state, 
#extracting the city name, classes, and performing the necessary math
df4<-sqldf("select 
                   a.cities, 
                   b.classes, 
                   a.data1*data2 as class1 
            from 
                   df1 a join df2 b 
            where 
                   a.states1=b.states2"
           )

#Since the data is now in "long form" we need to reshape it 
#so that for each class, you create a column, and the city becomes the row
library(reshape2)
df4_reshaped<-dcast(df4, formula=cities~classes, value.var="class1")
names(df4_reshaped)<-c("city", "class 1", "class 2")

This results in a dataframe that has the city as a column name (which I personally find more useful than having it as a rowname as you have in df3):

> df4_reshaped
   city class 1 class 2
1     A    7995    2583
2     B   14430    4662
3     C   28860    9324
4     D    8125    2625
5     E   44655   14427
6     F   43428   24675
7     G   24464   13900
8     H   19580   11125
9     I   19536   11100
10    J   24383   57333
11    K   29526   69426
12    L    6554    5311
13    M   18850   15275
14    N    8352    6768

To get the exact shape you were asking for, you'll need to set the rownames and remove the city column:

#set rowname
rownames(df4_reshaped)<-paste("mult", df4_reshaped$city, sep="")

#remove city column
df4_reshaped2<-df4_reshaped[,-1]

So your final output is now:

> df4_reshaped2
      class 1 class 2
multA    7995    2583
multB   14430    4662
multC   28860    9324
multD    8125    2625
multE   44655   14427
multF   43428   24675
multG   24464   13900
multH   19580   11125
multI   19536   11100
multJ   24383   57333
multK   29526   69426
multL    6554    5311
multM   18850   15275
multN    8352    6768

score 0 · Answer 2 · answered Sep 24 '18 at 19:44

Here is a solution using dplyr:

df1 %>% 
  left_join(df2, by = c('states1' = 'states2')) %>%
  mutate(vals = data1 * data2) %>%
  group_by(states1, cities) %>%
  summarise(`class 1` = vals[classes == 1],
            `class 2` = vals[classes == 2])

# A tibble: 14 x 4
# Groups:   states1 [?]
   states1 cities `class 1` `class 2`
   <fct>   <fct>      <dbl>     <dbl>
 1 a       A           7995      2583
 2 a       B          14430      4662
 3 a       C          28860      9324
 4 a       D           8125      2625
 5 a       E          44655     14427
 6 b       F          43428     24675
 7 b       G          24464     13900
 8 b       H          19580     11125
 9 b       I          19536     11100
10 c       J          24383     57333
11 c       K          29526     69426
12 d       L           6554      5311
13 d       M          18850     15275
14 d       N           8352      6768

Staying within the `tidyverse` you can `library(dplyr) library(tidyr) df3 <- df1 %>% inner_join(df2, by = c("states1" = "states2")) %>% mutate(Mult = data1 * data2) %>% select(-starts_with("data")) %>% mutate(classes = paste0("class ", as.character(classes))) %>% spread(classes, Mult)` — Kerry Jackson, Sep 24 '18 at 20:24

score 0 · Answer 3 · answered Sep 24 '18 at 21:02

To extend @Frank's base R solution in comments, consider reshape of df2 and then merge the two sets with a transform for product of numeric columns:

# RESHAPE LONG TO WIDE
rdf2 <- reshape(df2, idvar="states2", v.names="data2", timevar="classes", direction="wide")
# RENAME COLUMNS
rdf2 <- setNames(rdf2, c("states2", "class1", "class2"))

# MERGE AND TRANSFORM            
final_df <- transform(merge(df1, rdf2, by.x="states1", by.y="states2"),
                      class1 = data1 * class1,
                      class2 = data1 * class2)

final_df

#    states1 cities data1 class1 class2
# 1        a      A   123   7995   2583
# 2        a      B   222  14430   4662
# 3        a      C   444  28860   9324
# 4        a      D   125   8125   2625
# 5        a      E   687  44655  14427
# 6        b      F   987  43428  24675
# 7        b      G   556  24464  13900
# 8        b      H   445  19580  11125
# 9        b      I   444  19536  11100
# 10       c      J   659  24383  57333
# 11       c      K   798  29526  69426
# 12       d      L   113   6554   5311
# 13       d      M   325  18850  15275
# 14       d      N   144   8352   6768

Multiplying columns of different size of 2 data frames

3 Answers3

Linked