0

I have dataframe tech_change_supply that looks like this:

supplysector       tech_change_all   tech_change_elec    tech_change_no_elec
aluminum                  NA            .26                  .77      
cement                    NA            .86                  .92
chemicals                 NA           1.32                   NA
food processing           NA            .39                  .70
iron and steel            NA            .02                  .58
other manufacturing       NA           1.11                 1.04
other nonmetallic         NA            .41                  .75
pulp paper wood           NA            .96                   NA

and dataframe tech_change_service that looks like this:

subsector       tech_change_all   tech_change_elec    tech_change_no_elec
boilers                0                0.0                1.0
boilers_CHP            0                0.0                1.0
electrochemical        0                1.0                0.0
feeedstocks            0                0.0                0.0
machine drive          0                1.0                0.0
process heat           0                0.0                1.0
other uses             0                0.5                0.5

I need to combine these two dataframes in a way so that there is a supplysector column from tech_change_supply and a subsector column from tech_change_service, where each supplysector has every subsector (i.e. "aluminum" is repeated 7 times with each subsector matched to it once). Additionally, I need to maintain the tech_change_all, tech_change_elec, and tech_change_no_elec, and multiply these using the values from each supplysector, with the exception being "other uses" where I need the average of tech_change_elec and tech_change_no_elec. For example:

supplysector   subsector          tech_change_all   tech_change_elec    tech_change_no_elec
aluminum        boilers                  0                 0                  .77
aluminum        boilers_CHP              0                 0                  .77
aluminum        electrochemical          0                .26                  0
aluminum        feedstocks               0                 0                   0
aluminum        machine drive            0                .26                  0
aluminum        process heat             0                 0                  .77    
aluminum        other uses               0                .515                .515

this also needs to be done using base R or dplyr 0.8.1, tidyr 0.8.3. Thank you!

Data:

> dput(tech_change_supply)
structure(list(supplysector = structure(c(2L, 3L, 4L, 6L, 7L, 
9L, 10L, 11L), .Label = c("agriculture", "aluminum and nonferrous metals", 
"cement energy processes", "chemicals", "construction", "food processing", 
"iron and steel", "mining", "other manufacturing", "other nonmetallic minerals", 
"pulp paper and wood"), class = "factor"), tech_change_all = c(NA_real_, 
NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_
), tech_change_elec = c(0.265280686, 0.856866327, 1.316191449, 
0.390516499, 0.02304637, 1.114989892, 0.410722674, 0.956948382
), tech_change_no_elec = c(0.768341707, 0.915245624, NA, 0.6994352, 
0.575833299, 1.040612635, 0.752058291, NA)), row.names = c(NA, 
-8L), class = "data.frame")
> dput(tech_change_service)
structure(list(subsector = structure(c(2L, 3L, 5L, 6L, 7L, 10L, 
9L), .Label = c("agriculture energy use", "boilers", "boilers_CHP", 
"construction energy use", "electrochemical", "feedstocks", "machine drive", 
"mining energy use", "other uses", "process heat"), class = "factor"), 
    tech_change_all = c(0L, 0L, 0L, 0L, 0L, 0L, 0L), tech_change_elec = c(0, 
    0, 1, 0, 1, 0, 0.5), tech_change_no_elec = c(1, 1, 0, 0, 
    0, 1, 0.5)), row.names = c(NA, -7L), class = "data.frame")
Maridee Weber
  • 231
  • 1
  • 8
  • https://stackoverflow.com/questions/23020826/how-do-you-map-every-combination-of-categorical-variables-in-r Have you tried with command aggregate for the combination of the 2 columns? – domiziano Sep 23 '20 at 16:40
  • @domiziano I've used aggregate to create combos within a single dataframe, but not when combining two separate dataframes. – Maridee Weber Sep 23 '20 at 16:49
  • @GregorThomas Thanks! Edited the title. – Maridee Weber Sep 23 '20 at 16:50

1 Answers1

1

We can use a cross-join to put the data together in all combinations. dplyr enables cross joins (all combinations of rows) by using a full_join with by = character().

together = full_join(tech_change_supply, tech_change_service, by = character(), suffix = c(".supply", ".service"))

From there you can use normal mutate operations to do your multiplication/averages as needed. I'll leave that to you - be careful with your NA values. You can end with select(-ends_with(".supply"), - ends_with(".service")) to remove all the non-transformed columns.

Gregor Thomas
  • 136,190
  • 20
  • 167
  • 294