1

I have long format data in R for recipes, where each ingredient is listed by row:

house     rec          ingr
 98 leche de soya Leche de soya
 98 leche de soya           pan
 98         arroz         arroz 
 98         arroz        aceite
 98         arroz          agua
 98         arroz       cebolla
 98         pollo         pollo
 98         pollo           sal
 98         pollo        aceite
 98  papas fritas          papa
 98  papas fritas        aceite
 98  carne de res  carne de res
 98  carne de res        aceite
124       Platano       platano
124       Platano         queso
124          cafŽ          cafŽ
124          cafŽ        azucar
124          CafŽ          agua
124      majadito         arroz
124      majadito         carne

I want the ingredients for each unique recipe name and house id. Ideally I would like a data frame 962 columns wide (number of ingredients) with a 1 or zero if the ingredient was present or not or a transaction table, but I'm stuck.

R Curti
  • 31
  • 5

2 Answers2

2

Here is a way to do it with spread():

library(tidyverse)

dat <- data_frame(
  house.id = c(rep(98, 3), rep(124, 3)),
  recipe = c(
    'leche de soya',
    'leche de soya',
    'arroz',
    'arroz',
    'arroz',
    'arroz'
  ),
  ingredients = c(
    'Leche de soya',
    'pan',
    'arroz',
    'aceite',
    'agua',
    'cebolla'
  ),
  values = rep(1, 6)
)

dat <- spread(dat, ingredients, values, fill = 0)
johnson-shuffle
  • 1,023
  • 5
  • 11
  • I've updated the post to show that there are many different house ids (thousands). How can I work the above solution to make each row the recipe ingredients for each unique combination of house id and recipe. I'm guessing it involves dplyr. I've tried, but I can't get it.. – R Curti Apr 23 '17 at 02:08
  • @RCurti I edited the example so there are two house id's in the example. The result is a data frame with 3 rows corresponding to house/recipe combination. Columns correspond to the ingredients. Now both 98 and 124 end up with a recipe for arroz. Isn't that what you are after? – johnson-shuffle Apr 23 '17 at 02:47
0

Using dcast():

library(data.table)
dcast(setDT(DT), house + rec ~ ingr, length)
   house           rec Leche de soya aceite agua arroz azucar caf<U+008E> carne carne de res
1:    98         arroz             0      1    1     1      0           0     0            0
2:    98  carne de res             0      1    0     0      0           0     0            1
3:    98 leche de soya             1      0    0     0      0           0     0            0
4:    98  papas fritas             0      1    0     0      0           0     0            0
5:    98         pollo             0      1    0     0      0           0     0            0
6:   124   Caf<U+008E>             0      0    1     0      0           0     0            0
7:   124       Platano             0      0    0     0      0           0     0            0
8:   124   caf<U+008E>             0      0    0     0      1           1     0            0
9:   124      majadito             0      0    0     1      0           0     1            0
   cebolla pan papa platano pollo queso sal
1:       1   0    0       0     0     0   0
2:       0   0    0       0     0     0   0
3:       0   1    0       0     0     0   0
4:       0   0    1       0     0     0   0
5:       0   0    0       0     1     0   1
6:       0   0    0       0     0     0   0
7:       0   0    0       1     0     1   0
8:       0   0    0       0     0     0   0
9:       0   0    0       0     0     0   0

Note the effect of the typo.

Data

DT <- readr::read_fwf(
 "  hs           rec          ingr
  98 leche de soya Leche de soya
  98 leche de soya           pan
  98         arroz         arroz 
  98         arroz        aceite
  98         arroz          agua
  98         arroz       cebolla
  98         pollo         pollo
  98         pollo           sal
  98         pollo        aceite
  98  papas fritas          papa
  98  papas fritas        aceite
  98  carne de res  carne de res
  98  carne de res        aceite
  124       Platano       platano
  124       Platano         queso
  124          cafŽ          cafŽ
  124          cafŽ        azucar
  124          CafŽ          agua
  124      majadito         arroz
  124      majadito         carne",
 skip = 1L,
 col_positions = readr::fwf_positions(c(3, 6, 20), c(5, 19, NA), c("house", "rec", "ingr")))
Uwe
  • 41,420
  • 11
  • 90
  • 134