0

I am trying to create a new data frame in R using an existing data frame of items bought in transactions as shown below:

dput output for the data:

structure(list(Transaction = c(1L, 2L, 2L, 3L, 3L, 3L), Item = c("Bread", 
"Scandinavian", "Scandinavian", "Hot chocolate", "Jam", "Cookies"
), date_time = c("30/10/2016 09:58", "30/10/2016 10:05", "30/10/2016 10:05", 
"30/10/2016 10:07", "30/10/2016 10:07", "30/10/2016 10:07"), 
    period_day = c("morning", "morning", "morning", "morning", 
    "morning", "morning"), weekday_weekend = c("weekend", "weekend", 
    "weekend", "weekend", "weekend", "weekend"), Year = c("2016", 
    "2016", "2016", "2016", "2016", "2016"), Month = c("October", 
    "October", "October", "October", "October", "October")), row.names = c(NA, 
6L), class = "data.frame")

As you can see in the example, the rows are due to each individual product bought, not the transactions themselves (hence why Transaction 2 is both rows 2 and 3).

I would like to make a new table where the rows are the different transactions (1, 2, 3, etc.) and the different columns are categorical (Bread = 0, 1) so I can perform apriori analysis.

Any idea how I can group the different transactions together and then create these new columns?

  • 2
    Hi elliotjamees, welcome to Stack Overflow. Please do not post pictures of your data. Instead, please [edit] your question with the output of `dput(data)` or `dput(head(data))` if your data is very large. Replace `data` with your data frame name. You can use three backticks (`) for better formatting. See [How to make a great R reproducible example](https://stackoverflow.com/a/5963610/) for more tips. – Ian Campbell Jan 05 '21 at 14:18
  • 1
    This sounds like a typical long-to-wide transformation - I'd suggest looking at the [R-FAQ on transforming data from long to wide](https://stackoverflow.com/a/57013551/903061). There are quite a few methods/packages, but I'd recommend akrun's answer using `pivot_wider`. – Gregor Thomas Jan 05 '21 at 14:28

2 Answers2

0

Try dummy_cols from the fastDummies package. This will turn the item column into 0's and 1's. The second line sums per transaction.

d <- dummy_cols(data[1:2], remove_selected_column=T)
d <- aggregate(d[-1], by=list(Transaction=d$Transaction), FUN=sum)
Syl
  • 1
  • 1
0

Assuming your dataframe is called df you can use tidyr's pivot_wider :

df1 <- tidyr::pivot_wider(df, names_from = Item, values_from = Item, 
                          values_fn = n_distinct, values_fill = 0)

df1

#  Transaction date_time      period_day weekday_weekend Year  Month  Bread Scandinavian `Hot chocolate`   Jam Cookies
#        <int> <chr>          <chr>      <chr>           <chr> <chr>  <int>        <int>           <int> <int>   <int>
#1           1 30/10/2016 09… morning    weekend         2016  Octob…     1            0               0     0       0
#2           2 30/10/2016 10… morning    weekend         2016  Octob…     0            1               0     0       0
#3           3 30/10/2016 10… morning    weekend         2016  Octob…     0            0               1     1       1

Or with data.table's dcast :

library(data.table)
dcast(setDT(df), Transaction+date_time+period_day + weekday_weekend + 
      Year + Month ~ Item, value.var = 'Item', fun.aggregate = uniqueN)
Ronak Shah
  • 377,200
  • 20
  • 156
  • 213