I have a dataframe that contains:

userID song   sex
1      songA  M
2      songB  F
1      songC  M
2      songA  F 
...    ...    ...

So each row records a song listened to by a user. I want to use "arules", but first I need to transform this dataframe into transactions. I've searched a lot, but I don't know if my idea is wrong because I have found no answer yet. I've found solutions like using split to create a list of lists with all the songs listened to by each user, but if I do that I'll lose the sex information. I'll only get rules like {songA,songB} -> {songZ}. I want to generate rules like {songA,songC,M} -> {songZ} (using the sex information). I don't know whether my idea is wrong and this is not possible. Any ideas?

Thanks.

hardsoft
  • Please provide a [reproducible minimal example](https://stackoverflow.com/q/5963269/8107362). Especially, provide some sample data, e.g. with `dput()`, and desired output – mnist Nov 19 '19 at 19:16

1 Answer

If you're looking at associations, you'll generally want to reshape your data into a long dataframe with one column for the transaction ID (here, `userID`) and another column holding the items, where each row means "this ID has this item".

There are many ways to reshape your data into that form. For your example, I reshaped with the tidyverse and added a `distinct()` call so that each user's sex isn't listed multiple times.

txt = "
userID song   sex
1      songA  M
2      songB  F
1      songC  M
2      songA  F "
df <- read.table(text = txt, header = TRUE)

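library(tidyverse)
# Stack song and sex into one name/value pair per row,
# then drop the repeated sex rows for each user
df %>%
  pivot_longer(cols = c(song, sex)) %>%
  distinct()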
#> # A tibble: 6 x 3
#>   userID name  value
#>    <int> <chr> <fct>
#> 1      1 song  songA
#> 2      1 sex   M    
#> 3      2 song  songB
#> 4      2 sex   F    
#> 5      1 song  songC
#> 6      2 song  songA
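
To get from there to something "arules" can mine, one possible route is to build one item vector per user and coerce the list to transactions. The following is only a minimal sketch under a few assumptions: the "arules" package is installed, the `sex=` prefix is an illustrative labelling choice (not something "arules" requires), and the support and confidence thresholds are arbitrary values picked for this tiny sample.

```r
# Same sample data as in the question
df <- data.frame(
  userID = c(1, 2, 1, 2),
  song   = c("songA", "songB", "songC", "songA"),
  sex    = c("M", "F", "M", "F")
)

# One item vector per user: every song plus a single labelled sex item.
# The "sex=" prefix is an assumption made here so the item is easy to spot in rules.
items_by_user <- lapply(split(df, df$userID), function(d) {
  unique(c(as.character(d$song), paste0("sex=", d$sex)))
})

# The arules part only runs if the package is available
if (requireNamespace("arules", quietly = TRUE)) {
  # Coerce the named list to a transactions object (list names become transaction IDs)
  trans <- as(items_by_user, "transactions")
  arules::inspect(trans)

  # Thresholds are illustrative only; tune them for real data
  rules <- arules::apriori(
    trans,
    parameter = list(support = 0.5, confidence = 0.8)
  )
}
```

Because the sex value is just another item in each user's transaction, it can then appear on the left-hand side of a rule alongside songs.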
ravic_
  • Thanks! I thought I had to transform the data to a binary matrix with all possible factor values, but this transformation works. – hardsoft Nov 19 '19 at 21:04