17
library(data.table)
dat1 <- data.table(id = c(1, 2, 34, 99),
           class = c("sports", "", "music, sports", ""),
           hobby = c("knitting, music, sports", "", "", "music"))
> dat1
  id         class                   hobby
1  1        sports knitting, music, sports
2  2                                      
3 34 music, sports                        
4 99                                 music

I have the above dataset, dat1, where each row corresponds to a unique id. For each id, multiple inputs for either class or hobby are separated by a comma.

I would like to exchange the row and column of this dataset so that I get the following:

     input class hobby
1   sports 1, 34     1
2 knitting           1
3    music    34 1, 99

In this dataset, each row corresponds to a unique input from dat1. Now the class and hobby columns are storing the corresponding ids from dat1, each separated by a comma.

Is there a quick way to swap the row and columns like this in R?

Adrian
  • 9,229
  • 24
  • 74
  • 132

5 Answers5

9

Here is a data.table solution

Input

library(data.table)
dat1 <- data.table(id = c(1, 2, 34, 99),
                   class = c("sports", "", "music, sports", ""),
                   hobby = c("knitting, music, sports", "", "", "music"))
dat1
#>    id         class                   hobby
#> 1:  1        sports knitting, music, sports
#> 2:  2                                      
#> 3: 34 music, sports                        
#> 4: 99                                 music

Dataprep

# in long format
dt_melted <- melt.data.table(dat1, id.vars = "id", variable.name = "type", value.name = "value")
dt_melted
#>    id  type                   value
#> 1:  1 class                  sports
#> 2:  2 class                        
#> 3: 34 class           music, sports
#> 4: 99 class                        
#> 5:  1 hobby knitting, music, sports
#> 6:  2 hobby                        
#> 7: 34 hobby                        
#> 8: 99 hobby                   music

# split values by comma
dt_splitted <- dt_melted[, .(input = unlist(data.table::tstrsplit(value, ","))), by = .(id, type)]
dt_splitted
#>    id  type    input
#> 1:  1 class   sports
#> 2: 34 class    music
#> 3: 34 class   sports
#> 4:  1 hobby knitting
#> 5:  1 hobby    music
#> 6:  1 hobby   sports
#> 7: 99 hobby    music

Last Step 1

# bring back to desired wide format
dt_casted <- dcast.data.table(dt_splitted, 
                              formula = "input ~ type",
                              value.var = "id",
                              fun.aggregate = paste, 
                              collapse = ", ")
dt_casted
#>       input class hobby
#> 1: knitting           1
#> 2:    music    34 1, 99
#> 3:   sports 1, 34     1

Last Step 2 (more verbose)

# combine ids by class/hobby
dt_splitted[, .(class = paste(id[type == "class"], collapse = ", "),
                hobby = paste(id[type == "hobby"], collapse = ", ")),
            by = .(input = trimws(input))]
#>       input class hobby
#> 1:   sports 1, 34     1
#> 2:    music    34 1, 99
#> 3: knitting           1
mnist
  • 6,571
  • 1
  • 18
  • 41
8

Another data.table option using dcast + melt

dcast(
  melt(dat1[, lapply(.SD, strsplit, ", "), id], "id")[
    ,
    .(input = unlist(value)),
    .(id, variable)
  ], input ~ variable,
  value.var = "id",
  fun = toString
)

which gives

      input class hobby
1: knitting           1
2:    music    34 1, 99
3:   sports 1, 34     1
ThomasIsCoding
  • 96,636
  • 9
  • 24
  • 81
5

Here's a quick tidyverse way:

library(dplyr)
library(tidyr)
dat1 %>% 
  pivot_longer(-id, values_to = "input") %>%
  separate_rows(input) %>% 
  filter(input != "") %>% 
  pivot_wider(names_from = "name", values_from = "id", values_fn = toString)
  input    class hobby
1 sports   1, 34 1    
2 knitting NA    1    
3 music    34    1, 99
Maël
  • 45,206
  • 3
  • 29
  • 67
4

Here is an (not that long) tidyverse option. The issue is that values for the future 'input` column are scattered into multiple columns and can contain several value into the same string.

First a pivot_longer to have all the future input into the same column:

dat2 = dat1 %>% pivot_longer(!id)
dat2
# A tibble: 8 x 3
     id name  value                    
  <dbl> <chr> <chr>                    
1     1 class "sports"                 
2     1 hobby "knitting, music, sports"
3     2 class ""                       
4     2 hobby ""                       
5    34 class "music, sports"          
6    34 hobby ""                       
7    99 class ""                       
8    99 hobby "music" 

Then the approach is to use the stringr package to split all the potential input values to then pivoting again to have all combination. Then filtering for the row having an input (nchar(input)>0). Finally grouping by the input and name to paste the multiple id together id = paste0(id, collapse = ","). And a last pivot_wider to have the data frame as desired:

cbind(dat2, str_split(str_remove_all(dat2$value, ","), " ", simplify = T)) %>%
  select(!value) %>% 
  pivot_longer(!c("id", "name"), names_to = "name_2", values_to = "input") %>% 
  select(!name_2) %>% 
  filter(nchar(input)>0) %>%
  group_by(input, name) %>% 
  summarise(id = paste0(id, collapse = ",")) %>% 
  pivot_wider(names_from = name, values_from = id)

# A tibble: 3 x 3
# Groups:   input [3]
  input    hobby class
  <chr>    <chr> <chr>
1 knitting 1     NA   
2 music    1,99  34   
3 sports   1     1,34 
Bushidov
  • 713
  • 4
  • 16
3

Alternative way, using dplyr, listing the values in the rows by calling strsplit then unnesting it, pivoting longer, back into pivot wider, to reverse the values.

library(dplyr)


dat1 |>
  rowwise() |> 
  mutate(across(-id, ~ ifelse(nchar(.x), strsplit(.x, ", "), list("")))) |> 
  unnest(cols = -id) |> 
  pivot_longer(cols = class:hobby) |> 
  filter(value != "") |> 
  pivot_wider(id_cols = value, names_from = name, values_from = id, 
              values_fn = \(x) paste(unique(x), collapse = " ,"))
  value    class hobby
  <chr>    <chr> <chr>
1 sports   1 ,34 1    
2 knitting NA    1    
3 music    34    1 ,99
Just James
  • 1,222
  • 2
  • 7