
I am not sure how to solve this problem. My dataframe looks like this (but a lot bigger):

df <- data.frame(word = c('word1','word2', 'word3', 'word4', 'word5', 'word6', 'word7'), code = c(1 , 2, 2, 2, 1, 1, 2), modality = c('cog', 'emo', 'soc', 'cog_emo', 'soc', 'soc_emo_cog', 'emo'))
df
   word code    modality
1 word1    1         cog
2 word2    2         emo
3 word3    2         soc
4 word4    2     cog_emo
5 word5    1         soc
6 word6    1 soc_emo_cog
7 word7    2         emo

The modality column shows which modality (or modalities) each word is assigned to. I need to count the number of words assigned to each modality. If a word is assigned to multiple modalities, it has to be counted once for each of them. I would therefore like to duplicate every row that lists multiple modalities, so that each row holds exactly one modality. Something like this:

    word code modality
1  word1    1      cog
2  word2    2      emo
3  word3    2      soc
4  word4    2      cog
5  word4    2      emo
6  word5    1      soc
7  word6    1      soc
8  word6    1      emo
9  word6    1      cog
10 word7    2      emo

As I said, the data frame is a lot bigger, so I cannot do this manually. Thank you!

Linda Espey

3 Answers

library(tidyverse)

df %>% mutate(modality = str_split(modality, "_")) %>% unnest(modality)

   word   code modality
   <fct> <dbl> <chr>   
 1 word1     1 cog     
 2 word2     2 emo     
 3 word3     2 soc     
 4 word4     2 cog     
 5 word4     2 emo     
 6 word5     1 soc     
 7 word6     1 soc     
 8 word6     1 emo     
 9 word6     1 cog     
10 word7     2 emo  
det

tidyr::separate_rows() is meant for exactly this scenario:


library(tidyr)

df %>% separate_rows(modality, sep = '_')

#> # A tibble: 10 x 3
#>    word   code modality
#>    <chr> <dbl> <chr>   
#>  1 word1     1 cog     
#>  2 word2     2 emo     
#>  3 word3     2 soc     
#>  4 word4     2 cog     
#>  5 word4     2 emo     
#>  6 word5     1 soc     
#>  7 word6     1 soc     
#>  8 word6     1 emo     
#>  9 word6     1 cog     
#> 10 word7     2 emo

Created on 2021-06-11 by the reprex package (v2.0.0)
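
The question's end goal is a count of words per modality, so here is a minimal sketch of one way to get there from the separated rows. It assumes dplyr is installed alongside tidyr; the names `modality_counts` and `n_words` are made up for illustration and do not come from the original post.

```r
library(tidyr)
library(dplyr)

# Example data from the question
df <- data.frame(
  word = c("word1", "word2", "word3", "word4", "word5", "word6", "word7"),
  code = c(1, 2, 2, 2, 1, 1, 2),
  modality = c("cog", "emo", "soc", "cog_emo", "soc", "soc_emo_cog", "emo")
)

# Split multi-modality rows into one row per modality,
# then count how many rows (words) fall under each modality.
modality_counts <- df %>%
  separate_rows(modality, sep = "_") %>%
  count(modality, name = "n_words")   # n_words is a hypothetical column name

modality_counts
```

On the example data this should give 3 words for cog, 4 for emo and 3 for soc. If counts per code are also needed, `count(code, modality)` would probably be the natural extension.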

AnilGoyal

Here is a data.table option:

> library(data.table)
> setDT(df)[, .(modality = unlist(strsplit(modality, "_"))), by = .(word, code)]
     word code modality
 1: word1    1      cog
 2: word2    2      emo
 3: word3    2      soc
 4: word4    2      cog
 5: word4    2      emo
 6: word5    1      soc
 7: word6    1      soc
 8: word6    1      emo
 9: word6    1      cog
10: word7    2      emo
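
To go from this long table to the per-modality count the question asks for, a second `[...]` with `.N` can be chained on. This is only a sketch: the names `long` and `counts` are hypothetical, and `stringsAsFactors = FALSE` is set explicitly on the assumption that `strsplit()` should receive a character column on older R versions too.

```r
library(data.table)

# Example data from the question; keep strings as character so strsplit() works
df <- data.frame(
  word = c("word1", "word2", "word3", "word4", "word5", "word6", "word7"),
  code = c(1, 2, 2, 2, 1, 1, 2),
  modality = c("cog", "emo", "soc", "cog_emo", "soc", "soc_emo_cog", "emo"),
  stringsAsFactors = FALSE
)

# One row per (word, modality) pair, as in the answer above
long <- setDT(df)[, .(modality = unlist(strsplit(modality, "_"))), by = .(word, code)]

# .N is the number of rows in each group, i.e. the number of words per modality
counts <- long[, .N, by = modality]

counts
```

Groups appear in order of first occurrence, so on the example data the rows should come out as cog (3), emo (4), soc (3).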
ThomasIsCoding