This is a seemingly simple question that I can't find an answer to.

I have a data frame:

```r
df <- data.frame(
  respondent = factor(c(1, 2, 3, 4, 5, 6)),
  language = factor(c("English", "English", "French", "French, German", "German", "German"))
)
```

The factor level names reflect survey responses. Most respondents are monolingual, but some speak both French and German. I would like to split the "French, German" level into two.

How can this be done?

KaC
  • [see here](https://stackoverflow.com/questions/13773770/split-comma-separated-column-into-separate-rows) – pieca May 23 '18 at 19:32
  • You say "data frame", but your code just shows a vector. It is much easier to do this on a vector than on a data frame, where maybe there are other columns you need to bring along as you duplicate rows? Is your example enough, or do you really have a data frame, and if so are there other columns that need to stay matched with the result? If you have a data frame with more columns, please update your example to match (one extra column would be fine for illustration). – Gregor Thomas May 23 '18 at 19:32
  • @pieca: A very different question. The linked solutions don't help to split a factor level. – KaC May 23 '18 at 20:17
  • @Gregor: You're right. Fixed. – KaC May 23 '18 at 20:18
  • By _"split the "French, German" level into two"_, do you mean separate them into two rows one for each language? So respondent `4` would have a row for `French` and another for `German`, correct? – acylam May 23 '18 at 20:50
  • @useR: I suppose it's necessary. It's not ideal, as it could mess up the original dataframe, but I guess it's not a problem if a new dataframe (or a separate vector) is created for this purpose. – KaC May 23 '18 at 20:52
  • Well, without separating into rows, I'm not sure how you can represent two languages from the same respondent. You can have a `language1` and `language2` column, but that seems even less ideal. – acylam May 23 '18 at 20:58
  • I know. The problem is that, if there are more columns in the df, then data get duplicated (not an issue with a two-column df). I'd just create a new df for this and do the rest of the analysis in the original df. Other than that, your solution works. Thank you. – KaC May 23 '18 at 21:01

1 Answer

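We can use `separate_rows()` from tidyr to give each language its own row, then use `mutate()` to convert `language` back to a factor, since the split values come back as character. By default `separate_rows()` splits on any run of non-alphanumeric characters, so "French, German" is split at the ", ". The resulting `language` column is a factor with one level per individual language: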

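```r
library(dplyr)
library(tidyr)

df <- df %>%
  # one row per language; the default sep splits "French, German" at ", "
  separate_rows(language) %>%
  # the split values are character, so rebuild the factor levels
  mutate(language = factor(language))
```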

Result:

```
  respondent language
1          1  English
2          2  English
3          3   French
4          4   French
5          4   German
6          5   German
7          6   German

> df$language
[1] English English French  French  German  German  German
Levels: English French German
```
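If you would rather not depend on tidyr and dplyr, the same reshaping can be sketched in base R with `strsplit()` and `rep()`. This is an alternative technique, not the tidyr approach above, and it assumes multi-language answers are always separated by exactly ", " as in the example data. It writes to a new data frame (`df_long`, a name chosen here for illustration) so the original `df` stays intact:

```r
# Base R sketch: one row per language, assuming entries are separated by ", "
df <- data.frame(
  respondent = factor(c(1, 2, 3, 4, 5, 6)),
  language = factor(c("English", "English", "French", "French, German", "German", "German"))
)

# strsplit() needs character input, so convert the factor first
parts <- strsplit(as.character(df$language), ", ", fixed = TRUE)

# Repeat each respondent once per language listed in its answer;
# df_long is a hypothetical name for the new, expanded data frame
df_long <- data.frame(
  respondent = rep(df$respondent, lengths(parts)),
  language = factor(unlist(parts))
)

print(df_long)
print(levels(df_long$language))
```

Any other column in `df` can be expanded the same way with `rep(..., lengths(parts))`, which keeps it aligned with the duplicated respondents.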
acylam