1

Perhaps an easy question, but I am a beginner and I could not find answers that apply to my case on stack overflow here.

I merged two datasets, which now look like this (the actual dataset has 5,000+ observations):

> data <- structure(list(Country = c("France", "France", "France", "Germany", 
"Germany", "Germany"), Type_a = c("Type_a", "", "Type_a", "Type_a,Type_b", 
"Type_b,Type_c,Type_f", "Type_f"), Type_b = c("", "Type_b", "Type_b", 
"", "", ""), Type_c = c("", "", "Type_c", "", "", ""), Type_d = c("", 
"Type_d", "", "", "", ""), Type_e = c("Type_e", "", "Type_e", 
"", "", ""), Type_f = c("Type_f", "", "Type_f", "", "", "")), row.names = c(NA, 
6L), class = "data.frame")
> View(data)

Please run View(data). Observations for France are in the right columns, but observations for Germany are all in one single column.

I want to split Germany data and assign them to the right columns, while taking into account missing data. That is, data cannot just be split and assigned to all next columns (as in the stack overflow link above), but only to select columns. Also, I do not want to create new columns.

How do I do this?

Thank you.

johnjohn
  • 716
  • 2
  • 9
  • 17

1 Answers1

1

If this example is representative of your actual data you could possibly do something like:

library(tidyverse)

data %>%
  rowid_to_column() %>%
  separate_rows(Type_a, sep = ",") %>%
  pivot_longer(-c(rowid, Country)) %>%
  mutate(name = if_else(value == "", name, value)) %>%
  pivot_wider(id_cols = c(rowid, Country), values_fn = list(value = ~paste0(.x, collapse = ""))) %>%
  na_if("")

# A tibble: 6 x 8
  rowid Country Type_a Type_b Type_c Type_d Type_e Type_f
  <int> <chr>   <chr>  <chr>  <chr>  <chr>  <chr>  <chr> 
1     1 France  Type_a NA     NA     NA     Type_e Type_f
2     2 France  NA     Type_b NA     Type_d NA     NA    
3     3 France  Type_a Type_b Type_c NA     Type_e Type_f
4     4 Germany Type_a Type_b NA     NA     NA     NA    
5     5 Germany NA     Type_b Type_c NA     NA     Type_f
6     6 Germany NA     NA     NA     NA     NA     Type_f
Ritchie Sacramento
  • 29,890
  • 4
  • 48
  • 56
  • Many thanks, and yes, this is representative of my data. This is not exactly what I need, though. If you look at the first data frame, you will see that Germany has several answers (see for example observation #5, which has 3 answers), and I need *all* of them to be reassigned to the right columns. Your solution works, but it keeps only one answer. Any way to improve this? – johnjohn Oct 12 '19 at 12:58
  • 1
    @johnjohn - look again, edited it ten minutes ago to address that – Ritchie Sacramento Oct 12 '19 at 12:59
  • 1
    @H 1 many thanks it worked. i upvoted your answer but it's not displayed because i just joined the community! – johnjohn Oct 12 '19 at 13:24