-2

I have some data like so, that I have loaded into R

Data column 1, Data column 2,option1, option2, option 3.
23,            45.6         ,       ,option2, option3
5,             30.6         ,option1,option2,
2,            5             ,       ,, option3
3,            40            ,       ,option2, 

I want to expand the data so the the option columns become row entries and the data columns are duplicated for each option so I can then do a pivot table based on the column with the options in it

so, I want to know how to go from the top table of data to the bottom table of data in R

Data column 1, Data column 2,option
23,            45.6         ,option2, 
23,            45.6         ,option3
5,             30.6         ,option1
5,             30.6         ,option2
2,            5             ,option3
3,            40            ,option2
  • 1
    Can you please use `dput` to show the example so that we get the structure of the data correctly. thanks. The column s names are also not clear for options.Yoou can try `df1 %>% separate_rows(optioncol, sep=",\\s*")` – akrun May 19 '20 at 22:36

1 Answers1

0

The way you've formatted your data is a problem; I'm modifying it a little.

@library(tidyverse)
# Your data structure is something like this:
df <- tibble(DataCol1 = c(23,5,2,3),
       DataCol2 = c(45.6, 30.6, 5, 40), 
       Option1 = c(NA, TRUE, NA, NA),
       Option2 = c(TRUE, TRUE, NA, TRUE),
       Option3 = c(TRUE, NA, TRUE, NA))
columnames = c("Option1", "Option2", "Option3")

We'll use gather to get the output you want

df %>% 
   # A row id makes it easier to put the data back in the same order it came in.
  rowid_to_column("ID") %>% 
    # Here's the tricky part.
  gather(key = "Options", val = "val", all_of(columnames) ) %>% 
    # Clean up a little.
  filter(!is.na(val)) %>% 
  arrange(ID) %>% 
  select(- ID, -val)

And the result is

# A tibble: 6 x 3
  DataCol1 DataCol2 Options
     <dbl>    <dbl> <chr>  
1       23     45.6 Option2
2       23     45.6 Option3
3        5     30.6 Option1
4        5     30.6 Option2
5        2      5   Option3
6        3     40   Option2

The thoroughly modern way to do this is with pivot_longer. Sadly, that release hasn't been installed on my system. You can read Hadley Wickham's instructions for it at https://r4ds.had.co.nz/tidy-data.html#pivoting

Now that we've handled that. For your next question, Please make a Minimal Reproducible Example. See How to make a great R reproducible example . You'll get more help quicker if you do!

David T
  • 1,993
  • 10
  • 18
  • that works but the columns i want to use, in real life, have no common prefix, they are headings <- df[0,c(seq(37,52,1))] – Oliver Stieber May 20 '20 at 10:42
  • I've tried the following based on your suggestion: df%>% pivot_longer(all_of(selectedheadings), names_to = "headings", values_to = "headiungchosen") but I get an error: Error: Must subset columns with a valid subscript vector. x Subscript has the wrong type `data.frame< with some columns reporting data type character and others reporting datatype logical – Oliver Stieber May 20 '20 at 11:01
  • I get the same error if I substitute starts_with("Opt") with all_of(selectedheadings) – Oliver Stieber May 20 '20 at 11:12
  • I'm using headings<- df[0,c(seq(37,52,1))] to get the column names and every time I unlist either it of one of it's entries I get chaacter(0) dispite having a string when I print it as a list berfore calling unlist – Oliver Stieber May 20 '20 at 11:40
  • whenever I try to view thew headings e.g view(headings) I get thge following error which I cannot find reporterd anywher else: Error in view_fun(list( Data In Heading = character(0), : invalid 'x' argument – Oliver Stieber May 20 '20 at 11:51
  • I've got it sorted now, i had to change the way I was getting the column namews to headings <- colnames(df)[37:52] – Oliver Stieber May 20 '20 at 12:08
  • if you modify your anser to include columnnames = c("Option1", "Option2", "Option3") at the top and then change the following line gather(key = "Options", val = "val", starts_with("Opt")) %>% to gather(key = "Options", val = "val", all_of(columnnames )) %>% I'll consider jobs a gooden and close – Oliver Stieber May 20 '20 at 13:11
  • OK, I did that. Looks good? – David T May 20 '20 at 14:29
  • I've just spotted that it creates a row of data for each named column always and no only if it appears on the row – Oliver Stieber May 20 '20 at 17:02
  • I can't find a reference to your original startsWith syntax, is it possible to combine startsWith with the list of colunms/valujes columnnames so that only those data entries that startWith one of them are included and all the blank entries get dropped. – Oliver Stieber May 20 '20 at 17:28
  • I think I'm going to writye a for loop to get past this emapass – Oliver Stieber May 20 '20 at 17:29
  • It was `starts_with`. See https://r4ds.had.co.nz/transform.html#select . I regret that , after building you a solution that works on the data you posted, you feel that you cannot upvote my work because of some additional requirements. – David T May 20 '20 at 17:36