Aggregating data in R based on the contents of various columns

Question

I have some data like so, that I have loaded into R

Data column 1, Data column 2,option1, option2, option 3.
23,            45.6         ,       ,option2, option3
5,             30.6         ,option1,option2,
2,            5             ,       ,, option3
3,            40            ,       ,option2,

I want to expand the data so the the option columns become row entries and the data columns are duplicated for each option so I can then do a pivot table based on the column with the options in it

so, I want to know how to go from the top table of data to the bottom table of data in R

Data column 1, Data column 2,option
23,            45.6         ,option2, 
23,            45.6         ,option3
5,             30.6         ,option1
5,             30.6         ,option2
2,            5             ,option3
3,            40            ,option2

Can you please use `dput` to show the example so that we get the structure of the data correctly. thanks. The column s names are also not clear for options.Yoou can try `df1 %>% separate_rows(optioncol, sep=",\\s*")` — akrun, May 19 '20 at 22:36

David T · Answer 1 · 2020-05-20T14:21:32.140

0

The way you've formatted your data is a problem; I'm modifying it a little.

@library(tidyverse)
# Your data structure is something like this:
df <- tibble(DataCol1 = c(23,5,2,3),
       DataCol2 = c(45.6, 30.6, 5, 40), 
       Option1 = c(NA, TRUE, NA, NA),
       Option2 = c(TRUE, TRUE, NA, TRUE),
       Option3 = c(TRUE, NA, TRUE, NA))
columnames = c("Option1", "Option2", "Option3")

We'll use gather to get the output you want

df %>% 
   # A row id makes it easier to put the data back in the same order it came in.
  rowid_to_column("ID") %>% 
    # Here's the tricky part.
  gather(key = "Options", val = "val", all_of(columnames) ) %>% 
    # Clean up a little.
  filter(!is.na(val)) %>% 
  arrange(ID) %>% 
  select(- ID, -val)

And the result is

# A tibble: 6 x 3
  DataCol1 DataCol2 Options
     <dbl>    <dbl> <chr>  
1       23     45.6 Option2
2       23     45.6 Option3
3        5     30.6 Option1
4        5     30.6 Option2
5        2      5   Option3
6        3     40   Option2

The thoroughly modern way to do this is with pivot_longer. Sadly, that release hasn't been installed on my system. You can read Hadley Wickham's instructions for it at https://r4ds.had.co.nz/tidy-data.html#pivoting

Now that we've handled that. For your next question, Please make a Minimal Reproducible Example. See How to make a great R reproducible example . You'll get more help quicker if you do!

edited May 20 '20 at 14:21

answered May 19 '20 at 22:58

David T

1,993
10
18

that works but the columns i want to use, in real life, have no common prefix, they are headings <- df[0,c(seq(37,52,1))] – Oliver Stieber May 20 '20 at 10:42
I've tried the following based on your suggestion: df%>% pivot_longer(all_of(selectedheadings), names_to = "headings", values_to = "headiungchosen") but I get an error: Error: Must subset columns with a valid subscript vector. x Subscript has the wrong type `data.frame< with some columns reporting data type character and others reporting datatype logical – Oliver Stieber May 20 '20 at 11:01
I get the same error if I substitute starts_with("Opt") with all_of(selectedheadings) – Oliver Stieber May 20 '20 at 11:12
I'm using headings<- df[0,c(seq(37,52,1))] to get the column names and every time I unlist either it of one of it's entries I get chaacter(0) dispite having a string when I print it as a list berfore calling unlist – Oliver Stieber May 20 '20 at 11:40
whenever I try to view thew headings e.g view(headings) I get thge following error which I cannot find reporterd anywher else: Error in view_fun(list( Data In Heading = character(0), : invalid 'x' argument – Oliver Stieber May 20 '20 at 11:51
I've got it sorted now, i had to change the way I was getting the column namews to headings <- colnames(df)[37:52] – Oliver Stieber May 20 '20 at 12:08
if you modify your anser to include columnnames = c("Option1", "Option2", "Option3") at the top and then change the following line gather(key = "Options", val = "val", starts_with("Opt")) %>% to gather(key = "Options", val = "val", all_of(columnnames )) %>% I'll consider jobs a gooden and close – Oliver Stieber May 20 '20 at 13:11
OK, I did that. Looks good? – David T May 20 '20 at 14:29
I've just spotted that it creates a row of data for each named column always and no only if it appears on the row – Oliver Stieber May 20 '20 at 17:02
I can't find a reference to your original startsWith syntax, is it possible to combine startsWith with the list of colunms/valujes columnnames so that only those data entries that startWith one of them are included and all the blank entries get dropped. – Oliver Stieber May 20 '20 at 17:28
I think I'm going to writye a for loop to get past this emapass – Oliver Stieber May 20 '20 at 17:29
It was `starts_with`. See https://r4ds.had.co.nz/transform.html#select . I regret that , after building you a solution that works on the data you posted, you feel that you cannot upvote my work because of some additional requirements. – David T May 20 '20 at 17:36

Aggregating data in R based on the contents of various columns

1 Answers1