How to get separate() to work with odd column names from a survey

Question

I have a set of results from a survey (cut down from the actual results):

structure(list(`What is your age?` = c("65+", "65+", "65+", "25-34", 
"45-54", "65+"), `Gender identity` = c("Female", "Female", "Male", 
"Non-Binary", "Female", "Female")), row.names = 3:8, class = "data.frame")

I want to split the age-range column into a min-age and a max-age column, splitting the two ages apart where there are two. I am not worried about the 65+ category, since max can be blank.

I can't seem to get the syntax of the separate() call right. I have been looking at the docs for ages and I just get different errors whenever I try something. Here are some examples:

workingfile$`What is your age?` %>% separate(`What is your age?`, c('Min Age', 'Max Age'), "_|(?=...$) ", convert = TRUE)
workingfile %>% separate(`What is your age?`, c('Min Age', 'Max Age'), "_|(?=...$) ", convert = TRUE)
workingfile %>% separate(.$`What is your age?`, c('Min Age', 'Max Age'), "_|(?=...$) ", convert = TRUE)

The errors, in the order of the lines tried:

We cannot read data into R from images. Please [make this question reproducible](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) by including a small representative dataset in a plain text format - for example the output from `dput(workingfile)`, if that is not too large. — neilfws, Nov 16 '22 at 02:30
As neilfws hinted, if the output from `dput()` is too big, you can use `dput(head(workingfile))` to provide a subset. — John Polo, Nov 16 '22 at 02:50
`sep = "[^[:alnum:]]+"` is the default so you could just skip that argument to split on any non alphanumeric. — Dan Adams, Nov 16 '22 at 03:23
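To see what that default pattern does to the age strings, here is a quick base-R check using strsplit() as a stand-in for the splitting separate() does. One difference to keep in mind: strsplit() drops the empty piece after a trailing "+", whereas separate() keeps it as an empty string.

```r
# The age values from the data above
ages <- c("65+", "65+", "65+", "25-34", "45-54", "65+")

# Split on any run of non-alphanumeric characters (tidyr::separate()'s default sep)
pieces <- strsplit(ages, "[^[:alnum:]]+")
print(pieces)
```

"25-34" splits into "25" and "34" at the "-", while "65+" leaves just "65".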

score 1 · Accepted Answer · answered Nov 16 '22 at 19:21

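The default separator in tidyr::separate() is sep = "[^[:alnum:]]+", which splits at any run of non-alphanumeric characters. For your data that gives exactly what you want: "25-34" splits at the "-", and "65+" splits at the "+", leaving max empty.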

library(tidyverse)

d <- structure(list(`What is your age?` = c("65+", "65+", "65+", "25-34",
                                            "45-54", "65+"),
                    `Gender identity` = c("Female", "Female", "Male",
                                          "Non-Binary", "Female", "Female")),
               row.names = 3:8, class = "data.frame")

d %>% 
  separate(`What is your age?`, 
           into = c("min", "max"))
#>   min max Gender identity
#> 3  65              Female
#> 4  65              Female
#> 5  65                Male
#> 6  25  34      Non-Binary
#> 7  45  54          Female
#> 8  65              Female

Created on 2022-11-16 with reprex v2.0.2
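If you also want numeric columns named as in the question, a minimal sketch (assuming tidyr is installed; the data frame is rebuilt with data.frame() so the snippet is self-contained) is to pass the data frame as the first argument, the column by its backticked name as the second, and add convert = TRUE. The empty max piece for "65+" should then come out as NA, because type.convert(), which convert = TRUE relies on, treats blank strings as missing.

```r
library(tidyr)

# Same survey data; check.names = FALSE keeps the spaces and "?" in the names
d <- data.frame(
  `What is your age?` = c("65+", "65+", "65+", "25-34", "45-54", "65+"),
  `Gender identity` = c("Female", "Female", "Male", "Non-Binary", "Female", "Female"),
  check.names = FALSE
)

# Data frame first, column name (backticked) second; the default sep is spelled out
out <- separate(d, `What is your age?`,
                into = c("Min Age", "Max Age"),
                sep = "[^[:alnum:]]+",
                convert = TRUE)  # "25" becomes 25L, "" becomes NA
print(out)
```

The new columns replace the original one in place, so out has the columns Min Age, Max Age and Gender identity, with integer ages.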

That works, but it does my head in that this line: workingfile %>% separate(`What is your age?`, c('Min Age', 'Max Age'), "_|(?=...$) ", convert = TRUE) gave an error. I was so close yet so far! — Plot Device, Nov 16 '22 at 21:31
Check it out in a regex tester [https://regex101.com/](https://regex101.com/). Yours doesn't seem to make much sense to me although I'm not a regex expert by any stretch. — Dan Adams, Nov 17 '22 at 00:37
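Along the lines of testing in a regex tester, a quick base-R check can show whether each pattern matches anything in the data. The pattern from the question only matches "_" (but the ranges use "-"), or a position three characters from the end that is followed by a literal space (the values contain no spaces), so it should match nothing and give separate() nowhere to split. This sketch uses grepl() with perl = TRUE, since base R's default engine has no lookaheads; tidyr actually uses the ICU engine via stringi, and the assumption here is that the two agree on these simple patterns.

```r
ages <- c("65+", "65+", "65+", "25-34", "45-54", "65+")

# Pattern from the question: "_" OR (lookahead for 3 chars then end, followed by a literal space)
question_sep <- "_|(?=...$) "
# tidyr::separate()'s default: any run of non-alphanumeric characters
default_sep <- "[^[:alnum:]]+"

# Does the question's pattern match anywhere? (lookarounds need perl = TRUE in base R)
print(grepl(question_sep, ages, perl = TRUE))

# What the default pattern matches in each value
print(regmatches(ages, regexpr(default_sep, ages)))
```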
