I have a set of results from a survey (had to cut down the actual results):

structure(list(`What is your age?` = c("65+", "65+", "65+", "25-34", 
"45-54", "65+"), `Gender identity` = c("Female", "Female", "Male", 
"Non-Binary", "Female", "Female")), row.names = 3:8, class = "data.frame")

And I want to separate the age range column into a min age and max age column, splitting the two ages where necessary. I am not worried about the 65+ category since max can be blank.

I can't seem to get the syntax correct on the separate call. I have looked at the docs for ages now and I just get a different error whenever I try something. Here are some examples:

workingfile$`What is your age?` %>% separate(`What is your age?`, c('Min Age', 'Max Age'), "_|(?=...$) ", convert = TRUE)
workingfile %>% separate(`What is your age?`, c('Min Age', 'Max Age'), "_|(?=...$) ", convert = TRUE)
workingfile %>% separate(.$`What is your age?`, c('Min Age', 'Max Age'), "_|(?=...$) ", convert = TRUE)

The errors, in the order the lines were tried:

[screenshot of the error messages; not available as text]

  • We cannot read data into R from images. Please [make this question reproducible](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) by including a small representative dataset in a plain text format - for example the output from `dput(workingfile)`, if that is not too large. – neilfws Nov 16 '22 at 02:30
  • As neilfws hinted, if the output from `dput()` is too big, you can use `dput(head(workingfile))` to provide a subset. – John Polo Nov 16 '22 at 02:50
  • `sep = "[^[:alnum:]]+"` is the default so you could just skip that argument to split on any non alphanumeric. – Dan Adams Nov 16 '22 at 03:23

1 Answer

The default is tidyr::separate(sep = "[^[:alnum:]]+"), which splits at any run of non-alphanumeric characters. In your case that gives what you want, so you can leave out the sep argument entirely.

library(tidyverse)

d <- structure(list(`What is your age?` = c("65+", "65+", "65+", "25-34", 
                                            "45-54", "65+"), `Gender identity` = c("Female", "Female", "Male", 
                                                                                   "Non-Binary", "Female", "Female")), row.names = 3:8, class = "data.frame")

d %>% 
  separate(`What is your age?`, 
           into = c("min", "max"))
#>   min max Gender identity
#> 3  65              Female
#> 4  65              Female
#> 5  65                Male
#> 6  25  34      Non-Binary
#> 7  45  54          Female
#> 8  65              Female

Created on 2022-11-16 with reprex v2.0.2
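
If numeric columns are wanted rather than character ones, one possible variation is to add convert = TRUE, as the attempts in the question already did. This is a sketch rather than part of the reprex above: it assumes tidyr is installed, rebuilds the same sample data with data.frame() (so the row names are 1 to 6 instead of 3 to 8), and relies on the blank max for "65+" being converted to NA.

```r
library(tidyr)

# Same sample values as in the question, rebuilt with data.frame();
# check.names = FALSE keeps the column names with spaces intact
d <- data.frame(
  `What is your age?` = c("65+", "65+", "65+", "25-34", "45-54", "65+"),
  `Gender identity` = c("Female", "Female", "Male", "Non-Binary", "Female", "Female"),
  check.names = FALSE
)

# No sep argument: the default splits on runs of non-alphanumeric characters.
# convert = TRUE asks tidyr to run type conversion on the new columns.
res <- separate(d, `What is your age?`, into = c("min", "max"), convert = TRUE)

# Inspect the resulting column types
str(res)
```

With the columns converted, the open-ended 65+ rows can be picked out with is.na(res$max).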

Dan Adams
  • that works, but it does my head in that this line: workingfile %>% separate(`What is your age?`, c('Min Age', 'Max Age'), "_|(?=...$) ", convert = TRUE) gave an error. I was so close yet so far! – Plot Device Nov 16 '22 at 21:31
  • Check it out in a regex tester [https://regex101.com/](https://regex101.com/). Yours doesn't seem to make much sense to me although I'm not a regex expert by any stretch. – Dan Adams Nov 17 '22 at 00:37
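
The two patterns can also be compared without leaving R by testing them against the sample values with base R's grepl(). This is only a rough sketch: perl = TRUE is used because the lookahead (?=...$) needs a Perl-style engine, and tidyr uses a different regex engine internally, so it approximates what separate() sees rather than reproducing it. The pattern from the question matches either an underscore or a space that is the third-from-last character, and the sample values contain neither.

```r
# Sample age values from the question
ages <- c("65+", "25-34", "45-54")

# Pattern from the question: "_" OR a space that is followed by
# exactly two more characters before the end of the string
asker_pattern <- "_|(?=...$) "

# Default separator used by tidyr::separate()
default_pattern <- "[^[:alnum:]]+"

asker_hits <- grepl(asker_pattern, ages, perl = TRUE)
default_hits <- grepl(default_pattern, ages)

print(asker_hits)    # FALSE FALSE FALSE: nothing to split on
print(default_hits)  # TRUE TRUE TRUE: "+" and "-" are both separators
```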