
I have been using the function separate() from the tidyverse (package tidyr) to split values into different columns:

45 (10, 89) 
34

and with the code:

dd %>% separate(a, c("x","y","z"), extra="drop") 

I got what I wanted:

45 10 89
34

But now my variable has a different format, and it is not working:

45% (10,89)
34%

Why does it not work when the values contain the '%' symbol?

Edit: OK, I now know why it is not working. It is because of the decimal points in my data:

4.5% (10/89)
3.4%

6.7%

7.8% (89/98)

How do I handle decimals with the separate() function? Thank you very much!

  • It works for me. – Psidom Oct 26 '17 at 17:11
  • What does "not working" mean exactly for you? – MrFlick Oct 26 '17 at 17:56
  • Instead of getting 45 10 89, I get 7 7 6. I think it is treating the column as a factor, and the numbers correspond to the factor levels. –  Oct 26 '17 at 19:06
  • Mee, the output `7 7 6` makes no sense, even in light of `stringsAsFactors=TRUE` (which does not change it). If my answer below doesn't meet your needs, I think you need to make this question a little more [reproducible](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) by adding to your question the output of `dput(head(dd))` (or specific rows that include the behavior you just described). – r2evans Oct 26 '17 at 19:14
  • Yes, I know the output 7 7 6 makes no sense. I think it is because some rows of the variable contain no data at all, for example: "75% (10/11)", "", "45%", "" –  Oct 26 '17 at 19:40
  • Mee, I've tested your new string samples and they seem to work just fine. Is there something missing from my small sample data below? – r2evans Oct 26 '17 at 19:58
  • Yes, there is one thing I didn't take into account: there are decimal points, and I think this is why it is not working. I have edited the question. Thank you! –  Oct 27 '17 at 07:23

1 Answer

I'm inferring that when you say "is not working", it's because the percent sign is being removed:

separate(data_frame(a=c("45 (10, 89)","34")), a, c('x','y','z'), extra="drop")
# Warning: Too few values at 1 locations: 2
# # A tibble: 2 × 3
#       x     y     z
# * <chr> <chr> <chr>
# 1    45    10    89
# 2    34  <NA>  <NA>
separate(data_frame(a=c("45% (10, 89)","34%")), a, c('x','y','z'), extra="drop")
# Warning: Too few values at 1 locations: 2
# # A tibble: 2 × 3
#       x     y     z
# * <chr> <chr> <chr>
# 1    45    10    89
# 2    34        <NA>

From ?separate:

separate(data, col, into, sep = "[^[:alnum:]]+", remove = TRUE,
  convert = FALSE, extra = "warn", fill = "warn", ...)
...

Since you are not overriding the default sep, separate() splits on any run of characters that is neither a letter nor a digit, so the "%" is treated as part of a separator and discarded. FYI, [^[:alnum:]]+ is analogous to [^A-Za-z0-9]+ (for ASCII text), which matches "one or more characters that are not in the ranges A-Z, a-z, or 0-9".
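To see which characters each pattern treats as a separator, the same regular expressions can be applied with base R's strsplit(). This is only a sketch: it assumes the default sep is the one shown in ?separate above, and it relies on strsplit() dropping a trailing empty piece, whereas separate() keeps it (that trailing "" is what extra="drop" discards).

```r
# Apply the two candidate `sep` patterns with base R to see what gets split off.
default_sep <- "[^[:alnum:]]+"    # default sep shown in ?separate
keep_pct    <- "[^[:alnum:]%]+"   # "%" added to the class, so it is no longer a separator

x <- "45% (10, 89)"

# strsplit() drops the empty piece produced by the trailing ")".
strsplit(x, default_sep)[[1]]  # the "%" is consumed together with " ("
strsplit(x, keep_pct)[[1]]     # the "%" stays attached to the first number
```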

Simply provide a more specific sep that excludes "%" from the separator class, and you'll get what you want.

separate(data_frame(a=c("45% (10, 89)","34%")), a, c('x','y','z'), sep="[^[:alnum:]%]+", extra="drop")
# Warning: Too few values at 1 locations: 2
# # A tibble: 2 × 3
#       x     y     z
# * <chr> <chr> <chr>
# 1   45%    10    89
# 2   34%  <NA>  <NA>

Edit: using your most recent sample data:

separate(data_frame(a=c("45% (10/89)","34%","","67%","78% (89/98)")), a, c('x','y','z'), sep="[^[:alnum:]%]+", extra="drop")
# Warning: Too few values at 3 locations: 2, 3, 4
# # A tibble: 5 × 3
#       x     y     z
# * <chr> <chr> <chr>
# 1   45%    10    89
# 2   34%  <NA>  <NA>
# 3        <NA>  <NA>
# 4   67%  <NA>  <NA>
# 5   78%    89    98
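The edited question adds decimal points (e.g. "4.5% (10/89)"). The pattern above still treats "." as a separator, so "4.5%" would be split into "4" and "5%". A plausible fix, not tested on the asker's real data, is to add "." to the bracket expression too, i.e. sep="[^[:alnum:].%]+" in the separate() call. The base R sketch below applies that pattern to the edited sample; split_three() is a hypothetical helper written only for illustration. It roughly mimics separate(..., extra="drop"), except that an empty value becomes NA rather than "", and it also strips the "%" so the columns can be converted to numbers.

```r
# Hypothetical helper: roughly mimics separate(..., into = 3 columns, extra = "drop")
# with "." and "%" excluded from the separator class.
split_three <- function(a, sep = "[^[:alnum:].%]+") {
  pieces <- strsplit(a, sep)
  rows <- lapply(pieces, function(p) {
    p <- p[p != ""]   # drop empty pieces (e.g. from an empty input string)
    length(p) <- 3    # truncate extra pieces, pad missing ones with NA
    p
  })
  out <- as.data.frame(do.call(rbind, rows), stringsAsFactors = FALSE)
  names(out) <- c("x", "y", "z")
  out
}

dd <- data.frame(a = c("4.5% (10/89)", "3.4%", "", "6.7%", "7.8% (89/98)"),
                 stringsAsFactors = FALSE)
res <- split_three(dd$a)

# Remove the "%" sign and convert every column to numeric (NA stays NA).
res[] <- lapply(res, function(col) as.numeric(sub("%", "", col, fixed = TRUE)))
res
```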
r2evans