I'm using the grepl function to sort through survey data. Each row is a different survey respondent, and each number in the "ANI_type" string represents a different type of animal; I need to sort these depending on animal type. For example, the "2"s under ANI_type represent cats. I thought I had it figured out with the following, but it matches not only "2" but also any number that contains the digit 2 (12, 20, 21 and so on). How can I get this to work so that it ONLY matches "2"? Thanks so much, I'm incredibly new at this!

> animals$cats <- as.numeric(grepl("2", animals$ANI_type))
> animals
                                                    ANI_type dogs cats repamp
1                              1,2,5,12,13,14,15,16,18,19,27    1    1   TRUE
2                                                          2    0    1  FALSE
3                                             20,21,22,23,26    1    1   TRUE
4                                                20,21,22,23    1    1   TRUE
5                                                         13    1    0   TRUE
6                                                          2    0    1  FALSE
7                                                   20,21,22    1    1   TRUE
8                                                20,21,22,23    1    1   TRUE
9                                                   20,21,22    1    1   TRUE
10                                             5,20,21,22,27    1    1   TRUE
11                                              1,2,20,21,22    1    1   TRUE
12                                       5,18,20,21,22,23,26    1    1   TRUE
13                                                     20,21    1    1   TRUE
14                                                        21    1    1   TRUE
15                                                     20,21    1    1   TRUE
16                                                  20,21,26    1    1   TRUE
17                                                         2    0    1  FALSE
18                                                       1,2    1    1   TRUE
19                                                         2    0    1  FALSE
20                                                       3,4    0    0  FALSE

Furthermore, I need to group some of the numbers in the strings into categories. For example, the numbers 6,7,8,9,10,11 all need to be placed in the animals$pock column. How would I go about that using the grep function? Just use a lot of the boundary tokens?

1 Answer

You can use the boundary token (\\b):

grepl("\\b2\\b", animals$ANI_type)
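
As a minimal sketch of how the boundary token changes the result, the snippet below compares the unanchored pattern with the anchored one. The `ani_type` vector is made-up sample data in the same shape as the ANI_type column, not the real survey data. It also shows that a group of codes such as 6 to 11 needs only one pair of boundaries around an alternation rather than boundaries on every number.

```r
# Hypothetical sample values mirroring the ANI_type column from the question
ani_type <- c("1,2,5,12", "2", "20,21,22", "13", "3,4", "6,27")

# Unanchored pattern: TRUE for any string that contains the character "2"
loose <- grepl("2", ani_type)

# \\b anchors the match at word boundaries, so only a standalone 2 counts
cats <- as.numeric(grepl("\\b2\\b", ani_type))

# One alternation inside a single pair of boundaries covers a group of codes
pock <- as.numeric(grepl("\\b(6|7|8|9|10|11)\\b", ani_type))

print(data.frame(ani_type, loose, cats, pock))
```

Here "20,21,22" is flagged by the unanchored pattern but not by the anchored one, because none of its numbers is exactly 2.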

But instead of relying on regex you may want to structure the data so that each animal is on its own row. You can use tidyr::separate_rows() for this:

library(tibble)
library(tidyr)

animals %>%
  rowid_to_column(var = "id") %>%
  separate_rows(ANI_type, sep = ",", convert = TRUE) 
Ritchie Sacramento
  • Furthermore, I need to group some of the numbers in the strings into categories. For example, the numbers 6,7,8,9,10,11 all need to be placed in the animals$pock column. How would I go about that using the grep function? Just use a lot of the boundary tokens? – Stephanie Belland Feb 27 '20 at 01:55
  • I think you'd be better off separating the string and matching on exact values as in my example above. – Ritchie Sacramento Feb 27 '20 at 02:22
  • Or using `strsplit()` you could do something like `animals$pock <- sapply(strsplit(df$ANI_type, ","), function(x) any(c(6,7,8,9,10,11) %in% as.numeric(x)))`. As your edit has changed the focus of your question, it would be better to post a new one if you need further help. – Ritchie Sacramento Feb 27 '20 at 02:33
  • For the tidyr approach: the rows need to be kept as is because they're all answers from different respondents, so the animal types need to be sorted by column. If I used the strsplit function, would I need to apply boundaries on all of those? I'm incredibly new at using R; my interest lies in stats, and I did not perform well in computer science :/ I will post again after the 90 minute window. Thanks for your help. – Stephanie Belland Feb 27 '20 at 02:46
  • No, the function above splits the string and converts the values to numbers so you can do direct comparisons with other numeric vectors which is simpler than relying on pattern matching character vectors. – Ritchie Sacramento Feb 27 '20 at 02:57
  • I got this error message when I tried: `Error in df$ANI_type : object of type 'closure' is not subsettable`. I did make up a new post. – Stephanie Belland Feb 27 '20 at 18:43
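
The error in the last comment most likely arises because no data frame named `df` exists in the session, so R resolves the name to the built-in function `stats::df` (a "closure"), which cannot be subset with `$`. A minimal sketch of the `strsplit()` approach that uses the `animals` data frame instead is shown below. The sample rows and the `has_any()` helper are hypothetical, and the sketch assumes ANI_type is stored as character; if it is a factor, wrap it in `as.character()` first.

```r
# Hypothetical stand-in for the `animals` data frame from the question
animals <- data.frame(
  ANI_type = c("1,2,5,12,13", "2", "6,20,21", "3,4", "10,11"),
  stringsAsFactors = FALSE
)

# With no data frame called `df`, the name refers to the function stats::df
is.function(df)

# Split each respondent's answer on commas and convert the pieces to numbers
codes <- lapply(strsplit(animals$ANI_type, ","), as.numeric)

# Hypothetical helper: 1 if a respondent has any code from `wanted`, else 0
has_any <- function(wanted) {
  as.numeric(sapply(codes, function(x) any(wanted %in% x)))
}

animals$cats <- has_any(2)     # exact match on 2, so 12 and 20 do not count
animals$pock <- has_any(6:11)  # any of the grouped codes 6 to 11
print(animals)
```

Each respondent stays on their own row, and no boundary tokens are needed because the comparison is made on numbers rather than on text.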