I am currently trying to subset a dataset based on words in a string. Using the stringr package, I am attempting to subset using str_detect as follows:

subdat <- dat %>% filter(str_detect(de, index$Full[1]))

This produces the correct subset: the rows in which the first "Full" entry of index is detected. However, when the same code is placed in a for loop, with the index replaced by "i" to cycle through all names, the subsets no longer match the right string.

for (i in 1){
  subdat <- dat %>% filter(str_detect(de, index$Full[i]))
}

On top of this, each iteration returns the same incorrect subset. The same issue occurs when "i" is used in str_detect outside the for loop. When the following code is run with i equal to 1, R returns TRUE:

index$Full[i] == index$Full[1]

But the following two lines still return different datasets:

subdati <- dat %>% filter(str_detect(de, index$Full[i]))
subdat1 <- dat %>% filter(str_detect(de, index$Full[1]))

My index is about 70 entries long, and I'd like to use a for loop to eventually write a CSV for each subset (writing the files is not the problem). I hope this is sufficient, as this is my first time asking; I can clarify anything if need be.
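For reference, the loop described above might look roughly like the sketch below. It is only an illustration under stated assumptions: the two small data frames are stand-ins for the real data, and the names `k`, `name_k`, `out_dir` and the file-naming scheme are hypothetical. The name is looked up before `filter()` is called and stored in a variable that does not share a name with any column of `dat`, and `fixed()` is used so the name is matched literally rather than as a regular expression.

```r
library(dplyr)
library(stringr)

# Small stand-ins for the real data (hypothetical values, same column names).
dat <- data.frame(
  de = c("[BOS] Tatum Foul (B Forte)", "[OKC] Westbrook Foul (K Scott)"),
  i  = c(1, 36),
  stringsAsFactors = FALSE
)
index <- data.frame(Full = c("B Forte", "K Scott"), stringsAsFactors = FALSE)

out_dir <- tempdir()  # assumption: any writable directory would do

# seq_len(nrow(index)) visits every row of index, not only the first.
for (k in seq_len(nrow(index))) {
  # Look the name up outside filter(), so it is a single plain string.
  name_k <- index$Full[k]

  # fixed() matches the name literally instead of treating it as a regex.
  subdat <- dat %>% filter(str_detect(de, fixed(name_k)))

  # Hypothetical naming scheme: spaces replaced by underscores.
  out_file <- file.path(out_dir, paste0(gsub(" ", "_", name_k), ".csv"))
  write.csv(subdat, out_file, row.names = FALSE)
}
```

With the stand-in data this writes one CSV per entry of `index` into a temporary directory.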

Added dput output for reproducible example:

> dput(droplevels(dat))
structure(list(evt = structure(c(3L, 4L, 1L, 5L, 2L), .Label = c("112", 
"150", "22", "41", "320"), class = "factor"), cl = structure(c(2L, 
1L, 5L, 4L, 3L), .Label = c("08:49", "10:32", "11:21", "10:31", 
"02:28"), class = "factor"), de = c("[BOS] Tatum Foul: Defense 3 Second (1 PF) (1 FTA) (B Forte)", 
"[BOS] Hayward Foul: Shooting (1 PF) (2 FTA) (B Forte)", "[OKC] Westbrook Foul: Shooting (1 PF) (1 FTA) (K Scott)", 
"[SAS] Paul Foul: Personal (2 PF) (B Forte)", "[DAL] Harris Foul: Shooting (2 PF) (1 FTA) (B Forte)"
), i = c(1, 1, 36, 383, 461)), .Names = c("evt", "cl", "de", 
"i"), row.names = c(1L, 4L, 1599L, 16358L, 18269L), class = "data.frame")
> dput(droplevels(index))
structure(list(First = structure(1:2, .Label = c("B", "K"), class = "factor"), 
    Last = structure(1:2, .Label = c("Forte", "Scott"), class = "factor"), 
    Full = c("B Forte", "K Scott")), .Names = c("First", "Last", 
"Full"), row.names = c(1L, 36L), class = "data.frame")

With this, I get the following outputs:

> subdat <- dat %>% filter(str_detect(de, index$Full[1]))
> subdat
  evt    cl                                                          de   i
1  22 10:32 [BOS] Tatum Foul: Defense 3 Second (1 PF) (1 FTA) (B Forte)   1
2  41 08:49       [BOS] Hayward Foul: Shooting (1 PF) (2 FTA) (B Forte)   1
3 320 10:31                  [SAS] Paul Foul: Personal (2 PF) (B Forte) 383
4 150 11:21        [DAL] Harris Foul: Shooting (2 PF) (1 FTA) (B Forte) 461

> for (i in 1){
+   subdatloop <- dat %>% filter(str_detect(de, index$Full[i]))
+ }
> subdatloop
  evt    cl                                                          de i
1  22 10:32 [BOS] Tatum Foul: Defense 3 Second (1 PF) (1 FTA) (B Forte) 1
2  41 08:49       [BOS] Hayward Foul: Shooting (1 PF) (2 FTA) (B Forte) 1

> index$Full[i] == index$Full[1]
[1] TRUE
> subdati <- dat %>% filter(str_detect(de, index$Full[i]))
> subdati
  evt    cl                                                          de i
1  22 10:32 [BOS] Tatum Foul: Defense 3 Second (1 PF) (1 FTA) (B Forte) 1
2  41 08:49       [BOS] Hayward Foul: Shooting (1 PF) (2 FTA) (B Forte) 1
> subdat1
  evt    cl                                                          de   i
1  22 10:32 [BOS] Tatum Foul: Defense 3 Second (1 PF) (1 FTA) (B Forte)   1
2  41 08:49       [BOS] Hayward Foul: Shooting (1 PF) (2 FTA) (B Forte)   1
3 320 10:31                  [SAS] Paul Foul: Personal (2 PF) (B Forte) 383
4 150 11:21        [DAL] Harris Foul: Shooting (2 PF) (1 FTA) (B Forte) 461
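One thing worth checking, offered as a possible explanation rather than a confirmed answer: `dat` has a column that is itself named `i`. dplyr verbs such as `filter()` and `mutate()` look names up in the data frame's columns before the surrounding environment, so `index$Full[i]` inside `filter()` may be indexing with the column `dat$i` rather than the loop variable. The sketch below uses a reduced stand-in for the data; `seen`, `pattern` and `fixed_subset` are hypothetical names introduced only for this illustration.

```r
library(dplyr)
library(stringr)

# Reduced stand-in for the data above; note the column named `i`.
dat <- data.frame(
  de = c("Tatum Foul (B Forte)", "Hayward Foul (B Forte)",
         "Westbrook Foul (K Scott)", "Paul Foul (B Forte)"),
  i  = c(1, 1, 36, 383),
  stringsAsFactors = FALSE
)
index <- data.frame(Full = c("B Forte", "K Scott"), stringsAsFactors = FALSE)

i <- 1  # the loop variable, defined outside the data frame

# Inside a dplyr verb, `i` is the column dat$i, so this indexes
# index$Full with c(1, 1, 36, 383); out-of-range positions give NA.
seen <- dat %>% mutate(pattern_used = index$Full[i])

# Looking the pattern up outside the verb uses the loop variable instead.
pattern <- index$Full[i]
fixed_subset <- dat %>% filter(str_detect(de, fixed(pattern)))
```

In this stand-in, `seen$pattern_used` is "B Forte" only for the rows where the column `i` equals 1 and NA elsewhere, which would be consistent with only the first two rows surviving in the outputs above. Renaming either the loop variable or the column would be another way to avoid the clash.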

EDIT: Added reproducible example and expected output.

bpbaker
  • Your two examples are not the same; one uses `index$Name` and one uses `index$FName`. Furthermore, could you try to create a [reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example)? – David Robinson Jan 17 '18 at 17:30
  • Thanks David, I'll get right on that and hopefully have a better example up soon. – bpbaker Jan 17 '18 at 17:59
  • I have added the example as well as the output I am receiving. Thank you David, I know this is more confusing than necessary, but the output was hard to reproduce on a small scale. – bpbaker Jan 17 '18 at 19:32

0 Answers