I am trying to subset a data frame based on words in a string column. Using the stringr package, I subset with `str_detect()` as follows:

```r
subdat <- dat %>% filter(str_detect(de, index$Full[1]))
```
This returns the correct subset: every row of `dat` whose `de` contains the first `Full` name in `index`. However, when I put the same code into a `for` loop, replacing the `1` with the loop variable `i` to cycle through all names, the subsets no longer match the right string:
```r
for (i in 1) {  # only i = 1 while testing; the full loop would cover all of index$Full
  subdat <- dat %>% filter(str_detect(de, index$Full[i]))
}
```
On top of this, every iteration returns the same incorrect subset. The problem also occurs outside the loop: with `i` equal to 1, the following comparison returns `TRUE`:
```r
index$Full[i] == index$Full[1]
```
Yet these two calls return different subsets:

```r
subdati <- dat %>% filter(str_detect(de, index$Full[i]))
subdat1 <- dat %>% filter(str_detect(de, index$Full[1]))
```
Because `index` has about 70 entries, I'd like a `for` loop that eventually writes each subset to a CSV (the CSV writing itself is not a problem). This is my first question here, so I'm happy to clarify anything if needed.
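For the eventual goal, here is a minimal sketch of the full loop, assuming `dat` and `index` are shaped like the `dput()` output below (a three-row stand-in is built inline so the snippet runs on its own). The output directory and the `subset_<name>.csv` file names are hypothetical. Two choices are deliberate: the name is pulled out of `index` before calling `filter()`, and the loop variable is called `k` rather than `i`, because `dat` already has a column named `i` and dplyr verbs look up bare names in the data before the calling environment. `fixed()` makes `str_detect()` treat each name as a literal string rather than a regular expression.

```r
library(dplyr)
library(stringr)

# Small stand-in for the real data (same `de` and `i` columns as the dput() output)
dat <- data.frame(
  de = c("[BOS] Tatum Foul: Defense 3 Second (1 PF) (1 FTA) (B Forte)",
         "[OKC] Westbrook Foul: Shooting (1 PF) (1 FTA) (K Scott)",
         "[SAS] Paul Foul: Personal (2 PF) (B Forte)"),
  i = c(1, 36, 383),
  stringsAsFactors = FALSE
)
index <- data.frame(Full = c("B Forte", "K Scott"), stringsAsFactors = FALSE)

out_dir <- tempdir()  # hypothetical output location
subsets <- list()

# Loop variable `k` cannot be confused with the data column `i`
for (k in seq_along(index$Full)) {
  player <- index$Full[k]  # resolved outside filter(), so data masking cannot change it
  subdat <- dat %>% filter(str_detect(de, fixed(player)))
  subsets[[player]] <- subdat
  # hypothetical file naming: spaces in the name replaced by underscores
  write.csv(subdat, file.path(out_dir, paste0("subset_", gsub(" ", "_", player), ".csv")),
            row.names = FALSE)
}
```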
`dput()` output for a reproducible example:
```
> dput(droplevels(dat))
structure(list(evt = structure(c(3L, 4L, 1L, 5L, 2L), .Label = c("112",
"150", "22", "41", "320"), class = "factor"), cl = structure(c(2L,
1L, 5L, 4L, 3L), .Label = c("08:49", "10:32", "11:21", "10:31",
"02:28"), class = "factor"), de = c("[BOS] Tatum Foul: Defense 3 Second (1 PF) (1 FTA) (B Forte)",
"[BOS] Hayward Foul: Shooting (1 PF) (2 FTA) (B Forte)", "[OKC] Westbrook Foul: Shooting (1 PF) (1 FTA) (K Scott)",
"[SAS] Paul Foul: Personal (2 PF) (B Forte)", "[DAL] Harris Foul: Shooting (2 PF) (1 FTA) (B Forte)"
), i = c(1, 1, 36, 383, 461)), .Names = c("evt", "cl", "de",
"i"), row.names = c(1L, 4L, 1599L, 16358L, 18269L), class = "data.frame")
> dput(droplevels(index))
structure(list(First = structure(1:2, .Label = c("B", "K"), class = "factor"),
    Last = structure(1:2, .Label = c("Forte", "Scott"), class = "factor"),
    Full = c("B Forte", "K Scott")), .Names = c("First", "Last",
"Full"), row.names = c(1L, 36L), class = "data.frame")
```
With this data, I get the following output:
```
> subdat <- dat %>% filter(str_detect(de, index$Full[1]))
> subdat
  evt    cl                                                           de   i
1  22 10:32 [BOS] Tatum Foul: Defense 3 Second (1 PF) (1 FTA) (B Forte)   1
2  41 08:49        [BOS] Hayward Foul: Shooting (1 PF) (2 FTA) (B Forte)   1
3 320 10:31                   [SAS] Paul Foul: Personal (2 PF) (B Forte) 383
4 150 11:21         [DAL] Harris Foul: Shooting (2 PF) (1 FTA) (B Forte) 461
> for (i in 1){
+ subdatloop <- dat %>% filter(str_detect(de, index$Full[i]))
+ }
> subdatloop
  evt    cl                                                           de i
1  22 10:32 [BOS] Tatum Foul: Defense 3 Second (1 PF) (1 FTA) (B Forte) 1
2  41 08:49        [BOS] Hayward Foul: Shooting (1 PF) (2 FTA) (B Forte) 1
> index$Full[i] == index$Full[1]
[1] TRUE
> subdati <- dat %>% filter(str_detect(de, index$Full[i]))
> subdati
  evt    cl                                                           de i
1  22 10:32 [BOS] Tatum Foul: Defense 3 Second (1 PF) (1 FTA) (B Forte) 1
2  41 08:49        [BOS] Hayward Foul: Shooting (1 PF) (2 FTA) (B Forte) 1
> subdat1
  evt    cl                                                           de   i
1  22 10:32 [BOS] Tatum Foul: Defense 3 Second (1 PF) (1 FTA) (B Forte)   1
2  41 08:49        [BOS] Hayward Foul: Shooting (1 PF) (2 FTA) (B Forte)   1
3 320 10:31                   [SAS] Paul Foul: Personal (2 PF) (B Forte) 383
4 150 11:21         [DAL] Harris Foul: Shooting (2 PF) (1 FTA) (B Forte) 461
```
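One thing worth checking against this output: `dat` itself has a column named `i`, and dplyr verbs such as `filter()` and `mutate()` use data masking, where a bare name is looked up among the data columns before the calling environment. If that is what happens here, `index$Full[i]` inside `filter()` becomes `index$Full[c(1, 1, 36, 383, 461)]`, i.e. `"B Forte"` for the first two rows and `NA` for the rest, which would leave exactly the two rows shown above. The sketch below rebuilds the relevant columns from the `dput()` output to test that idea; the helper column `pattern` and the object names are illustrative only. The `.env` pronoun used at the end is dplyr's way of forcing a lookup in the calling environment.

```r
library(dplyr)
library(stringr)

# Reduced copy of the example data from the dput() output (only `de` and `i` are needed)
dat <- data.frame(
  de = c("[BOS] Tatum Foul: Defense 3 Second (1 PF) (1 FTA) (B Forte)",
         "[BOS] Hayward Foul: Shooting (1 PF) (2 FTA) (B Forte)",
         "[OKC] Westbrook Foul: Shooting (1 PF) (1 FTA) (K Scott)",
         "[SAS] Paul Foul: Personal (2 PF) (B Forte)",
         "[DAL] Harris Foul: Shooting (2 PF) (1 FTA) (B Forte)"),
  i = c(1, 1, 36, 383, 461),
  stringsAsFactors = FALSE
)
index <- data.frame(Full = c("B Forte", "K Scott"), stringsAsFactors = FALSE)

i <- 1  # what the loop variable holds after `for (i in 1)`

# Inside dplyr verbs a bare `i` is found in the data first, so here it is the
# column dat$i = c(1, 1, 36, 383, 461), not the global i
masked <- dat %>% mutate(pattern = index$Full[i])
print(masked$pattern)  # two "B Forte" values followed by three NA

# Same masking inside filter(): NA patterns give NA, and filter() drops those rows
masked_rows <- dat %>% filter(str_detect(de, index$Full[i]))
print(nrow(masked_rows))  # 2

# .env$i refers to the variable in the calling environment instead of the column
env_rows <- dat %>% filter(str_detect(de, index$Full[.env$i]))
print(nrow(env_rows))  # 4
```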
EDIT: Added a reproducible example and the expected output (the four-row result returned by `subdat` and `subdat1`).