My actual case is a list of sub-lists, each holding a combined header string and its corresponding data. I want to subset this list so that it returns a list with the same structure, containing only the sub-lists whose header strings contain one of the strings in a character vector.
Test Data:
lets <- letters
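x <- c(1,4,8,11,13,14,18,22,24)
set.seed(1)  # assumed: rnd was not defined in the original; any positive integer lengths will do
rnd <- sample(3:10, 9, replace = TRUE)
ls <- list()
for (i in 1:9) {
  ls[[i]] <- list(hdr = paste(lets[x[i]:(x[i]+2)], collapse = ""),
                  data = seq(1, rnd[i]))
}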
filt <- c("bc", "lm", "rs", "xy")
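To see which elements should survive the filter, it helps to list the headers the loop generates. This is a minimal sketch that rebuilds the test list with placeholder data (`seq_len(3)`, an assumption; only the `hdr` fields matter here) and pulls every header out with `vapply()`:

```r
lets <- letters
x <- c(1, 4, 8, 11, 13, 14, 18, 22, 24)

# Rebuild the test list; data = seq_len(3) is an assumed placeholder for the data
ls <- lapply(x, function(s) {
  list(hdr = paste(lets[s:(s + 2)], collapse = ""), data = seq_len(3))
})

# Extract the header of every sublist as a plain character vector
hdrs <- vapply(ls, function(el) el$hdr, character(1))
print(hdrs)
```

The headers come out as "abc", "def", "hij", "klm", "mno", "nop", "rst", "vwx" and "xyz". Only "abc", "klm", "rst" and "xyz" contain one of "bc", "lm", "rs" or "xy", which matches the TRUE positions in logical_match.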
The result should be the list returned by:
logical_match <- c(TRUE, FALSE, FALSE, TRUE, FALSE, FALSE, TRUE, FALSE, TRUE)
ls_result <- ls[logical_match]
So the function I seek is: ls_result <- fn(ls, filt)
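One base R way to write such a `fn` is with `Filter()`, which keeps the list elements for which a predicate returns TRUE and therefore preserves the sub-list structure. This sketch combines the filter strings into a single regular expression alternation; it assumes the filter strings contain no regex metacharacters (otherwise they would need escaping, or a fixed-string test per filter). `fn` is just the placeholder name from the question:

```r
# Keep the sublists whose $hdr contains any of the strings in `patterns`
fn <- function(lst, patterns) {
  # One alternation regex, e.g. "bc|lm|rs|xy"
  # (assumes no regex metacharacters in `patterns`)
  rx <- paste(patterns, collapse = "|")
  # Filter() is base R: keeps elements where the predicate is TRUE
  Filter(function(el) grepl(rx, el$hdr), lst)
}

# Test data, with seq_len(3) as assumed placeholder data
lets <- letters
x <- c(1, 4, 8, 11, 13, 14, 18, 22, 24)
ls <- lapply(x, function(s) {
  list(hdr = paste(lets[s:(s + 2)], collapse = ""), data = seq_len(3))
})
filt <- c("bc", "lm", "rs", "xy")

ls_result <- fn(ls, filt)
expected <- ls[c(TRUE, FALSE, FALSE, TRUE, FALSE, FALSE, TRUE, FALSE, TRUE)]
print(identical(ls_result, expected))
```

Because `Filter()` only subsets the outer list, each surviving element still carries both its `hdr` and its `data`.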
I've looked at: subset list by dataframe; partial match with %in%; nested sublist by condition; subset list by logical condition; and, my favorite, extract sublist elements to array. That last one uses some neat purrr and dplyr solutions, but unfortunately these aren't viable here, as I'm looking for a base R solution to keep deployment straightforward (I'd still welcome package-based solutions, for interest, of course).
I'm guessing some variation of logical_match <- lapply(ls, fn, '$hdr', filt) is where I'm heading. I started with pmatch() and wondered how to incorporate grep(), but I'm struggling to see how to generate the logical_match vector.
Can someone set me on the right track, please?
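Along the lines of the lapply() idea, the logical_match vector can be built directly: test each sub-list's `hdr` against every filter string with `grepl(..., fixed = TRUE)`, collapse with `any()`, and use `vapply()` (a type-checked `sapply()`) so the result is guaranteed to be a plain logical vector. `hdr_matches()` is a hypothetical helper name, and the test list uses placeholder data since only `hdr` matters:

```r
lets <- letters
x <- c(1, 4, 8, 11, 13, 14, 18, 22, 24)
ls <- lapply(x, function(s) {
  list(hdr = paste(lets[s:(s + 2)], collapse = ""), data = seq_len(3))
})
filt <- c("bc", "lm", "rs", "xy")

# Hypothetical helper: TRUE if any pattern occurs literally in `hdr`
hdr_matches <- function(hdr, patterns) {
  # fixed = TRUE treats each pattern as a literal string, not a regex
  any(vapply(patterns, grepl, logical(1), x = hdr, fixed = TRUE))
}

# One logical per sublist, then subset the outer list with it
logical_match <- vapply(ls, function(el) hdr_matches(el$hdr, filt), logical(1))
ls_result <- ls[logical_match]
print(logical_match)
```

Using `fixed = TRUE` sidesteps regex escaping entirely, which may matter for real headers containing characters such as `|`, `*` or `(`.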
EDIT: when agrepl() is applied to the real data, this becomes trickier. The header string, hdr, is typically around 255 characters long, whilst each element of the filter vector, filt, is of the order of 16 characters. For the example below, agrepl()'s max.distance argument (default 0.1) needs to be raised to somewhere between 0.94 and 0.96, which is a pretty narrow window. Even at the lower end of this range, applying it to the ~360 list elements returns a large number of headers that don't match at all.
> hdr <- "#CCHANNELSDI12-PTx|*|CCHANNELNO2|*|CDASA1570|*|CDASANAMEShenachieBU_1570|*|CTAGSODATSID|*|CTAGKEYWISKI_LIVE,ShenachieBU_1570,SDI12-PTx,Highres|*|LAYOUT(timestamp,value)|*|RINVAL-777|*|RSTATEW6|*|RTIMELVLhigh-resolution|*|TZEtc/GMT|*|ZDATE20210110130805|*|"
> filt <- c("ShenachieBU_1570", "Pitlochry_4056")
> agrepl(hdr, filt, max.distance = 0.94)
[1] TRUE FALSE
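One thing worth checking: agrepl()'s signature is agrepl(pattern, x, max.distance = 0.1, ...), so the call above uses the ~255-character header as the pattern and searches for it inside the short filter strings. That likely explains why max.distance has to be pushed to about 0.94 before anything matches, and why unrelated headers then match too (a fraction of 0.94 of a 255-character pattern allows hundreds of edits). A sketch with the arguments the other way round, looping over filt because pattern takes a single string:

```r
hdr <- "#CCHANNELSDI12-PTx|*|CCHANNELNO2|*|CDASA1570|*|CDASANAMEShenachieBU_1570|*|CTAGSODATSID|*|CTAGKEYWISKI_LIVE,ShenachieBU_1570,SDI12-PTx,Highres|*|LAYOUT(timestamp,value)|*|RINVAL-777|*|RSTATEW6|*|RTIMELVLhigh-resolution|*|TZEtc/GMT|*|ZDATE20210110130805|*|"
filt <- c("ShenachieBU_1570", "Pitlochry_4056")

# pattern = one short filter string, x = the long header (note the order);
# the default max.distance of 0.1 is now relative to the short pattern
approx_hits <- vapply(filt, agrepl, logical(1), x = hdr, max.distance = 0.1)

# If the filter strings should appear verbatim, exact literal matching is simpler
exact_hits <- vapply(filt, grepl, logical(1), x = hdr, fixed = TRUE)

print(approx_hits)
print(exact_hits)
```

The same per-header test can then be applied to each `el$hdr` in the full list with vapply() or Filter().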