I was hoping to construct a regular expression from an input like `c("dont", "bias*")` so that it matches sentences containing both words, in that order, with no more than 4 words between them. For example, it should match "I dont think hes biased", but it should not match "I dont know if he has any bias", since the latter has 5 words between the two keywords.
I thought this pattern would work: `\\bdont\\b.*(?:\\s+\\w+\\s+){0,4}?\\bbias\\w*\\b`, but it returns `TRUE` for both sentences. Could anyone help me figure out what went wrong?
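A likely explanation, offered as a sketch rather than a definitive answer: the `.*` after `\\bdont\\b` is unbounded, so it can skip any number of words on its own, and the `{0,4}?` group is then free to match zero times. The group itself also cannot chain across adjacent words, because `\\s+\\w+\\s+` needs whitespace on *both* sides of each word, and a single space between words can only be consumed once. One way around both problems is to drop `.*` and let each repetition consume exactly one word plus the separator after it. `build_proximity_pattern()` below is a hypothetical helper (not a base R function), and it assumes each keyword is plain letters, optionally ending in `*`, which is treated as "any word characters".

```r
# Hypothetical helper: build a "keyword A, then keyword B within max_gap
# words" regex from a two-element input such as c("dont", "bias*").
# Assumes keywords contain no regex metacharacters apart from a trailing "*".
build_proximity_pattern <- function(words, max_gap = 4) {
  stopifnot(length(words) == 2)
  # Turn a trailing "*" into \w* (zero or more word characters)
  to_regex <- function(w) sub("\\*$", "\\\\w*", w)
  # \W+ eats the separator after the first keyword; each (?:\w+\W+) eats
  # exactly one intervening word plus the separator after it, so
  # {0,max_gap} bounds how many words may sit between the keywords.
  sprintf("\\b%s\\W+(?:\\w+\\W+){0,%d}%s\\b",
          to_regex(words[1]), as.integer(max_gap), to_regex(words[2]))
}

sentences <- c("I dont think hes biased", "I dont know if he has any bias")

# Original pattern: .* can skip arbitrarily many words, so the {0,4}?
# group matches zero times and both sentences match.
original <- "\\bdont\\b.*(?:\\s+\\w+\\s+){0,4}?\\bbias\\w*\\b"
grepl(original, sentences, perl = TRUE)
#> [1] TRUE TRUE

pattern <- build_proximity_pattern(c("dont", "bias*"))
pattern
#> [1] "\\bdont\\W+(?:\\w+\\W+){0,4}bias\\w*\\b"
grepl(pattern, sentences, perl = TRUE)
#> [1]  TRUE FALSE
```

Using `\\W+` rather than `\\s+` as the separator means punctuation such as commas between words still counts as a gap. Note that under `\\w`, a word like "don't" would be counted as two words, so inputs with apostrophes may need a different definition of "word".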