This can be achieved using filter() from the dplyr package and str_detect() from the stringr package. Note that the alternation pattern in the following code will capture both "India" and the uncapitalized "india", should it occur:
library(dplyr)
library(stringr)
# Some example data
df <- data.frame(File = c(1356, 1548, 1600, 1601),
                 Text = c("Digital India is an initiative by the Government of India to ensure that Government services are made available to citizens electronically by improving online infrastructure and by i",
                          "The textile industry in India traditionally, after agriculture, is the only industry that has generated huge employment for both skilled and unskilled labour. The textile industry conti",
                          "Some other text",
                          "This string has india without a capital I."))
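# Keep rows whose Text contains "India" or "india". The pattern has no spaces
# around "|" because spaces in a regular expression are matched literally.
df <- df %>%
  filter(str_detect(Text, "India|india"))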
df
# File Text
# 1 1356 Digital India is an initiative by the Government of India to ensure that Government services are made available to citizens electronically by improving online infrastructure and by i
# 2 1548 The textile industry in India traditionally, after agriculture, is the only industry that has generated huge employment for both skilled and unskilled labour. The textile industry conti
# 3 1601 This string has india without a capital I.
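If other capitalizations such as "INDIA" may also occur, or if longer words such as "Indian" should be excluded, a case-insensitive pattern may be easier to maintain than listing each variant. The following is a minimal sketch using stringr's regex() helper with ignore_case = TRUE; the docs data frame and its contents are made up for illustration and are not part of the original data:

```r
library(dplyr)
library(stringr)

# Hypothetical example data, made up for this illustration
docs <- data.frame(
  File = c(1, 2, 3, 4),
  Text = c("Made in India.", "INDIA, 1947", "The Indian Ocean", "Some other text")
)

# ignore_case = TRUE matches "India", "india", "INDIA", and so on,
# including when the letters are part of a longer word such as "Indian"
any_case <- docs %>%
  filter(str_detect(Text, regex("india", ignore_case = TRUE)))

# \\b marks a word boundary, so "Indian" no longer matches
whole_word <- docs %>%
  filter(str_detect(Text, regex("\\bindia\\b", ignore_case = TRUE)))

any_case$File
# [1] 1 2 3
whole_word$File
# [1] 1 2
```

Whether "Indian" should count as a match depends on the use case, so the word-boundary variant is only one possible choice.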