Welcome to StackOverflow! If you add more specifics I can refine my answer, but here is something to get you started.
library(data.table)
## Load your csv file
#search_in <- fread("path/to/file.csv")
## In lieu of a csv, create a table of example text values to search within
search_in <- data.table(text=c(
"Visit the U.S. Capital and see Congress in action",
"Santa Clause is (a) real (movie)",
"The Marines were founded in 1775",
"What does the fox say?",
"The United States Senate is the upper chamber of the United States Congress"))
## Create a table of your search terms and the corresponding values
search_for <- data.table(
word=c("U.S. Capital", "Biden", "Congress", "Marines", "Senate", "Santa"),
value=c(-0.5, -0.6, -0.4, -0.2, -0.4, -0.03))
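## Cross join every text with every search term, flag matches, then collapse per text
search_res <- merge(search_in[, id:=1L], search_for[, id:=1L], by="id", allow.cartesian=TRUE)[,
match:=text %like% word, by=.(text, word, value)][
match==TRUE, .(words=paste(sort(word), collapse=", "), value=sum(value)), by=text]
## Join back to the original text so lines without a match are kept (as NA);
## merge() takes by=, not on=
search_res <- merge(search_in[, -"id"], search_res, by="text", all.x=TRUE)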
search_res
##   text                                                                         words                   value
##1: Visit the U.S. Capital and see Congress in action                            Congress, U.S. Capital  -0.90
##2: Santa Clause is (a) real (movie)                                             Santa                   -0.03
##3: The Marines were founded in 1775                                             Marines                 -0.20
##4: The United States Senate is the upper chamber of the United States Congress  Congress, Senate        -0.80
##5: What does the fox say?                                                       <NA>                       NA
The first statement that creates search_res cross joins every row of search_in with every row of search_for, adds a column indicating whether the search term is matched in the text column, keeps only the rows that match, and then, for each text, lists the matched terms and sums their values.
The statement after that joins the original search_in to the results, so you can also see text lines that do not have a keyword match.
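One detail worth knowing about the matching step: %like% is a thin wrapper around grepl(), so each search term is treated as a regular expression, and the dots in "U.S. Capital" match any character. A minimal sketch of the difference, using a made-up test string (not from your data) and base R's grepl() with fixed = TRUE for a literal comparison:

```r
library(data.table)

# Hypothetical string that contains no literal dots
text <- "Visit the UXSX Capital"

# Regex match: each "." in the pattern matches any single character
regex_hit <- text %like% "U.S. Capital"

# Literal match: the pattern must appear character for character
literal_hit <- grepl("U.S. Capital", text, fixed = TRUE)

regex_hit    # TRUE
literal_hit  # FALSE
```

If your search terms contain characters such as ".", "(" or "?", literal matching is probably closer to what you want.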
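Depending on the size of your data this may be sufficient. Keep in mind that the cross join builds one row for every combination of text line and search term, so memory use grows with the product of the two table sizes. If you're using Linux or macOS, you might investigate grep or a similar command-line solution.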
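If the cross join becomes too large, one alternative is to loop over the search terms and accumulate the results, which never builds the combined table. This is only a sketch under a few assumptions: it uses a shortened copy of the example data, matches terms literally with grepl(..., fixed = TRUE) rather than as regular expressions, and sorts the terms first so the words column comes out alphabetically.

```r
library(data.table)

# Shortened copy of the example data
search_in <- data.table(text = c(
  "Visit the U.S. Capital and see Congress in action",
  "What does the fox say?"))
search_for <- data.table(
  word  = c("U.S. Capital", "Congress", "Senate"),
  value = c(-0.5, -0.4, -0.4))

# Sort terms so matched words are concatenated in alphabetical order
search_for <- search_for[order(word)]

# Accumulators: NA means "no term matched this text so far"
words <- rep(NA_character_, nrow(search_in))
value <- rep(NA_real_, nrow(search_in))

for (i in seq_len(nrow(search_for))) {
  w   <- search_for$word[i]
  hit <- grepl(w, search_in$text, fixed = TRUE)  # literal match, not regex
  # Append the term to any words already recorded for the matching texts
  words[hit] <- ifelse(is.na(words[hit]), w, paste(words[hit], w, sep = ", "))
  # Add the term's value, starting from 0 on the first match
  value[hit] <- ifelse(is.na(value[hit]), 0, value[hit]) + search_for$value[i]
}

search_res <- data.table(text = search_in$text, words = words, value = value)
search_res
```

Memory use stays proportional to the number of text lines, at the cost of one pass over the text per search term.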