I need to extract a parenthesized group from a string, but only if a given keyword appears inside that group's parentheses.
So if I have a string that looks like this:
('one', 'CARDINAL'), ('Castro', 'PERSON'), ('Latin America', 'LOC'), ('Somoza', 'PERSON')
and my keyword is "LOC", I want to extract only ('Latin America', 'LOC') and not the other pairs.
Help is appreciated!!
This is a sample of my data set, a CSV file:
,speech_id,sentence,date,speaker,file,parsed_text,named_entities
0,950094636,Let me state that the one sure way we can make it easy for Castro to continue to gain converts in Latin America is if we continue to support regimes of the ilk of the Somoza family,19770623,Mr. OBEY,06231977.txt,Let me state that the one sure way we can make it easy for Castro to continue to gain converts in Latin America is if we continue to support regimes of the ilk of the Somoza family,"[('one', 'CARDINAL'), ('Castro', 'PERSON'), ('Latin America', 'LOC'), ('Somoza', 'PERSON')]"
1,950094636,That is how we encourage the growth of communism,19770623,Mr. OBEY,06231977.txt,That is how we encourage the growth of communism,[]
2,950094636,That is how we discourage the growth of democracy in Latin America,19770623,Mr. OBEY,06231977.txt,That is how we discourage the growth of democracy in Latin America,"[('Latin America', 'LOC')]"
3,950094636,Mr Chairman,19770623,Mr. OBEY,06231977.txt,Mr Chairman,[]
4,950094636,given the speeches I have made lately about the press,19770623,Mr. OBEY,06231977.txt,given the speeches I have made lately about the press,[]
5,950094636,I am not one,19770623,Mr. OBEY,06231977.txt,I am not one,[]
6,950094636,I suppose,19770623,Mr. OBEY,06231977.txt,I suppose,[]
I am trying to extract only the parenthesized pairs that contain the word LOC:
library(stringr)  # provides str_extract_all()
regex <- "(?=\\().*? \'LOC.*?(?<=\\))"
filtered_df$clean_NE <- str_extract_all(filtered_df$named_entities, regex)
The above regular expression does not work. Thanks!
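For reference, here is a minimal sketch of one pattern that might do this. My guess at why the attempt above fails is that the lazy `.*?` still starts matching at the first `(` in the string, so the match runs from the first pair all the way to the `)` after LOC. The sketch assumes the entity text itself never contains parentheses and that the label is always the second element, written as `, 'LOC')`. The names `named_entities`, `loc_regex` and `loc_matches` are just illustrative.

```r
library(stringr)

# Sample values copied from the named_entities column shown above
named_entities <- c(
  "[('one', 'CARDINAL'), ('Castro', 'PERSON'), ('Latin America', 'LOC'), ('Somoza', 'PERSON')]",
  "[]",
  "[('Latin America', 'LOC')]"
)

# Match a single parenthesized pair whose second element is 'LOC'.
# [^()]* cannot cross a parenthesis, so a match stays inside one pair.
loc_regex <- "\\([^()]*, 'LOC'\\)"

# One list element per input string; rows without a LOC pair give character(0)
loc_matches <- str_extract_all(named_entities, loc_regex)
print(loc_matches)
```

Applied to the data frame, the same pattern could be used as `filtered_df$clean_NE <- str_extract_all(filtered_df$named_entities, loc_regex)`, which would store a list column.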