Hi friends, I have asked a related question here. The problem is that txt (keywords) containing punctuation is not detected. I tried to make the answer generic but failed.
Basically, I have txt (keywords), some with punctuation and some without, which I need to search for in a file, toSearch.
For example, these are the contents of my file toSearch:
[1]'Nokia. Okay. R: Samsung R: Samsung M: And you have? R: I have Micromax'
[2]'M: Okay, you have taken car. R: I have (Mahindra Scorpio and Mahindra's) this Duro DZ.M: Okay.'
[3]'M: What is your age ? R: 32 years R: My name is "Nitish". I have Interior designing business.'
[4]'R: 3rd, Not extra spicy. R: 4th, Fresh. R: 5th, Variety. R: 6th, Hygienic environment'
[5]'How you feel? How it should be? We will move forward, if there we have to make an ideal'
[6]'What is the strength of your organisation? How many people a re working.'
[7]'R: Read newspaper R:Had breakfast with family.'
And these are the txt (keywords). I have used #@ to separate the keywords, since I cannot use , (comma).
txt <- "R: Samsung R: Samsung M:#@I have (Mahindra Scorpio and Mahindra's)#@R: 32 years R: My name is \"Nitish\"#@R: 4th, Fresh. R: 5th, Variety#@How you feel? How it should be?"
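A minimal sketch of how a #@-separated string like this could be turned into a vector of keywords with base R's strsplit. The variable name raw and the shortened keyword string are assumptions for illustration only.

```r
# Hypothetical name `raw`; the string is a shortened copy of the keywords above.
# Single quotes are used so the inner double quotes need no escaping.
raw <- 'R: Samsung R: Samsung M:#@I have (Mahindra Scorpio and Mahindra\'s)#@R: 32 years R: My name is "Nitish"'

# fixed = TRUE treats "#@" as a literal separator, not as a regular expression
txt <- strsplit(raw, "#@", fixed = TRUE)[[1]]

print(txt)
```

Each element of txt is then one keyword, punctuation included.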
My expected output finds each occurrence of a keyword and replaces the spaces within it with an underscore (_):
[1]'Nokia. Okay. R:_Samsung_R:_Samsung_M: And you have? R: I have Micromax'
[2]'M: Okay, you have taken car. R: I_have_(Mahindra_Scorpio_and_Mahindra's) this Duro DZ.M: Okay.'
[3]'M: What is your age ? R:_32_years_R:_My_name_is_"Nitish". I have Interior designing business.'
[4]'R: 3rd, Not extra spicy. R:_4th,_Fresh._R:_5th,_Variety. R: 6th, Hygienic environment'
[5]'How_you_feel?_How_it_should_be? We will move forward, if there we have to make an ideal'
[6]'What is the strength of your organisation? How many people a re working.'
[7]'R: Read newspaper R:Had breakfast with family.'
In case that is unclear: it is simple Find And Replace Text (FART) functionality, where only the spaces are replaced by _.
I have tried to use this regular expression:
for (i in 1:length(txt))
{
  # finding the first word of the keyword
  start <- head(strsplit(txt, split = " ")[[i]], 1)
  n <- stri_stats_latex(txt[i])[4]
  # all possible occurrences for the keywords in the text
  o <- unlist(regmatches(toSearch, gregexpr(paste0(start, "(?:[^a-zA-Z'-]+[a-zA-Z'-]+){0,", n - 1, "}"), toSearch, ignore.case = TRUE)))
  # exact match with the result
  p <- which(!is.na(pmatch(txt, o)))
  # replace the keywords in the text file.
  text <- as.character(replace_all(text, txt[p], str_replace_all(txt[p])))
}
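For comparison, one direction that avoids building a regular expression from the keyword is literal matching with gsub(..., fixed = TRUE). This is only a minimal sketch under some assumptions: the keywords are already split into a character vector, matching is case-sensitive (fixed = TRUE cannot be combined with ignore.case), and the name result is hypothetical. It uses three lines and three keywords copied from the example above.

```r
# Three lines copied from the toSearch example above
toSearch <- c(
  "Nokia. Okay. R: Samsung R: Samsung M: And you have? R: I have Micromax",
  "M: Okay, you have taken car. R: I have (Mahindra Scorpio and Mahindra's) this Duro DZ.M: Okay.",
  "How you feel? How it should be? We will move forward, if there we have to make an ideal"
)

# Three of the keywords, assumed to be already split on "#@"
txt <- c(
  "R: Samsung R: Samsung M:",
  "I have (Mahindra Scorpio and Mahindra's)",
  "How you feel? How it should be?"
)

result <- toSearch  # hypothetical name for the modified copy
for (kw in txt) {
  # The replacement is the keyword itself with spaces turned into underscores
  underscored <- gsub(" ", "_", kw, fixed = TRUE)
  # fixed = TRUE matches the keyword literally, so ?, ., ( and ) are plain characters
  result <- gsub(kw, underscored, result, fixed = TRUE)
}

print(result)
```

Because nothing is interpreted as a pattern, punctuation in the keywords needs no escaping. The trade-off is that a keyword only matches when its spacing and case are identical to the text.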