Hi friends, I am trying to search for particular keywords (given in `txt`) in a list of files. I am using a regular expression to detect and replace each occurrence of a keyword in a file. Below are the comma-separated keywords that I am passing to be searched.

library(stringi)
txt <- "automatically got activated,may be we download,network services,food quality is excellent"

For example, "automatically got activated" should be found and replaced by "automatically_got_activated", "may be we download" by "may_be_we_download", and so on.

for(i in 1:length(txt)) {
    start <- head(strsplit(txt, split=" ")[[i]], 1) #finding the first word of the keyword 
    n <- stri_stats_latex(txt[i])[4]        #number of words in the keyword

    o <- tolower(regmatches(text, regexpr(paste0(start,"(?:[^a-zA-Z'-]+[a-zA-Z'-]+){0,",
        n-1,"}"),text,ignore.case=TRUE)))   #best match for keyword for the regex in the file 

    p <- which(!is.na(pmatch(txt, o)))      #exact match for the keywords
}
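
For reference, the pattern built inside the loop above matches the first word of a keyword followed by up to `n-1` further words. Below is a minimal standalone sketch of that idea. The sample sentence and the names `sentence`, `max_extra` and `best` are hypothetical, the limit of 5 extra words is an assumption, and `perl = TRUE` is added here so that the non-capturing group is handled by PCRE.

```r
# Hypothetical sample text, not taken from the files in the question
sentence <- "The app automatically got activated on my phone without asking me first"
start <- "automatically"   # first word of the keyword
max_extra <- 5             # assumed limit on the words that may follow it

# First word, then up to max_extra repetitions of "separator + word"
pattern <- paste0(start, "(?:[^a-zA-Z'-]+[a-zA-Z'-]+){0,", max_extra, "}")

# regexpr() locates the first match; regmatches() extracts the matched text
best <- regmatches(sentence,
                   regexpr(pattern, sentence, ignore.case = TRUE, perl = TRUE))
print(best)
```

The quantifier is greedy, so the match runs as far as the limit allows and may need trimming before it is compared against the full keyword.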
bartektartanus
OnkarK
  • This question probably needs to be cleaned up a bit. Your title and description of the problem differ. Also, this is too big; there is too much information to replicate the problem. Try cutting the data down a bit and provide a way (maybe `readLines`) for people to easily read it into R. – Tyler Rinker May 21 '14 at 12:03
  • 10 questions like this and the SO database will be down... – agstudy May 21 '14 at 12:03
  • Please consider this to reduce the size of your question: https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example – mvw May 21 '14 at 12:03
  • Sorry guys, I am new to SO. Thank you for sharing these links. – OnkarK May 21 '14 at 12:25
  • OK, from the edited code, can I get a solution for a regular expression that accepts up to 5 words after the first word in the keyword? – OnkarK May 21 '14 at 13:02
  • @OnkarK I think you'll need to be a little more specific. I think you're trying to show us what you want with the code you wrote that doesn't work as you expect. It's difficult to understand a problem from even well-written code. I'd suggest you really define what you're after (maybe even a list of rules). Then actually show us an example of the output you'd expect the code to return. Here's an example where I couldn't describe the problem with words alone, so I gave the desired output too: http://stackoverflow.com/questions/22235288/strsplit-on-all-spaces-and-punctuation-except-apostrophes – Tyler Rinker May 21 '14 at 14:38
  • @TylerRinker, I like how you marked your own question as a duplicate, where the original question was also yours! – David Arenburg May 22 '14 at 08:15
  • I don't understand the keywords. Is the first keyword *automatically*? Or is it *automatically got activated*? – Rich Scriven May 22 '14 at 15:36
  • @RichardScriven The keyword is "automatically got activated" and the end result I want is "automatically_got_activated". The number of words in the keywords will keep changing. Please ask if you have any more questions. – OnkarK May 23 '14 at 06:39
  • @OnkarK, thanks. I posted an answer. Hopefully it works. – Rich Scriven May 23 '14 at 07:52

1 Answer

I think this may be what you're looking for.

> txt <- "automatically got activated,may be we download,network services,food quality is excellent"

A made-up vector of sentences to search from:

> searchList <- c('This is a sentence that automatically got activated',
                  'may be we download some music tonight',
                  'I work in network services',
                  'food quality is excellent every time I go',
                  'New service entrance',
                  'full quantity is excellent')

A function to do the work:

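replace.keyword <- function(text, toSearch)
{
    kw <- unlist(strsplit(text, ','))   # split the comma-separated keywords
    gs <- gsub('\\s', '_', kw)          # underscore-joined replacements
    sapply(seq(kw), function(i){
      # replace keyword i in the sentences that contain it; blank out the rest
      ul <- ifelse(grepl(kw[i], toSearch),
                   gsub(kw[i], gs[i], toSearch),
                   "")
      ul[nzchar(ul)]                    # keep only the sentences that matched
    })
}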

The results:

> replace.keyword(txt, searchList)
# [1] "This is a sentence that automatically_got_activated"
# [2] "may_be_we_download some music tonight"              
# [3] "I work in network_services"                         
# [4] "food_quality_is_excellent every time I go"   

Let me know if it works for you.

Rich Scriven
  • Thanks a ton. This works exactly as I wanted. It is simple find-and-replace-text (FART) functionality. My bad. – OnkarK May 23 '14 at 10:13
  • Just a small doubt: the expected keywords are replaced in `txt` instead of in `searchList`. Second doubt: the sentences that do not contain any keyword should remain as they are in `searchList`. – OnkarK May 23 '14 at 10:32
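
On the second point in the last comment, here is a minimal sketch of one way to keep every sentence and only rewrite the ones that contain a keyword. It is not part of the answer above: `replace_keywords_keep_all` is a hypothetical helper name, the shortened `searchList` is made up for the example, and it assumes the keywords should be matched as literal, case-sensitive text (hence `fixed = TRUE`).

```r
# Hypothetical helper (not from the answer above): returns all sentences,
# with each keyword phrase joined by underscores wherever it occurs.
replace_keywords_keep_all <- function(keywords_csv, sentences) {
  kw <- unlist(strsplit(keywords_csv, ",", fixed = TRUE))  # individual keywords
  gs <- gsub(" ", "_", kw, fixed = TRUE)                   # their replacements
  for (i in seq_along(kw)) {
    # fixed = TRUE treats the keyword as literal text, not as a regex;
    # sentences without the keyword pass through gsub() unchanged
    sentences <- gsub(kw[i], gs[i], sentences, fixed = TRUE)
  }
  sentences
}

txt <- "automatically got activated,may be we download,network services,food quality is excellent"
searchList <- c("This is a sentence that automatically got activated",
                "New service entrance")

result <- replace_keywords_keep_all(txt, searchList)
print(result)
```

Because `gsub()` is vectorised over `sentences`, the result has the same length and order as `searchList`, so it can be assigned back with `searchList <- replace_keywords_keep_all(txt, searchList)` if in-place replacement is what is wanted.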