I have two lists of product names. My problem is that "Operating system" matches "system", "cooling system", etc., but it should match only "Operating" or "OS". Another example: "Key Board" should match "key" or "KB", but not "Mother Board" or just "Board".

How can I give more importance to the first word than to the second?

I used agrep() in R. For the first example, it also matches "system" and "cooling system". How can I avoid those matches?

Also, is there any function or method to match "key board" with "KB" and "operating system" with "OS"?

Thanks in advance.

lawyeR
Kavipriya
  • What do you want to do with the matches? Replace them? Delete them? Select them? Please include some sample data from your product list and provide a reproducible example: http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example – grrgrrbla Jun 23 '15 at 09:59
  • Have you looked at the synonyms function in the qdap package? – lawyeR Jun 23 '15 at 09:59
  • @lawyeR Yes, I looked at the synonyms function, but I don't think it has anything to do with my problem. My data might have spelling mistakes, might be in short forms, or might contain only part of the name. There is no need to search for similar words, I think. – Kavipriya Jun 23 '15 at 11:44
  • @grrgrrbla My matching can be through insertion/deletion/substitution. Just an approximate matching is good enough, for correcting spelling. But for dealing with short forms, I couldn't find a proper function. – Kavipriya Jun 23 '15 at 11:47

1 Answer

I have written a function for this. It is not the most optimized way to do it, but it does the task. Note that the inputs are vectors, not lists. Hope this helps.

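library(stringr)  # provides str_split()

# Returns the indices of inputstring that contain either search.string
# or the initials of its words, as whole words and ignoring case.
stringMatch <- function(search.string, inputstring, pattern = " ") {
  stringsplit <- unlist(str_split(search.string, pattern))

  # Build the abbreviation from the first letter of each word, e.g. "hello p" -> "hp"
  firstletter <- paste(substring(stringsplit, 1, 1), collapse = "")
  search.string.l <- tolower(search.string)
  firstletter.l <- tolower(firstletter)

  # Match the full string or the abbreviation, delimited by word boundaries
  matchstring <- grep(paste("\\b", search.string.l, "\\b", "|",
                            "\\b", firstletter.l, "\\b", sep = ""),
                      tolower(inputstring))
  return(matchstring)
}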

test1<-c('hello p','helbbo','hello test','HP')
search.string<-'HP'
stringMatch(search.string,test1)
[1] 4
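To give the first word priority, as the question asks, one possible approach is to compare only the first word of each candidate with the first word of the search string, and to accept the abbreviation separately. The sketch below is an assumption-laden illustration, not a tested solution: `firstWordMatch` and its `max.dist` parameter are hypothetical names, and it uses `adist()` from the utils package (edit distance) in place of `agrep()` so that whole words are compared rather than substrings.

```r
# Hypothetical helper: a candidate matches if its first word is within
# max.dist edits of the search string's first word, or if the whole
# candidate equals the abbreviation built from the search string's initials.
firstWordMatch <- function(search.string, candidates, max.dist = 1) {
  words <- strsplit(tolower(trimws(search.string)), "\\s+")[[1]]
  abbrev <- paste(substr(words, 1, 1), collapse = "")

  cand <- tolower(trimws(candidates))
  cand.first <- sub("\\s.*$", "", cand)  # keep only the first word

  # Edit distance between the first word of the search string
  # and the first word of every candidate
  d <- as.vector(utils::adist(words[1], cand.first))

  which(d <= max.dist | cand == abbrev)
}

products <- c("system", "cooling system", "Operating", "OS", "operatng sys")
firstWordMatch("Operating system", products)
firstWordMatch("Key Board", c("key", "KB", "Mother Board", "Board"))
```

With these inputs the first call should select "Operating", "OS" and the misspelled "operatng sys", and the second should select "key" and "KB". A `max.dist` of 1 is an arbitrary choice; short first words may need 0 to avoid false matches.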
rahul
  • I want result as `1 4` , not just 4, in your example. is there any way for that? – Kavipriya Jun 23 '15 at 11:28
  • let me understand correctly, what would be the output if the search.string was 'hello p' in the above example, from my understanding it would be 1 4. so given any input string you want to compare the string for direct match and also use – rahul Jun 23 '15 at 11:37
  • Yeah! This works fine for input 'hello p'. That is very helpful. Also, is there any way to make the reverse possible as well? ('hp' matches with 'hello p') – Kavipriya Jun 23 '15 at 11:53
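
For the reverse direction asked about in the last comment, one possible sketch is to build the abbreviation on both sides and compare in either direction. The names `initials` and `matchEitherWay` are hypothetical, and unlike the `grep()` version in the answer this compares whole strings rather than searching for the term inside longer text.

```r
# Hypothetical helper: initials of each element, e.g. "hello p" -> "hp"
initials <- function(x) {
  vapply(strsplit(tolower(trimws(x)), "\\s+"),
         function(w) paste(substr(w, 1, 1), collapse = ""),
         character(1))
}

# Match when the strings are equal, when the input abbreviates to the
# search string, or when the search string abbreviates to the input.
matchEitherWay <- function(search.string, inputstring) {
  s <- tolower(trimws(search.string))
  x <- tolower(trimws(inputstring))
  which(x == s | initials(x) == s | x == initials(s))
}

test1 <- c('hello p', 'helbbo', 'hello test', 'HP')
matchEitherWay('HP', test1)
matchEitherWay('hello p', test1)
```

Both calls should return `1 4` for this test vector. Spelling mistakes are not handled here; that would need an approximate comparison such as `adist()` or `agrep()` on top.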