Regular expression in R for detecting $45

Question

Following are the text samples I have:

text1 : "The salary is $34-$36"
text2 : "The salary is $34.50-$36.20"
text3 : "The salary is $45000-$34000"
text4 : "The salary is $45-$34K"

Whenever I find patterns like $34-$36 or $34.50-$36.20, I need to add the word "hour" to the text. Whenever I find patterns like $45000-$34000 or $45-$34K, I need to add the word "salary" to the text.

Can someone help me solve this in R using regular expressions?

Thank you.

@chinsoon12 I have given only part of the text. Even in text3 and text4 there are words like hour and hourly, and even in text1 and text2 there are words like salary. So I want to detect these $45-style numbers and give them extra weight; not every sample similar to text3 and text4 has a "per month" string. — ravi theja, Apr 06 '17 at 01:08
Your text3 and text4 _already_ say salary. Do you want to add it again? text1 and text2 already say hourly. — G5W, Apr 06 '17 at 01:33
@BenBolker Thanks a lot. With minor changes, gsub("\\$[0-9]+\\.[[:digit:]]+","\\1 hour",text1) worked for me. Can you guide me on the text3 and text4 samples? — ravi theja, Apr 06 '17 at 03:16

score 0 · Answer 1 · answered Apr 06 '17 at 08:05

For the first case, a regular expression with a negative lookahead might work:

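# add 'hour' for 1-2-digit $-values (with optional decimal fraction)
# but only if NOT followed by another digit or K; rejecting a following digit
# keeps \d{1,2} from backtracking to match "$4" inside "$45000"
gsub("(\\$\\d{1,2}(?:\\.\\d+)?)(?!\\.?\\d|K)", "\\1 hour", txt, perl=TRUE)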
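A quick check of the hourly case on the four samples from the question may help here. This is only a sketch: the names `txt`, `hour_pattern` and `hourly` are made up for illustration, and the lookahead in this pattern rejects a following digit so that `\\d{1,2}` cannot backtrack to a single digit and match `$4` inside `$45000`.

```r
# The four samples from the question (variable names are illustrative)
txt <- c(
  "The salary is $34-$36",
  "The salary is $34.50-$36.20",
  "The salary is $45000-$34000",
  "The salary is $45-$34K"
)

# Hourly values: 1-2 digits, optional decimal fraction, and NOT followed
# by another digit, a further decimal fraction, or K
hour_pattern <- "(\\$\\d{1,2}(?:\\.\\d+)?)(?!\\.?\\d|K)"

# perl = TRUE is required for the lookahead and for \\d
hourly <- gsub(hour_pattern, "\\1 hour", txt, perl = TRUE)
print(hourly)
```

With these inputs, text1 and text2 get "hour" after each value and text3 is left alone. Note that in the mixed range of text4 the first value, `$45`, is still tagged as hourly, because each value is matched on its own.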

The second case:

# add 'salary' for 1-2-digit $-values
# but only if followed by 000 or K (e.g. $45000 or $34K)
gsub("(\\$\\d{1,2}(000|K))", "\\1 salary", txt, perl=TRUE)
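The salary pattern can be checked the same way. The sketch below applies it, as written above, to the question's four samples; `txt`, `salary_pattern` and `salaried` are illustrative names only.

```r
# The four samples from the question (variable names are illustrative)
txt <- c(
  "The salary is $34-$36",
  "The salary is $34.50-$36.20",
  "The salary is $45000-$34000",
  "The salary is $45-$34K"
)

# Salary values: 1-2 digits that ARE followed by 000 or K
salary_pattern <- "(\\$\\d{1,2}(000|K))"

# \\1 refers to the outer group, i.e. the whole matched value
salaried <- gsub(salary_pattern, "\\1 salary", txt, perl = TRUE)
print(salaried)
```

Here text1 and text2 are unchanged, both values in text3 are tagged, and in text4 only `$34K` is tagged. A longer figure such as `$450000` would be matched only up to `$45000`, so appending `(?!\\d)` to the pattern may be worth considering if such values occur in the real data.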

I've tested this with only a few snippets. Your test cases may be more complex than mine.
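Since the question treats a mixed range such as $45-$34K as a salary, one possible alternative is to classify each string as a whole and append the word once, instead of tagging every value. The following is a minimal sketch under that assumption, not part of the tested answer; `tag_pay` is a hypothetical helper, and the rule "salary wins if any value ends in 000 or K" is an assumption about the intended behaviour.

```r
# Hypothetical helper: append "salary" or "hour" once per string
tag_pay <- function(x) {
  # Any value of 1-2 digits followed by 000 or K marks the string as salary
  is_salary <- grepl("\\$\\d{1,2}(?:000|K)(?!\\d)", x, perl = TRUE)
  # Otherwise, a short value with optional decimals marks it as hourly
  is_hourly <- !is_salary &
    grepl("\\$\\d{1,2}(?:\\.\\d+)?(?!\\.?\\d|K)", x, perl = TRUE)
  ifelse(is_salary, paste(x, "salary"),
         ifelse(is_hourly, paste(x, "hour"), x))
}

txt <- c(
  "The salary is $34-$36",
  "The salary is $34.50-$36.20",
  "The salary is $45000-$34000",
  "The salary is $45-$34K",
  "No pay listed"
)
tagged <- tag_pay(txt)
print(tagged)
```

Under these assumptions text1 and text2 end in "hour", text3 and text4 end in "salary", and a string without a dollar amount is returned unchanged.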

Thanks a lot, it is working. — ravi theja, Apr 06 '17 at 11:43
