Following are the text samples I have:

text1 : "The salary is $34-$36"
text2 : "The salary is $34.50-$36.20"
text3 : "The salary is $45000-$34000"
text4 : "The salary is $45-$34K"

Whenever I find a pattern like $34-$36 or $34.50-$36.20, I need to add the word "hour" to the text, and whenever I find a pattern like $45000-$34000 or $45-$34K, I need to add the word "salary" to the text.

Can someone help me solve this in R using regular expressions?

Thank you.

Wiktor Stribiżew
ravi theja
  • @chinsoon12 I have given only part of the text. Even text3 and text4 contain words like hour and hourly, and text1 and text2 contain words like salary, so I want to detect these $45-style numbers and give them extra weight. Not every sample similar to text3 and text4 has a "per month" string – ravi theja Apr 06 '17 at 01:08
  • Your text3 and text 4 _already_ say salary. Do you want to add it again? text1 and text2 already say hourly – G5W Apr 06 '17 at 01:33
  • Also $34K per month? Where do I get that job? – G5W Apr 06 '17 at 01:33
  • @G5W edited my question... – ravi theja Apr 06 '17 at 03:03
  • e.g. `gsub("(-\\$3[0-9])","\\1 hour",text1)` – Ben Bolker Apr 06 '17 at 03:04
  • @BenBolker Thanks a lot. With minor changes, `gsub("\\$[0-9]+\\.[[:digit:]]+","\\1 hour",text1)` worked for me. Can you guide me on the text3 and text4 samples? – ravi theja Apr 06 '17 at 03:16

1 Answer

For the first case, a regular expression with a negative lookahead might work:

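# add 'hour' to 1-2-digit $-values (with optional decimal fraction),
# but only if NOT followed by another digit or K; rejecting any digit
# (not just 000) stops the regex from backtracking to "$4" in "$45000"
gsub("(\\$\\d{1,2}(?:\\.\\d+)?)(?!\\d|K)", "\\1 hour", txt, perl=TRUE)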

The second case:

# add 'salary' to $-values of 1-2 digits,
# but only if followed by 000 or K
gsub("(\\$\\d{1,2}(?:000|K))", "\\1 salary", txt, perl=TRUE)
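
To try both substitutions on the four samples from the question, something like the following sketch could be used. The names `txt`, `hour_pat`, `salary_pat` and `out` are assumptions made for illustration, and the hour pattern here rejects any following digit (not only 000) so that backtracking cannot settle for a shorter prefix such as $4 in $45000.

```r
# Sample texts from the question; `txt` is an assumed variable name.
txt <- c("The salary is $34-$36",
         "The salary is $34.50-$36.20",
         "The salary is $45000-$34000",
         "The salary is $45-$34K")

# Hourly amounts: 1-2 digits, optional decimals, not followed by a digit or K.
hour_pat <- "(\\$\\d{1,2}(?:\\.\\d+)?)(?!\\d|K)"
# Salary amounts: 1-2 digits followed by 000 or K.
salary_pat <- "(\\$\\d{1,2}(?:000|K))"

# Tag hourly amounts first, then salary amounts.
out <- gsub(hour_pat, "\\1 hour", txt, perl = TRUE)
out <- gsub(salary_pat, "\\1 salary", out, perl = TRUE)
print(out)
```

Note that each amount is matched on its own, so in the last sample the lower bound $45 is still tagged with "hour" even though the upper bound $34K is tagged with "salary".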

I've tested this with only a few snippets. Your test cases may be more complex than mine.
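
Since the question asks for a word to be added to the text as a whole, another option is to classify the entire $low-$high range once instead of tagging each amount. The sketch below is only one way this might look: `tag_pay_range` is a hypothetical helper, appending the word at the end of the text is an assumption, and so is the rule that a range counts as a salary when either bound ends in 000 or K.

```r
# Hypothetical helper: append "hour" or "salary" depending on the pay range found.
tag_pay_range <- function(x) {
  # A range is two $-amounts joined by a hyphen, each with optional decimals and K.
  range_pat <- "\\$\\d+(?:\\.\\d+)?K?-\\$\\d+(?:\\.\\d+)?K?"
  has_range <- grepl(range_pat, x, perl = TRUE)

  # Extract the first range from each text that has one.
  rng <- rep(NA_character_, length(x))
  rng[has_range] <- regmatches(x, regexpr(range_pat, x, perl = TRUE))

  # Assumed rule: a bound ending in 000 or K marks the whole range as a salary.
  is_salary <- has_range & grepl("(?:000|K)(?:-|$)", rng, perl = TRUE)
  label <- ifelse(is_salary, "salary", "hour")

  # Texts without a range are returned unchanged.
  ifelse(has_range, paste(x, label), x)
}

samples <- c("The salary is $34-$36",
             "The salary is $34.50-$36.20",
             "The salary is $45000-$34000",
             "The salary is $45-$34K",
             "No pay listed")
tagged <- tag_pay_range(samples)
print(tagged)
```

With this rule a mixed range such as $45-$34K is labelled once, as a salary, rather than receiving one label per amount.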

knb