I currently have a regex that works fine for certain types of input.

Reg Ex: (.*)\s*PER\s*([^\s]+).*

Sample Input 1 : 1.0 PER Sample DEAD VOLUME 1

Output:

Matching Group 1 : 1.0

Matching Group 2 : Sample

Sample Input 2 : 1.0 PER Request DEAD VOLUME 1

Output:

Matching Group 1 : 1.0

Matching Group 2 : Request
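
As a quick check, the original pattern can be run through Python's `re` module. Python is an assumption here, since the question does not name a language, and `split_original` is a hypothetical helper. One detail worth noting: the greedy `(.*)` also captures the space before `PER`, so the sketch strips group 1.

```python
import re

# Original pattern from the question, unchanged.
ORIGINAL = re.compile(r"(.*)\s*PER\s*([^\s]+).*")

def split_original(text):
    """Return (group 1, group 2) for the original pattern, or None if no match."""
    m = ORIGINAL.match(text)
    if m is None:
        return None
    # The greedy (.*) keeps the whitespace before PER, so strip it here.
    return m.group(1).strip(), m.group(2)

for line in ("1.0 PER Sample DEAD VOLUME 1", "1.0 PER Request DEAD VOLUME 1"):
    print(line, "->", split_original(line))
```

For the multi-word inputs that follow, this pattern stops group 2 at the first whitespace, so it would capture only `Empty`.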

Now I need to modify the regex so that it also works for inputs like the ones below.

Input 1 : 10.0 PER Empty Well In Column DEAD VOLUME 10

Expected Output:

Matching Group 1 : 10.0

Matching Group 2 : Empty Well

Matching Group 3 : Column

Input 2 : 8.0 PER Empty Well In Row DEAD VOLUME 8

Expected Output:

Matching Group 1 : 8.0

Matching Group 2 : Empty Well

Matching Group 3 : Row

I have found a regex that processes the second type of input successfully.

RegEx: (.*)\s*PER\s*(.*)\s*In\s*(.*)\s*DEAD\s*.*
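
A short Python check (Python and the helper name `split_with_in` are assumptions for illustration) suggests why this second pattern cannot simply replace the first one: the literal `In` and `DEAD` are mandatory, so inputs that lack either of them do not match at all.

```python
import re

# Second pattern from the question, unchanged.
WITH_IN = re.compile(r"(.*)\s*PER\s*(.*)\s*In\s*(.*)\s*DEAD\s*.*")

def split_with_in(text):
    """Return the three groups with surrounding whitespace stripped, or None."""
    m = WITH_IN.match(text)
    if m is None:
        return None
    # The greedy (.*) groups keep trailing spaces, so strip each one.
    return tuple(g.strip() for g in m.groups())

print(split_with_in("10.0 PER Empty Well In Column DEAD VOLUME 10"))
print(split_with_in("1.0 PER Sample DEAD VOLUME 1"))  # no "In", so no match
```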

Is there a way I can write a single regex that works for both of these types of input?

UPDATE:

I forgot to mention one more condition: the regex also needs to work for the inputs below.

  1. 1.0 PER Sample
  2. 1.0 PER Request
  3. 10.0 PER Empty Well In Column

That is, the DEAD VOLUME portion is optional.

Is this possible too?

Prakash

1 Answer

You may use optional groups together with lazy dot matching:

^([\d.]+)\s+PER\s+(.*?)(?:\sIn\s*(.+?))?(?:\s*DEAD.*)?$

See this regex demo

Matches:

1.0 PER Sample DEAD VOLUME 1
   1.0
   Sample

1.0 PER Request DEAD VOLUME 1
   1.0
   Request

10.0 PER Empty Well In Column DEAD VOLUME 10
   10.0
   Empty Well
   Column

8.0 PER Empty Well In Row DEAD VOLUME 8
   8.0
   Empty Well
   Row
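
The matches listed above can be reproduced with a small Python sketch. Python and the helper name `parse_line` are assumptions for illustration; the pattern itself is the one from this answer. Group 3 comes back as `None` when the input has no `In ...` part.

```python
import re

# Pattern from this answer, unchanged.
PATTERN = re.compile(r"^([\d.]+)\s+PER\s+(.*?)(?:\sIn\s*(.+?))?(?:\s*DEAD.*)?$")

def parse_line(text):
    """Return (group 1, group 2, group 3), or None if the line does not match."""
    m = PATTERN.match(text)
    return m.groups() if m else None

samples = [
    "1.0 PER Sample DEAD VOLUME 1",
    "1.0 PER Request DEAD VOLUME 1",
    "10.0 PER Empty Well In Column DEAD VOLUME 10",
    "8.0 PER Empty Well In Row DEAD VOLUME 8",
]
for s in samples:
    print(s, "->", parse_line(s))
```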

Explanation:

  • ^ - start of string
  • ([\d.]+) - Group 1 capturing any 1+ digits or dots
  • \s+ - 1+ whitespaces
  • PER - a literal text PER
  • \s+ - 1+ whitespaces
  • (.*?) - Group 2 capturing any 0+ chars other than a newline, as few as possible, up to the first position where the rest of the pattern matches
  • (?:\sIn\s*(.+?))? - 1 or 0 sequences of:
    • \sIn - a whitespace followed by the text In
    • \s* - zero or more whitespaces
    • (.+?) - Group 3 capturing one or more chars other than a newline, as few as possible, up to the first position where the rest of the pattern matches
  • (?:\s*DEAD.*)? - an optional group matching:
    • \s* - 0+ whitespaces
    • DEAD - a literal text DEAD
    • .* - any 0+ chars other than a newline, up to the end of the string
  • $ - the end of string.
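
The breakdown above can also be written into the pattern itself. The following is a sketch in Python (an assumption, as the question does not name a language) that uses `re.VERBOSE`, which ignores unescaped whitespace in the pattern and allows `#` comments. The name `VERBOSE_PATTERN` is hypothetical.

```python
import re

# Same pattern as in the answer, spread out with one comment per component.
VERBOSE_PATTERN = re.compile(r"""
    ^                       # start of string
    ([\d.]+)                # group 1: one or more digits or dots
    \s+ PER \s+             # literal PER with whitespace on both sides
    (.*?)                   # group 2: as few characters as possible
    (?: \s In \s* (.+?) )?  # optional In part; group 3 is the text after In
    (?: \s* DEAD .* )?      # optional DEAD tail, not captured
    $                       # end of string
""", re.VERBOSE)

m = VERBOSE_PATTERN.match("10.0 PER Empty Well In Column DEAD VOLUME 10")
print(m.groups())
```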
Wiktor Stribiżew
  • I forgot to mention one more condition. Please have a look at the updated portion of the question and help me resolve this. – Prakash Jul 14 '16 at 12:35
  • Let me see. The point is that an optional group after `.*` or `.*?` should be "anchored" to an obligatory pattern, or its text will be "gobbled up" by the dot. Try [`^(.*?)\s*PER\s*((?:(?!\s(?:In|DEAD)).)*)(?:\sIn\s*(.+?))?(?:\s*DEAD.*)?$`](https://regex101.com/r/aM7kL6/8) – Wiktor Stribiżew Jul 14 '16 at 12:39
  • Or a simpler one: [`^([\d.]+)\s+PER\s+(.*?)(?:\sIn\s*(.+?))?(?:\s*DEAD.*)?$`](https://regex101.com/r/aM7kL6/10) – Wiktor Stribiżew Jul 14 '16 at 12:43
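
To check the updated requirement (inputs without the DEAD VOLUME part), here is a hedged Python sketch that runs both patterns from the comments on the three inputs listed in the UPDATE. Python and the names `TEMPERED`, `SIMPLER` and `groups_for` are assumptions for illustration.

```python
import re

# Both patterns are copied from the comments above.
TEMPERED = re.compile(
    r"^(.*?)\s*PER\s*((?:(?!\s(?:In|DEAD)).)*)(?:\sIn\s*(.+?))?(?:\s*DEAD.*)?$"
)
SIMPLER = re.compile(r"^([\d.]+)\s+PER\s+(.*?)(?:\sIn\s*(.+?))?(?:\s*DEAD.*)?$")

def groups_for(pattern, text):
    """Return the captured groups as a tuple, or None if there is no match."""
    m = pattern.match(text)
    return m.groups() if m else None

update_inputs = [
    "1.0 PER Sample",
    "1.0 PER Request",
    "10.0 PER Empty Well In Column",
]
for text in update_inputs:
    print(text, groups_for(TEMPERED, text), groups_for(SIMPLER, text))
```

In the first pattern, group 2 uses a negative lookahead so that it stops before a whitespace followed by `In` or `DEAD`. The second relies on lazy matching plus the `$` anchor.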