
I need some help writing regular expressions. I need an expression that can match the following patterns (which include words, digits, spaces and commas):

  • Line 145
  • Line3544354
  • Lines 10,12
  • Line items 45,10,26
  • Lines 10 and 45

So far, I have written one expression that covers the first three patterns in any letter case:

r'(?i)(line item[\.*\,*\s*\d+]+]+|line[\.*\,*\s*\d+]+|lines[\.*\,*\s*\d+]+|line items[\.*\,*\s*\d+]+)'
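A quick way to see what this first attempt actually covers is to run it over the five sample strings listed above. The sketch below uses Python's standard `re` module; the `samples` list is just the examples from the question. Inside `[...]`, the `*`, `+` and `,` are literal characters, and the `]+` right after the first class is a literal `]`, so the `line item` branch can never match these inputs.

```python
import re

# The first attempt from the question, copied verbatim.
# Inside [...] everything except \s and \d is a literal character,
# and the stray "]+" after the first class requires a literal "]".
ATTEMPT = r'(?i)(line item[\.*\,*\s*\d+]+]+|line[\.*\,*\s*\d+]+|lines[\.*\,*\s*\d+]+|line items[\.*\,*\s*\d+]+)'

samples = [
    "Line 145",
    "Line3544354",
    "Lines 10,12",
    "Line items 45,10,26",
    "Lines 10 and 45",
]

# The pattern has one capturing group, so findall returns that group's text.
results = {s: re.findall(ATTEMPT, s) for s in samples}
for s, found in results.items():
    print(f"{s!r:24} -> {found}")
```

The last two samples come back truncated (`'Line '` and `'Lines 10 '`), which agrees with the question: only the first three patterns are handled.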

I would like to include the last two patterns as well, but I am not sure how. For the "Lines 10 and 45" pattern, I wrote this expression by modifying the bracketed part as follows:

r'(Lines[\.*\,*\w*\s*\d+]+)'

However, it does not work as expected: it matches every word character in the rest of the string. I would like to keep my expressions greedy, but I am not sure how to handle the last two patterns in the list.
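The over-matching follows from how character classes work: `[\.*\,*\w*\s*\d+]` is one set of allowed characters (any word character, any whitespace, `.`, `,`, `*` or `+`), not a sequence of sub-patterns, so `[...]+` keeps consuming until it reaches a character outside that set. A minimal check with Python's `re`, using a made-up sentence as input:

```python
import re

# The second attempt from the question, verbatim. Because \w sits inside
# the class, letters are allowed too, so "and", "are", "due", ... are consumed.
ATTEMPT = r'(Lines[\.*\,*\w*\s*\d+]+)'

text = "Lines 10 and 45 are due today; see page 3"
match = re.search(ATTEMPT, text)
# The match only stops at ";", the first character not in the class.
print(match.group(1))
```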

Any suggestions please?

Wiktor Stribiżew
  • [`(?i)lines?(?:\s+items?)?\s*\d+(?:\s*(?:,|and)\s*\d+)*`](https://regex101.com/r/R3ktxn/1). Or [this one](https://regex101.com/r/R3ktxn/2). `[...]` are character classes, not grouping constructs. – Wiktor Stribiżew Nov 19 '19 at 18:39
  • Great, thank you very much. Could you please share a brief explanation of the expression? I am trying to understand what the different ? and () mean within the expression. – brightcitrus Nov 19 '19 at 19:01
  • Also, I have "Line 96.1" at the beginning of a string in my text, but this pattern does not capture it and returns NA. Do you know why? Thank you! – brightcitrus Nov 19 '19 at 19:02
  • You did not try the solution in my second link. I posted it with an explanation. – Wiktor Stribiżew Nov 19 '19 at 19:06
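The "Line 96.1" behaviour can be reproduced with the first pattern from the comments, which has no provision for a decimal part: `\d+` stops at the dot. A sketch with Python's `re` (the sample sentence is invented); plain `re.search` yields the partial match "Line 96" rather than nothing, and how the asker's environment turned that into NA is not shown in the thread.

```python
import re

# First pattern suggested in the comments: integers only, no decimal part.
INT_ONLY = r'(?i)lines?(?:\s+items?)?\s*\d+(?:\s*(?:,|and)\s*\d+)*'

text = "Line 96.1 was updated"
m = re.search(INT_ONLY, text)
# \d+ matches "96" and nothing in the pattern can consume ".1".
print(m.group(0))
```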

1 Answer


You may use

(?i)lines?(?:\s+items?)?\s*\d+(?:\.\d+)?(?:\s*(?:,|and)\s*\d+(?:\.\d+)?)*

See the regex demo.

Pattern details:

  • (?i) - ignore case inline modifier
  • lines? - line or lines (? quantifier makes the preceding pattern optional, matching 1 or 0 occurrences)
  • (?:\s+items?)? - an optional non-capturing group matching 1 or 0 occurrences of 1+ whitespaces followed by item and an optional s char
  • \s* - 0+ whitespaces
  • \d+(?:\.\d+)? - 1+ digits followed by an optional sequence of . and 1+ digits
  • (?:\s*(?:,|and)\s*\d+(?:\.\d+)?)* - 0 or more repetitions of
    • \s* - 0+ whitespaces
    • (?:,|and) - a comma or the and char sequence
    • \s* - 0+ whitespaces
    • \d+(?:\.\d+)? - 1+ digits followed by an optional sequence of . and 1+ digits
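To check this pattern against every example from the question, plus the "Line 96.1" case from the comments, here is a small Python sketch. It uses `re.fullmatch`, so a sample that is only partially matched would show up in the `unmatched` list; the upper-case sample is added only to exercise `(?i)`.

```python
import re

# The answer's pattern, unchanged.
PATTERN = re.compile(
    r'(?i)lines?(?:\s+items?)?\s*\d+(?:\.\d+)?(?:\s*(?:,|and)\s*\d+(?:\.\d+)?)*'
)

samples = [
    "Line 145",
    "Line3544354",
    "Lines 10,12",
    "Line items 45,10,26",
    "Lines 10 and 45",
    "Line 96.1",
    "LINES 3 AND 4",  # (?i) makes both "lines" and "and" case-insensitive
]

# fullmatch succeeds only if the pattern consumes the entire string.
unmatched = [s for s in samples if PATTERN.fullmatch(s) is None]
print(unmatched)
```

An empty list means every sample is matched from start to end.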
Wiktor Stribiżew
  • Wonderful, you truly are a RegEx expert! Thank you again. I will study your explanation and use it as a reference to review all my RegEx formulas. – brightcitrus Nov 19 '19 at 19:15
  • @brightcitrus I do not know how messy your input is, so I suggested the safest pattern. It is a bit long, but it is precise. You may also try replacing parts of the regex to see if it still does what you need; say, replace `\d+(?:\.\d+)?` with `\d[.\d]*`, or even replace the whole `\s*\d+(?:\.\d+)?(?:\s*(?:,|and)\s*\d+(?:\.\d+)?)*` with `(?:and|[ \d.,])*`. :) – Wiktor Stribiżew Nov 19 '19 at 19:22
  • Hello Wiktor, I have a quick question, please. How can I implement this pattern using an "or" statement? I would like the expression to search for and return all instances of fmea or doc-, such as: r'((?i)(fmea|doc\-?)\s*\d+(?:\.\d+)?(?:\s*(?:,|and)\s*\d+(?:\.\d+)?)*)' But the expression becomes lazy. Any tips, please? Thanks. – brightcitrus Dec 16 '19 at 18:08
  • @brightcitrus Without sample strings/expected output, it is not clear what problem you have. – Wiktor Stribiżew Dec 16 '19 at 18:22
  • Thanks for your prompt reply. I would like to use the same regex you provided, but I cannot find the correct modification to match my output. The string to search includes either "fmea" followed by a number or "doc-" followed by a number. A string might include both fmea followed by a number and doc- followed by a number. The regex you provided works well but only returns the first occurrence of either fmea or doc-. It does not return both when both are present. How can I fix this, please? – brightcitrus Dec 16 '19 at 18:26
  • @brightcitrus Please use http://regex101.com to share some input strings. – Wiktor Stribiżew Dec 16 '19 at 18:33
  • Thanks! Here is the link: https://regex101.com/r/7M4WcZ/1 The regex captures "fmea" and "doc-" but only returns one at a time. How can I make it return both when both are present in the string, please? – brightcitrus Dec 16 '19 at 18:36
  • @brightcitrus Looks like you only included strings with a single repetition, see [this demo](https://regex101.com/r/7M4WcZ/2) - is it what you are seeking? – Wiktor Stribiżew Dec 16 '19 at 18:41
  • Yes, however it does not work in my Python environment. This expression returns only "F" when the string is " FMEA 14300". – brightcitrus Dec 16 '19 at 18:45
  • I added parentheses around the regex and it runs. Unfortunately, it is still not returning all occurrences within a cell. When "FMEA" is mentioned twice in the same cell, it is only returned once. – brightcitrus Dec 16 '19 at 18:50
  • @brightcitrus I do not quite understand what you mean. Please provide a reproducible example at https://ideone.com/hw8tYZ. You may fork this demo. – Wiktor Stribiżew Dec 16 '19 at 18:55
  • Worked! Thank you very much! – brightcitrus Dec 16 '19 at 20:52
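The shorter variants suggested in the comments trade precision for brevity. The Python sketch below compares the full pattern with the two simplifications; the constant names `STRICT`, `SHORTER`, `LOOSEST` and the helper `first_match` are hypothetical, and the two test sentences are made up. In short: `\d[.\d]*` keeps a trailing dot, and `(?:and|[ \d.,])*` can match "Lines" with no number at all, including a trailing space.

```python
import re

PREFIX = r'(?i)lines?(?:\s+items?)?'

# Full pattern from the answer.
STRICT = PREFIX + r'\s*\d+(?:\.\d+)?(?:\s*(?:,|and)\s*\d+(?:\.\d+)?)*'
# \d+(?:\.\d+)? replaced with \d[.\d]* (also accepts "96." or "1..2").
SHORTER = PREFIX + r'\s*\d[.\d]*(?:\s*(?:,|and)\s*\d[.\d]*)*'
# Everything after the prefix replaced with (?:and|[ \d.,])*.
LOOSEST = PREFIX + r'(?:and|[ \d.,])*'

def first_match(pattern, text):
    """Return the text of the first match of pattern in text, or None."""
    m = re.search(pattern, text)
    return m.group(0) if m else None

for text in ["See Line 96. Next", "Lines are long"]:
    for name, pat in [("strict", STRICT), ("shorter", SHORTER), ("loosest", LOOSEST)]:
        print(f"{name:8} {text!r}: {first_match(pat, text)!r}")
```

On "See Line 96. Next" the three give "Line 96", "Line 96." and "Line 96. " respectively; on "Lines are long" only the loosest one matches anything.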
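For the fmea/doc- follow-up, the final fix was worked out on ideone and is not shown here, so the following is only a sketch of the usual approach with Python's `re`. `re.search` stops at the first match, while `re.findall` (or `re.finditer`) returns every non-overlapping match. A non-capturing alternation `(?:fmea|doc-?)` matters too: when a pattern contains capturing groups, `findall` returns the groups instead of the whole match. Also, Python 3.11 and later reject a global flag like `(?i)` that is not at the very start of the pattern, as in `r'((?i)...)'`. The sample cell text and the name `REF_PATTERN` are invented for illustration.

```python
import re

# Non-capturing alternation, so findall returns whole matches;
# the global (?i) flag sits at the very start of the pattern.
REF_PATTERN = re.compile(
    r'(?i)(?:fmea|doc-?)\s*\d+(?:\.\d+)?(?:\s*(?:,|and)\s*\d+(?:\.\d+)?)*'
)

cell = "See FMEA 14300 and doc-12, 15; also fmea 2.1"

first_only = REF_PATTERN.search(cell).group(0)  # stops at the first hit
all_hits = REF_PATTERN.findall(cell)            # every non-overlapping hit
print(first_only)
print(all_hits)
```

If the strings live in a pandas column (the mention of cells and NA suggests this, but that is an assumption), `Series.str.findall` likewise returns all matches per cell, whereas `Series.str.extract` returns only the first.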