I need to extract the shift ID from message text using regular expressions so that payments are assigned correctly. We receive three types of messages from customers:

1)Payment for shift # edc5df26-ad62-4685-ad80-4a3a60118479 receipt number #12345
2)Payment for shift # 394e3027-be5d-4369-91e6-88437c5330e0, adress: Germany, Frankfurt..
3)Payment for job shift # c921e015-74b2-4df2-84b2-e546a636272f

So the result should be:

1)'edc5df26-ad62-4685-ad80-4a3a60118479'
2)'394e3027-be5d-4369-91e6-88437c5330e0'
3)'c921e015-74b2-4df2-84b2-e546a636272f'

The ID can be followed by a space, a comma, or the end of the message.

So far I can only take all symbols after # using: (?<=#).*

But I have no idea what to do next. What regular expression can solve this?

DocZerø
  • What is the logic for your #1 result to *not* include `12345`? – Scott Hunter Mar 21 '22 at 14:49
  • Because we only need the shift number; everything after it concerns the receipt number, which we don't need to extract. – Nikita Tsekhanovich Mar 21 '22 at 14:51
  • And how is a *program* supposed to know that? – Scott Hunter Mar 21 '22 at 14:54
  • Because we only need the first expression following the # symbol. We don't care if it shows up again. – Nikita Tsekhanovich Mar 21 '22 at 15:00
  • How. Is. The. Program. Supposed. To. Know. That? – Scott Hunter Mar 21 '22 at 15:01
  • Which program? I don't get what you mean. In the next step we use the extracted 'edc5df26-ad62-4685-ad80-4a3a60118479' as the ID in the shifts dataset to get more details about the payment. That's why we are extracting it. The problem is that the messages come from clients, who can write the payment description in any form they want and mention the shift ID somewhere in it, so we have to extract it. – Nikita Tsekhanovich Mar 21 '22 at 15:05
  • Does this answer your question? [Searching for UUIDs in text with regex](https://stackoverflow.com/questions/136505/searching-for-uuids-in-text-with-regex) – AD7six Mar 21 '22 at 15:06
  • Alternatively `/Payment for (?:job )?shift # ([a-f0-9-]+)/` ([see here](https://regex101.com/r/HD0E2I/1)) is pretty simple. Please add to the question whatever you've tried. – AD7six Mar 21 '22 at 15:12
  • This is a very clear question with example data, expected results and what the OP tried to get the matches. – The fourth bird Mar 23 '22 at 11:43
  • Use regex101.com and look at the bottom right. Hint: look into lookaheads/lookbehinds. – JGFMK Mar 31 '22 at 19:38

2 Answers

You could assert shift # (followed by a space) to the left with a lookbehind, then match a run of the allowed characters followed by one or more repetitions of a hyphen and another run of those characters.

(?<=\bshift # )[a-f0-9]+(?:-[a-f0-9]+)+

See a regex demo.
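
As a minimal sketch of how this pattern could be applied, assuming Python's built-in re module (the question does not say which language or regex flavor is used; the function name extract_shift_id is hypothetical):

```python
import re

# The pattern from this answer: a lookbehind for "shift # ", then
# hex-character groups joined by hyphens.
SHIFT_ID = re.compile(r"(?<=\bshift # )[a-f0-9]+(?:-[a-f0-9]+)+")


def extract_shift_id(message):
    """Return the first shift ID found in message, or None if there is none."""
    match = SHIFT_ID.search(message)
    return match.group(0) if match else None


messages = [
    "Payment for shift # edc5df26-ad62-4685-ad80-4a3a60118479 receipt number #12345",
    "Payment for shift # 394e3027-be5d-4369-91e6-88437c5330e0, adress: Germany, Frankfurt..",
    "Payment for job shift # c921e015-74b2-4df2-84b2-e546a636272f",
]

for message in messages:
    print(extract_shift_id(message))
# edc5df26-ad62-4685-ad80-4a3a60118479
# 394e3027-be5d-4369-91e6-88437c5330e0
# c921e015-74b2-4df2-84b2-e546a636272f
```

Because the character class excludes spaces and commas, the match stops at whichever of those follows the ID, or at the end of the message.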

The fourth bird

Once you have matched the # symbol, you can capture the shift ID with a regex such as:

(?<=#)\s([a-z-\d]+)
  • (?<=#): a lookbehind asserting that # comes immediately before
  • \s: matches the whitespace character after #
  • (): captures your ID
  • [a-z-\d]+: matches one or more lowercase letters, hyphens, or digits
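
To illustrate how the capture group is read, here is a small sketch assuming Python's re module (the language is not stated in the question; the helper name first_shift_id is made up for illustration). re.search returns only the first match, and group(1) yields the captured ID without the leading whitespace:

```python
import re

# The pattern from this answer; group 1 holds the ID itself.
PATTERN = re.compile(r"(?<=#)\s([a-z-\d]+)")


def first_shift_id(message):
    """Return capture group 1 of the first match, or None if nothing matches."""
    match = PATTERN.search(message)
    return match.group(1) if match else None


message = "Payment for shift # edc5df26-ad62-4685-ad80-4a3a60118479 receipt number #12345"

print(first_shift_id(message))
# edc5df26-ad62-4685-ad80-4a3a60118479

# "#12345" has no whitespace after the "#", so it is not matched at all.
print(PATTERN.findall(message))
# ['edc5df26-ad62-4685-ad80-4a3a60118479']
```

Note that [a-z-\d] also accepts letters beyond a-f, so this pattern is looser than one restricted to hexadecimal characters.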
Ugo Q
  • Thanks, it solves most of the issue, but if we use it on the first example, it takes #12345 as well. We only need the first occurrence of the expression after the '#' symbol. – Nikita Tsekhanovich Mar 21 '22 at 15:02
  • In your example there is a space between the # symbol and the shift ID, so the regex doesn't match #12345. However, if it turns out that there is no space, the regex contains parentheses that allow you to extract only the first group and therefore ignore the following matches, including receipt number #12345. – Ugo Q Mar 21 '22 at 15:09