
tests:

1. 'Product: aab bbc; Product: xyz pqr; Junk1: jjkk jjuu; Junk2: nnnkk junk; Junk3: lkie rnier rae'
2. 'Product: Fuasdcsen'
3. 'Product: marlwaw Swwō'
4. 'Product: aab bbc abc xyz; Product: xyz pqr; Junk1: jjkk jjuu; Junk2: nnnkk junk; Junk3: lkie rnier rae'

output:

1. ['aab bbc', 'xyz pqr']
2. ['Fuasdcsen']
3. ['marlwaw Swwō']
4. ['aab bbc abc xyz', 'xyz pqr']

I want a list of only the Product names. So far I have come up with the regex r'Product:\s*(.*)(?:;|$)', but it only works when there is a single Product (cases 2 and 3). Can somebody suggest a general solution for all cases? I am trying to extract the text between : and ; (if a ; is present).

In [10]: reg_str = 'Product: marlwaw Swwō xyz abc'

In [11]: re.findall(r'Product:\s*(.*)(?:;|$)', reg_str)
Out[11]: ['marlwaw Swwō xyz abc']
Jagrut Trivedi

2 Answers


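The .* in your regex is greedy, so it consumes the rest of the string and only then lets $ match. Make it non-greedy with .*? so that each match stops at the first ; (or at the end of the string if there is none):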

import re

reg_str = 'Product: aab bbc; Product: xyz pqr; Junk1: jjkk jjuu; Junk2: nnnkk junk; Junk3: lkie rnier rae'

print(re.findall(r'Product:\s*(.*?)(?:;|$)', reg_str))
# ['aab bbc', 'xyz pqr']
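
To check the non-greedy pattern against all four test strings from the question, a minimal sketch could look like the following. The names `PRODUCT_RE`, `extract_products`, and `ALT_RE` are hypothetical and used only for illustration. The negated character class `[^;]*` is shown as an alternative way to stop at the first `;`, not as something the answer itself proposes.

```python
import re

# Non-greedy pattern from the answer: capture up to the first ';' or end of string.
PRODUCT_RE = re.compile(r'Product:\s*(.*?)(?:;|$)')

# Alternative sketch: a negated character class can never run past a ';'.
ALT_RE = re.compile(r'Product:\s*([^;]*)')


def extract_products(text, pattern=PRODUCT_RE):
    """Return every product name found in text (hypothetical helper)."""
    return pattern.findall(text)


tests = [
    'Product: aab bbc; Product: xyz pqr; Junk1: jjkk jjuu; Junk2: nnnkk junk; Junk3: lkie rnier rae',
    'Product: Fuasdcsen',
    'Product: marlwaw Swwō',
    'Product: aab bbc abc xyz; Product: xyz pqr; Junk1: jjkk jjuu; Junk2: nnnkk junk; Junk3: lkie rnier rae',
]

results = [extract_products(t) for t in tests]
alt_results = [extract_products(t, ALT_RE) for t in tests]

for r in results:
    print(r)
```

Both patterns give the same lists for these inputs. One difference worth knowing: `.` does not match a newline by default, so the two would behave differently on multi-line input.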
Jagrut Trivedi
Austin

Try using:

reg = /(ARTIST_NAME1|ARTIST_2|ETC...)/

You can create this by joining a list on the pipe symbol.
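
In Python, joining a list on the pipe symbol might be sketched as below. This assumes the names to match are known in advance, which the question does not state. The list `names` and the sample `text` are hypothetical. `re.escape` is applied so that any regex metacharacters in a name are treated literally, and longer names are placed first so that a shorter name does not win when it is a prefix of a longer one.

```python
import re

# Hypothetical list of known names; not taken from the original answer.
names = ['aab bbc', 'xyz pqr', 'aab']

# Longest first, each name escaped, then joined on the pipe symbol.
alternation = '|'.join(re.escape(n) for n in sorted(names, key=len, reverse=True))
pattern = re.compile(alternation)

text = 'Product: aab bbc; Product: xyz pqr; Junk1: jjkk jjuu'
found = pattern.findall(text)
print(found)
```

This only finds names that are already in the list, so it does not generalize to unknown product names the way the capture-group approach does.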

RavinderSingh13
Sebastian Scholl