
I've been struggling with this for hours now; I just can't seem to get my head around regex for some reason.

I'm looking through the strings below line by line using this pattern:

import re

pattern = re.compile(r"^[^&,]*")

The strings are stored in a list of dictionaries, so I loop over them like this:

for dct in lst:
    print(re.search(pattern, dct['artist']).group(0))

"""
Drake
Post Malone Featuring Ty Dolla $ign
BlocBoy JB Featuring Drake
Offset & Metro Boomin
Jay Rock, Kendrick Lamar, Future & James Blake
"""

The above gives me this as expected:

"""
Drake
Post Malone Featuring Ty Dolla $ign
BlocBoy JB Featuring Drake
Offset
Jay Rock
"""
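For reference, here is a self-contained reproduction of the setup above. The list `lst` and its single `'artist'` key are assumptions reconstructed from the sample strings; the real dictionaries probably hold more fields. Printing with `repr()` makes it visible that the match for "Offset & Metro Boomin" keeps the space in front of the `&`.

```python
import re

# Hypothetical reconstruction of the data: a list of dicts with an 'artist' key.
lst = [
    {'artist': 'Drake'},
    {'artist': 'Post Malone Featuring Ty Dolla $ign'},
    {'artist': 'BlocBoy JB Featuring Drake'},
    {'artist': 'Offset & Metro Boomin'},
    {'artist': 'Jay Rock, Kendrick Lamar, Future & James Blake'},
]

# Everything from the start of the string up to the first & or ,
pattern = re.compile(r"^[^&,]*")

results = [re.search(pattern, dct['artist']).group(0) for dct in lst]
for name in results:
    print(repr(name))
```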

But I can't figure out how to make it also stop at the string "Featuring". I've tried a hundred variations of \bFeaturing\b: with a capital \B, with different tokens before and after it, and at different positions in the regex.

This is the closest I've gotten, but it only matches the lines that contain "Featuring":

pattern = re.compile(r"^[^&,]*(?=\bFeaturing\b)")

This gives me this output:

None
<_sre.SRE_Match object; span=(0, 12), match='Post Malone '>
<_sre.SRE_Match object; span=(0, 11), match='BlocBoy JB '>
None
<_sre.SRE_Match object; span=(0, 12), match='Post Malone '>
None

I'm fairly new to this, so most of what I'm doing is trial and error, and I'm on the verge of giving up. Please help me get a result like this:

"""
Drake
Post Malone
BlocBoy JB
Offset
Jay Rock
"""
Mazdak
Ribzy
  • Do you need a list of these items or do you need to remove the part of lines that do not match? Try [`re.findall(r'(?m)^(?:(?!\bFeaturing\b)[^&,\n])*', s)`](https://regex101.com/r/NBqO0I/2) – Wiktor Stribiżew Mar 23 '18 at 07:47
  • That did the trick! If you post an answer I'll accept it, thanks a million! – Ribzy Mar 23 '18 at 07:57
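A sketch of the comment's suggestion, assuming `s` holds the sample artists joined by newlines (the comment does not show how `s` is built). The matches keep the whitespace in front of `&` or `Featuring`, so a `.strip()` afterwards may be wanted.

```python
import re

# Assumed input: the sample artists joined with newlines.
s = """Drake
Post Malone Featuring Ty Dolla $ign
BlocBoy JB Featuring Drake
Offset & Metro Boomin
Jay Rock, Kendrick Lamar, Future & James Blake"""

# (?m): ^ matches at the start of every line.
# The group consumes any char except &, , or a newline, as long as
# the whole word Featuring does not start at that position.
names = re.findall(r'(?m)^(?:(?!\bFeaturing\b)[^&,\n])*', s)
print(names)

# Drop the whitespace left in front of "&" or "Featuring".
cleaned = [n.strip() for n in names]
print(cleaned)
```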

2 Answers


You can use re.sub:

s = re.sub(r'\s*(?:[&,]|Featuring).*', '', s)


\s*(?:[&,]|Featuring).* matches optional whitespace followed by &, , or Featuring, plus everything after it up to the end of the line. Replacing that with an empty string leaves only the first artist on each line.
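A minimal sketch of the `re.sub` approach on the sample data, assuming the artists sit in one newline-separated string `s`. Because `.` does not match a newline by default, `.*` stops at the end of each line, so every line is trimmed independently. In the question's per-dictionary loop, the same call could be applied to `dct['artist']` instead.

```python
import re

# Assumed input: the sample artists joined with newlines.
s = """Drake
Post Malone Featuring Ty Dolla $ign
BlocBoy JB Featuring Drake
Offset & Metro Boomin
Jay Rock, Kendrick Lamar, Future & James Blake"""

# Optional whitespace, then &, , or Featuring, then the rest of the line.
# '.' does not match a newline by default, so each line is trimmed separately.
result = re.sub(r'\s*(?:[&,]|Featuring).*', '', s)
print(result)
```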

anubhava
  • It looks like this should work from the demo, but I didn't manage to fit it into my model for some reason. Wiktor's pattern worked with copy paste. Thanks a lot for the effort though! – Ribzy Mar 23 '18 at 07:58
  • If you want to use `re.search` then use: [`^(?:.*?(?=\s+Featuring)|[^&,\n]*)`](https://regex101.com/r/7mqTo2/2) as regex – anubhava Mar 23 '18 at 08:01
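The `re.search` variant from the comment can be dropped into the question's loop. A sketch, again assuming a hypothetical list `lst` of dicts with an `'artist'` key:

```python
import re

# Hypothetical data in the question's shape: a list of dicts with an 'artist' key.
lst = [{'artist': a} for a in (
    'Drake',
    'Post Malone Featuring Ty Dolla $ign',
    'BlocBoy JB Featuring Drake',
    'Offset & Metro Boomin',
    'Jay Rock, Kendrick Lamar, Future & James Blake',
)]

# Alternative 1: the shortest prefix that is followed by whitespace + Featuring.
# Alternative 2 (tried only if 1 fails): everything up to the first & or ,.
pattern = re.compile(r'^(?:.*?(?=\s+Featuring)|[^&,\n]*)')

names = [re.search(pattern, dct['artist']).group(0) for dct in lst]
for name in names:
    print(repr(name))
```

Since the first alternative is tried first, a made-up string such as 'A & B Featuring C' would come back as 'A & B'; the pattern assumes Featuring never appears after an & or , in the same string. The second alternative also keeps the space before & ('Offset ').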

You may use

re.findall(r'^(?:(?!\bFeaturing\b)[^&,\n])*\b', s, re.M)

or

re.findall(r'^.*?(?=\s*(?:\bFeaturing\b|[&,]|$))', s, re.M)

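The two regexps give nearly the same result: the second one trims all whitespace before the delimiter, while the first one keeps a trailing space when the name is followed by Featuring (for example, 'Post Malone ').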
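Both patterns can be run side by side on the sample input; `s` is assumed to hold the five artist strings joined by newlines. The trailing `\b` in the first pattern trims the space before `&`, but not the space before `Featuring`, because a word boundary already exists between that space and the `F`.

```python
import re

# Assumed input: the sample artists joined with newlines.
s = """Drake
Post Malone Featuring Ty Dolla $ign
BlocBoy JB Featuring Drake
Offset & Metro Boomin
Jay Rock, Kendrick Lamar, Future & James Blake"""

# Tempered greedy token, then a word boundary.
first = re.findall(r'^(?:(?!\bFeaturing\b)[^&,\n])*\b', s, re.M)
# Lazy match up to optional whitespace followed by Featuring, & or , or end of line.
second = re.findall(r'^.*?(?=\s*(?:\bFeaturing\b|[&,]|$))', s, re.M)

print(first)
print(second)
```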

Details

  • ^ - start of a line
  • (?:(?!\bFeaturing\b)[^&,\n])* - a tempered greedy token: any char other than &, , or a newline, zero or more times (as many as possible), provided that char does not start the whole word Featuring
  • \b - a word boundary

  • .*?(?=\s*(?:\bFeaturing\b|[&,]|$)) - matches any 0+ chars other than line break chars, as few as possible (.*?), up to the leftmost position that is followed by 0+ whitespace chars and then one of:

    • \bFeaturing\b - whole word Featuring
    • [&,] - a & or , char
    • $ - end of line
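Since the question mentions trying many variations, one trap is worth illustrating: a negated character class cannot exclude a word. `[^...]` works on single characters, so putting `Featuring` inside it bans each of its letters. A small comparison on one sample string (the variable name `name` is just for illustration):

```python
import re

name = 'Post Malone Featuring Ty Dolla $ign'

# [^&,Featuring] is a set of single characters: it excludes &, , and each
# of the letters F, e, a, t, u, r, i, n, g (not the word "Featuring").
naive = re.match(r'[^&,Featuring]*', name).group(0)

# Tempered greedy token: before consuming each char, check that the whole
# word Featuring does not start at the current position.
tempered = re.match(r'(?:(?!\bFeaturing\b)[^&,\n])*', name).group(0)

print(repr(naive))
print(repr(tempered))
```

The naive class stops at the t in "Post", while the tempered version runs until Featuring begins (keeping the space in front of it).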
Wiktor Stribiżew