For Python, given the filename formats in your question:
re.findall(r'ft\.\s*(\w*)',filename)
Each of these filenames:
Will return:
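As a minimal sketch of that behaviour, assume three filenames shaped like the ones discussed in this answer (the exact names below are illustrative, not taken verbatim from the question):

```python
import re

# Illustrative filenames; the exact names are assumptions modelled on the
# three shapes discussed in this answer.
filenames = [
    'Artist - Track ft. FeatArtist.mp3',
    'Artist ft. FeatArtist - Track.mp3',
    'Artist - Track (ft. FeatArtist).mp3',
]

# \w* stops at the first non-word character, so the space, the closing
# bracket, or the dot of the extension ends the captured group.
simple_results = [re.findall(r'ft\.\s*(\w*)', name) for name in filenames]

for name, found in zip(filenames, simple_results):
    print(name, '->', found)
```

Each of the three names should give ['FeatArtist'].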
If you want to account for a number of other possible scenarios:
In your provided examples, each FeatArtist terminates with one of the following: a space followed by a -, a round close bracket, or the file extension .mp3
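If the featured artist contained spaces, ampersands or dots (for example Feat Artist, Feat Artist & Other Artist, or Feat.Artist & Crew), things might fall apart, because \w* stops at the first non-word character. One way to tackle such variants might be: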
First get rid of the file extension without using string matching at all. Doing this with filenames gives you a cleaner starting point:
Using os.path.splitext('Artist - Track ft. FeatArtist.mp3')[0]
you can get your filenames in this format: Artist - Track ft. FeatArtist
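A short sketch of that step, using only the standard library. The filenames are illustrative ones in the style of this answer:

```python
import os.path

# Illustrative filenames (assumed for the example).
names = [
    'Artist - Track ft. FeatArtist.mp3',
    'Artist ft. Feat.Artist & Crew - Track.mp3',
]

# splitext splits on the LAST dot only, so dots inside the artist name
# (such as "Feat.Artist") are left untouched.
stems = [os.path.splitext(name)[0] for name in names]

for stem in stems:
    print(stem)
```

Note that splitext treats whatever follows the last dot as the extension, so this assumes every filename really does carry one.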
We can accommodate the new filenames with this regex:
ft\.\s*(\w*.*?)(?= -|\)|$)
Unit tests (results are listed afterwards, in the same order, for easier reading):
>>> re.findall(r'ft\.\s*(\w*.*?)(?= -|\)|$)','Artist - Track ft. FeatArtist')
>>> re.findall(r'ft\.\s*(\w*.*?)(?= -|\)|$)','Artist ft. FeatArtist - Track')
>>> re.findall(r'ft\.\s*(\w*.*?)(?= -|\)|$)','Artist - Track (ft. FeatArtist)')
>>> re.findall(r'ft\.\s*(\w*.*?)(?= -|\)|$)','Artist - Track (ft. Feat Artist)')
>>> re.findall(r'ft\.\s*(\w*.*?)(?= -|\)|$)','Artist - Track (ft. Feat Artist & Other Artist)')
>>> re.findall(r'ft\.\s*(\w*.*?)(?= -|\)|$)','Artist ft. Feat Artist & Other Artist - Track')
>>> re.findall(r'ft\.\s*(\w*.*?)(?= -|\)|$)','Artist ft. Feat.Artist & Crew - Track')
Results:
['FeatArtist']
['FeatArtist']
['FeatArtist']
['Feat Artist']
['Feat Artist & Other Artist']
['Feat Artist & Other Artist']
['Feat.Artist & Crew']
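Putting the two steps together might look like the sketch below. The helper name featured_artists and the constant FEAT_RE are hypothetical, introduced only for illustration; the regex is the one used in the unit tests above.

```python
import os.path
import re

# Same pattern as in the unit tests: capture lazily after "ft." until a
# " -", a closing bracket, or the end of the string follows.
FEAT_RE = re.compile(r'ft\.\s*(\w*.*?)(?= -|\)|$)')


def featured_artists(filename):
    """Return the featured-artist strings found in a filename (hypothetical helper)."""
    # Drop the extension first so that "$" can mark the end of the name.
    stem = os.path.splitext(filename)[0]
    return FEAT_RE.findall(stem)


print(featured_artists('Artist - Track (ft. Feat Artist & Other Artist).mp3'))
print(featured_artists('Artist ft. Feat.Artist & Crew - Track.mp3'))
print(featured_artists('Artist - Track.mp3'))  # no "ft." present
```

A filename without "ft." simply yields an empty list, so callers can test the result for truthiness.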
Why no lookbehind?
From the Python documentation (formatting added):
re.findall(pattern, string, flags=0)
Return all non-overlapping matches of pattern in string, as a list of strings. The string is scanned left-to-right, and matches are returned in the order found. If one or more groups are present in the pattern, return a list of groups; this will be a list of tuples if the pattern has more than one group. Empty matches are included in the result unless they touch the beginning of another match.
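Python's re module only accepts fixed-width lookbehind, so a variable-width prefix such as ft\.\s* cannot go inside one. With findall and a group, however, you can still use repetition operators to establish the match, and use groups to control the portion of the match returned.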
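To illustrate the difference, here is a small sketch comparing a variable-width lookbehind with a capturing group in the standard re module. The variable names are made up for the example:

```python
import re

# A lookbehind containing \s* has no fixed width, which the standard
# re module rejects at compile time.
try:
    re.compile(r'(?<=ft\.\s*)\w+')
    lookbehind_ok = True
except re.error as exc:
    lookbehind_ok = False
    print('lookbehind rejected:', exc)

# The same prefix outside a lookbehind is fine; the group decides what
# findall returns.
group_match = re.findall(r'ft\.\s*(\w+)', 'Artist - Track ft. FeatArtist')
print(group_match)
```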
Other ways to do something similar:
If you are using a regex engine that supports \K (which resets the start of the reported match; it is not a back reference), then the match is everything after the \K:
Examples using grep with -P (Perl regex) and -o (only return the match):
echo "Artist - Track ft. FeatArtist" | grep -oP "ft\.\s*\K(\w*.*?)(?= -|\)|$)"
FeatArtist
echo "Artist ft. FeatArtist - Track" | grep -oP "ft\.\s*\K(\w*.*?)(?= -|\)|$)"
FeatArtist
echo "Artist - Track (ft. FeatArtist)" | grep -oP "ft\.\s*\K(\w*.*?)(?= -|\)|$)"
FeatArtist
echo "Artist ft. Feat Artist & Other Artist - Track" | grep -oP "ft\.\s*\K(\w*.*?)(?= -|\)|$)"
Feat Artist & Other Artist