I'm trying to parse the YouTube descriptions of songs so I can compile them into a .csv file.
Currently I can isolate the timecodes, but isolating the song and artist is proving trickier.
First, I catch the whitespace:
# catches whitespace
pattern = re.compile(r'\s+')
Second, the timecodes (to make the string simpler to deal with):
# catches timecodes
pattern1 = re.compile(r'[\d\.-]+:[\d.-]+:[\d\.-]+')
Then I call sub with each pattern to remove the matches.
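For reference, the clean-up step looks roughly like the sketch below. The `raw` string is a made-up two-line description, not real data. Note that `\s` also matches newlines, so this sketch assumes only spaces and tabs are stripped; otherwise the `\n` separators that appear in the sample further down would be removed as well.

```python
import re

# Hypothetical description text; the real one comes from YouTube.
raw = "0:00:00 middle school x Aso - Cypress\n0:02:31 Shopan - Woodnot\n"

# Same timecode pattern as above.
timecodes = re.compile(r'[\d\.-]+:[\d.-]+:[\d\.-]+')
# Assumption: strip spaces and tabs only, so the newlines between tracks survive.
spaces = re.compile(r'[ \t]+')

without_timecodes = timecodes.sub('', raw)
cleaned = spaces.sub('', without_timecodes)

print(repr(cleaned))
```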
I then try to capture all strings between \n, as this is how the tracklist is formatted
songBeforeDash = re.search(r'^([\\n][a-zA-Z0-9]*-[a-zA-Z0-9]*[\\n]*)+$', description)
The format follows \n[string]-[string]\n
Using this excellent visualiser, I've been able to tweak the pattern so it catches the first result, but none of the subsequent results match. Is this a case of stopping at the first result and not catching the others?
Here's a sample of what I'm trying to catch:
\nmiddleschoolxAso-Cypress\nShopan-Woodnot\nchromonicci-Memories.\nYasper-MoveTogether\nFenickxDelayde-Longwayhome\nauv-Rockaway5pm\nsadtoi-Aires\nGMillsxKyleMcEvoy-Haze\nRuckP-CoffeeBreak\n
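Two things are worth noting here: `re.search` only ever returns the first match, and inside a raw string `[\\n]` is a character class matching a literal backslash or the letter `n`, not a newline. A minimal sketch of one possible alternative, assuming the sample contains real newline characters and that each line has the shape `artist-title`, uses `re.findall` with `re.MULTILINE`. The names `sample`, `track_line`, and `tracks` are illustrative only.

```python
import re

# The sample from above, assuming "\n" stands for real newline characters.
sample = (
    "\nmiddleschoolxAso-Cypress\nShopan-Woodnot\nchromonicci-Memories.\n"
    "Yasper-MoveTogether\nFenickxDelayde-Longwayhome\nauv-Rockaway5pm\n"
    "sadtoi-Aires\nGMillsxKyleMcEvoy-Haze\nRuckP-CoffeeBreak\n"
)

# With re.MULTILINE, ^ and $ anchor at every line, not just the whole string.
# Group 1: everything up to the first dash; group 2: the rest of the line.
track_line = re.compile(r'^([^-\n]+)-([^\n]+)$', re.MULTILINE)

# findall returns every match as an (artist, title) tuple.
tracks = track_line.findall(sample)

for artist, title in tracks:
    print(artist, title)
```

The resulting list of tuples could then be handed to `csv.writer(...).writerows(tracks)`. A title or artist that itself contains a dash would need a different split rule.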