I have downloaded a lot of song lyrics from Genius for a project (in Python) and now need to clean them. As an example, here is a snippet of one song's lyrics:
lyric = '[Letra de "La Jeepeta"]\n\n[Intro: Nio García & Juanka El Problematik]\nNio García\nBrray\nJuanka\nLas Air Force son brand new\nLas moña\' verde\' como mi Sea-Doo\nUnas prendas que me\u2005cambian\u2005la actitú\'\nEsta noche\u2005no queremo\' revolú\n\n[Coro: Nio García & Juanka El Problematik]\nArrebata\'o, dando vuelta en\u2005la jeepeta (Dando vuelta en la jeepeta)\nAl la\'o mío tengo una rubia que tiene grande\' las
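The `\u2005` sequences in the snippet above are most likely how Python's repr displays one Unicode character, U+2005 FOUR-PER-EM SPACE, rather than a literal backslash followed by `u`. A minimal sketch to check this, assuming the lyric is an ordinary `str`; `sample` is a hypothetical excerpt of the lyric, and the code lists every whitespace character other than a plain space or newline together with its Unicode name:

```python
import unicodedata

# Hypothetical excerpt of the lyric; "\u2005" in this literal is ONE character.
sample = "Unas prendas que me\u2005cambian\u2005la actitú'"

# Collect every whitespace character that is not an ordinary space or newline.
unusual = sorted({ch for ch in sample if ch.isspace() and ch not in " \n"})

for ch in unusual:
    # Prints the code point and its official Unicode name.
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}")
```

For this sample it prints a single line, `U+2005 FOUR-PER-EM SPACE`, which suggests that Genius uses this typographic space in place of some ordinary spaces.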
In the lyrics I want to:
- Remove square brackets and everything between them. I do that with the following:
re.sub(r"[\[].*?[\]]", "", lyric)
- Remove line breaks (\n). I do that with the following:
re.sub(r"[\n]"," ",lyric)
But if a lyric contains no \n, I get an error.
- Remove \u. I am not sure why it appears in some songs. I tried the following:
re.sub(r"\[\u]", " ", lyric)
However, I get the following error: SyntaxError: (unicode error) 'unicodeescape' codec can't decode bytes in position 15-16: truncated \uXXXX escape
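Below is a minimal sketch of the three steps above, still applied one at a time. It rests on a few assumptions. `re.sub` returns a new string and does not raise when nothing matches, so it has to be assigned back. An error on a lyric without `\n` more likely means `lyric` was not a string (for example `None`), which is a guess because that error message is not shown. A `\u` with no four hex digits after it is an incomplete `\uXXXX` escape, and the `SyntaxError` suggests Python rejected it while parsing the string literal itself. The function name `clean_lyric_stepwise` and the `None` handling are hypothetical:

```python
import re

def clean_lyric_stepwise(lyric):
    """Apply the three cleanups one at a time, assigning each result back.

    Assumption: missing lyrics arrive as None (or another non-str) and are
    turned into an empty string instead of letting re.sub raise TypeError.
    """
    if not isinstance(lyric, str):
        return ""
    # 1. Drop [section headers]; this is the bracket pattern from the question.
    lyric = re.sub(r"[\[].*?[\]]", "", lyric)
    # 2. Replace newlines with spaces; re.sub returns the input unchanged if none exist.
    lyric = re.sub(r"\n", " ", lyric)
    # 3. U+2005 is one character; the re module understands the \u2005 escape
    #    written with all four hex digits inside a raw pattern.
    lyric = re.sub(r"\u2005", " ", lyric)
    return lyric.strip()

print(clean_lyric_stepwise("[Intro: Nio García]\nNio García\nBrray\nUnas prendas que me\u2005cambian"))
```

Consecutive line breaks such as `\n\n` still leave runs of several spaces with this step-by-step version.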
So first of all, can you help me with the errors I'm getting? And secondly, is there a way to combine several RegEx expressions into one so I don't need several commands?
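One way to combine the steps is alternation (`|`) inside a single pattern. The sketch below assumes that collapsing every run of bracketed headers and/or whitespace into one space is acceptable. It relies on `\s` matching Unicode whitespace in Python 3 `str` patterns, which covers both `\n` and U+2005. Compiling the pattern once with `re.compile` lets it be reused across many songs. The names `NOISE` and `clean_lyric` are hypothetical:

```python
import re

# One pattern for all three cleanups: a run made of [bracketed headers] and/or
# whitespace characters ("\n", "\u2005", ordinary spaces, ...) becomes one space.
NOISE = re.compile(r"(?:\[.*?\]|\s)+")

def clean_lyric(lyric):
    """Remove [headers] and normalise all whitespace to single spaces.

    Assumption: non-string input (e.g. None for a missing lyric) yields "".
    """
    if not isinstance(lyric, str):
        return ""
    return NOISE.sub(" ", lyric).strip()

lyrics = [
    '[Letra de "La Jeepeta"]\n\n[Intro: Nio García & Juanka El Problematik]\nNio García\nBrray',
    "Unas prendas que me\u2005cambian\u2005la actitú'",
    None,
]
cleaned = [clean_lyric(text) for text in lyrics]
print(cleaned)
```

Because the whole run is replaced at once, `\n\n` between two headers does not leave double spaces behind. If different pieces need different replacements, `re.sub` also accepts a function as the replacement argument.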
Thanks in advance! :-)