OK, I am doing a Unicode regex match on some strings.
These are the strings in question (two separate strings, not two lines of one string):
\u2018Mummy\u2019 Reboot May Get \u2018Mama\u2019 Director
\u2018Glee\u2019 Star Grant Gustin to Play The Flash in \u2018Arrow\u2019 Season 2
I am using this regex to pull out the titles surrounded by Unicode curly quotes:
regex = re.compile("\\u2018[^(?!\\u2018$)]*\\u2019",re.UNICODE)
Calling regex.findall() on each string returns
['u2018Mama\\u2019']
and
['u2018Glee\\u2019', 'u2018Arrow\\u2019']
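A small Python 3 sketch to test a guess about the output above. It rests on two assumptions I have not confirmed: that the strings contain the literal six-character text \u2018 rather than the quote character itself (the \\u2019 in the output suggests so), and that the code ran on Python 2, whose re module, as far as I know, does not understand \u escapes and reads \u as a plain u. Under those assumptions, the pattern the engine effectively compiles is the one below (py2_view is my own name), and it reproduces both results.

```python
import re

# ASSUMPTION: the strings hold the literal escape text \u2018 (backslash,
# 'u', digits), not the curly-quote character. Raw strings keep the backslashes.
s1 = r"\u2018Mummy\u2019 Reboot May Get \u2018Mama\u2019 Director"
s2 = r"\u2018Glee\u2019 Star Grant Gustin to Play The Flash in \u2018Arrow\u2019 Season 2"

# ASSUMPTION: Python 2's re treats each \u in the pattern as a plain "u",
# so the original pattern behaves like this one.
py2_view = re.compile(r"u2018[^(?!u2018$)]*u2019")

# Inside [...] the "(?!" is not a lookahead: the class just forbids each
# listed character, i.e. ( ? ! u 2 0 1 8 $ )
print(py2_view.findall(s1))  # ['u2018Mama\\u2019']
print(py2_view.findall(s2))  # ['u2018Glee\\u2019', 'u2018Arrow\\u2019']
```

If this guess is right, it answers both questions: each match starts at the u, so the backslash in front of it is never included, and 'Mummy' is skipped because it contains a u, which the character class excludes, while 'Mama', 'Glee' and 'Arrow' do not.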
This raises two questions I couldn't figure out. First, why isn't it returning \u2018? Where did the initial \ go?
Second, why does the first string give only 'Mama' and not 'Mummy', while the second string gives both titles? I can't see the difference. Finally, I replaced \u2018 and \u2019 with ' and used this regex:
re.compile("'[^']*'")
It matches both titles in both strings. What is the difference here? What am I missing in the Unicode regex?
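The apostrophe pattern works because [^']* excludes exactly one character, the quote. For comparison, here is a Python 3 sketch of the curly-quote equivalent, where the negated class excludes only the two quote characters. It assumes the strings contain the literal escape text \u2018 (not the character), so it first decodes them with the unicode_escape codec; the names raw_titles, decoded and quoted are mine, and this is one possible approach rather than a confirmed fix.

```python
import re

# ASSUMPTION: the input holds the literal escape text \u2018, not the character.
raw_titles = [
    r"\u2018Mummy\u2019 Reboot May Get \u2018Mama\u2019 Director",
    r"\u2018Glee\u2019 Star Grant Gustin to Play The Flash in \u2018Arrow\u2019 Season 2",
]

# Turn the escape text into real U+2018 / U+2019 characters (fine for ASCII input).
decoded = [s.encode("ascii").decode("unicode_escape") for s in raw_titles]

# Same shape as '[^']*': the class excludes only the two quote characters.
# The group captures the title without its quotes.
quoted = re.compile("\u2018([^\u2018\u2019]*)\u2019")

for s in decoded:
    print(quoted.findall(s))
# ['Mummy', 'Mama']
# ['Glee', 'Arrow']
```

In Python 3 (3.3 and later) re understands \u escapes in str patterns, so the pattern could also be written as a raw string; the non-raw form just makes the actual characters explicit.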
Thank you in advance.