What I want to do is very straightforward.
- Retrieve text from a file
- If the text contains any quotes, get the text inside the quotes.
To do that, I am using this regex, borrowed from another post:
re.findall('"([^"]*)"', text)
The problem I am running into, though, is that the quotes in my text files aren't being recognized as quotes.
For example:
text = #get text from a file
print(text)
#Outputs: 'this is a "test"'
print(re.findall('"([^"]*)"', text))
#Outputs: []
But if I assign the same string directly to a variable, the regex works correctly.
text = 'this is a "test"'
print(re.findall('"([^"]*)"', text))
#Outputs: ['test']
So I believe my problem has something to do with the encoding. That said, type(text) returns str, so the text is already being decoded into a normal string.
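One way to test the encoding hypothesis is to list the code point and Unicode name of every non-ASCII character in the string. The sketch below assumes the file contains typographic ("curly") double quotes, U+201C and U+201D, which can look like a plain " when printed but are different characters. The sample string and the helper name describe_non_ascii are made up for illustration.

```python
import unicodedata

def describe_non_ascii(text):
    """Return (char, code point, Unicode name) for each non-ASCII character."""
    return [
        (ch, f"U+{ord(ch):04X}", unicodedata.name(ch, "UNKNOWN"))
        for ch in text
        if ord(ch) > 127
    ]

# Hypothetical sample: curly quotes written as escapes so they are unambiguous.
sample = "this is a \u201ctest\u201d"
for entry in describe_non_ascii(sample):
    print(entry)
```

If this prints entries such as LEFT DOUBLE QUOTATION MARK for the text read from the file, the regex is failing because it only looks for the ASCII quote U+0022.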
Edit: Here is the solution I found, thanks to @rmharrison. This is what is now working:
import re
from unidecode import unidecode
text = # Text From File
cleaned_text = unidecode(text)
print(re.findall('"([^"]*)"', cleaned_text))
#This successfully outputs the text inside the quotes.
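A possible alternative that avoids the extra dependency is to let the regex itself accept curly quotes. This is only a sketch: it assumes the characters in the file are U+201C and U+201D, and the names QUOTED and find_quoted are hypothetical.

```python
import re

# Match the ASCII double quote or either curly double quote as a delimiter.
# Assumption: the file uses U+201C / U+201D; other quote styles are not covered.
QUOTED = re.compile('["\u201c\u201d]([^"\u201c\u201d]*)["\u201c\u201d]')

def find_quoted(text):
    """Return the text found between straight or curly double quotes."""
    return QUOTED.findall(text)

print(find_quoted("this is a \u201ctest\u201d"))
print(find_quoted('this is a "test"'))
```

Unlike unidecode, which transliterates every non-ASCII character to ASCII, this leaves the rest of the text (accented letters, for example) unchanged.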