
What I want to do is very straightforward:

  1. Retrieve text from a file
  2. If the text contains any quotes, get the text inside the quotes.

To do that, I am using this regex, borrowed from another post:

re.findall('"([^"]*)"', text)

The problem I am running into, though, is that the particular quotes in my text files aren't being recognized as quotes.
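A minimal sketch of how this can happen, assuming (the post doesn't say which characters are in the file) that the file uses typographic "curly" quotes, U+201C and U+201D, instead of the ASCII `"` (U+0022) that the pattern looks for. They look almost identical when printed, but they are different characters, so the pattern never matches them:

```python
import re

pattern = r'"([^"]*)"'  # same pattern as in the question, written as a raw string

typed = 'this is a "test"'                # ASCII double quotes (U+0022)
from_file = 'this is a \u201ctest\u201d'  # assumed: curly quotes U+201C / U+201D

print(re.findall(pattern, typed))      # ['test']
print(re.findall(pattern, from_file))  # [] because there is no U+0022 in the string
```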

For example:

with open(path, encoding="utf-8") as f:  # get text from a file (path and encoding are placeholders)
    text = f.read()

print(text) 
#Outputs: 'this is a "test"'

print(re.findall('"([^"]*)"', text))
#Outputs: []

But if I assign the string directly to a variable, it works correctly:

text = 'this is a "test"'

#The same regex outputs ['test']

So I believe my problem has something to do with the encoding. That said, type(text) returns str.
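`type(text)` being `str` only tells you the file was decoded; it says nothing about which characters the string contains. A quick check, sketched here on an assumed sample string (substitute the text read from the file), is to print the string with `ascii()` and list the code point and Unicode name of every non-ASCII character:

```python
import unicodedata

# Assumed sample string; substitute the text actually read from your file
text = 'this is a \u201ctest\u201d'

# ascii() renders every non-ASCII character as an escape sequence
print(ascii(text))

# Code point and Unicode name of each non-ASCII character
non_ascii = [
    (f"U+{ord(ch):04X}", unicodedata.name(ch, "<unnamed>"))
    for ch in text
    if not ch.isascii()
]
for code_point, name in non_ascii:
    print(code_point, name)
```

If this reports LEFT/RIGHT DOUBLE QUOTATION MARK, the regex is behaving correctly; the quotes simply aren't ASCII `"`.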

Edit: Here is the solution I found, thanks to @rmharrison, which now works:

import re
from unidecode import unidecode

with open(path, encoding="utf-8") as f:  # text from file (path and encoding are placeholders)
    text = f.read()

cleaned_text = unidecode(text)

print(re.findall('"([^"]*)"', cleaned_text))

#This successfully outputs text inside quotes. 
Tyler Bell
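unidecode transliterates all non-ASCII text, not only quotes. If you'd rather avoid that third-party dependency, a swapped-in alternative is to normalise just the quote characters with the built-in `str.translate` before running the same regex. The set of characters in `QUOTE_MAP` is an assumption; extend it with whatever the file actually contains. `quoted_phrases` is a hypothetical helper name:

```python
import re

# Assumed set of typographic double quotes to normalise to ASCII '"'
QUOTE_MAP = str.maketrans({
    "\u201c": '"',  # LEFT DOUBLE QUOTATION MARK
    "\u201d": '"',  # RIGHT DOUBLE QUOTATION MARK
    "\u201e": '"',  # DOUBLE LOW-9 QUOTATION MARK
})

def quoted_phrases(text):
    """Return the text between pairs of double quotes, straight or curly."""
    return re.findall(r'"([^"]*)"', text.translate(QUOTE_MAP))

print(quoted_phrases('this is a \u201ctest\u201d and a "second" one'))  # ['test', 'second']
```

Unlike unidecode, this leaves every other character (accents, other scripts) untouched.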
