I need to read a text file into Python as a single, very long string. It is not a CSV or TSV; the file has no tabular structure at all, it's just a slew of words. However, the file contains commas, quotes, and other punctuation of that nature, so I'm getting parsing issues.
I have tried:
    with open('text_file.txt') as f:
        text_data = f.read().translate(string.punctuation)
This resulted in an error that read: 'charmap' codec can't decode byte 0x9d in position 47: character maps to <undefined>
I'm not sure if that error was the result of punctuation within the .txt file interfering with the parsing process, or if there were some strange non-Unicode characters that cannot be read. Potentially, I may need a solution that is robust to both of these problems.
If you feel that there are better ways than my simultaneous read/strip punctuation approach to achieve my goal, feel free to suggest alternatives.
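For reference, here is a minimal sketch of one approach that might handle both problems. It assumes the file is UTF-8 encoded, which is only a guess (a 0x9d byte is what you would see if a UTF-8 curly quote were decoded with a Windows code page such as cp1252). The function name `read_text_without_punctuation` is hypothetical, and `errors="replace"` is one of several possible choices for handling bytes that cannot be decoded.

```python
import string


def read_text_without_punctuation(path, encoding="utf-8"):
    """Read a whole file as one string and drop ASCII punctuation.

    The default encoding of UTF-8 is an assumption; pass a different
    one if the file was saved some other way.
    """
    # An explicit encoding avoids relying on the platform default.
    # errors="replace" substitutes U+FFFD for undecodable bytes
    # instead of raising UnicodeDecodeError.
    with open(path, encoding=encoding, errors="replace") as f:
        text = f.read()

    # str.translate expects a translation table, not a plain string.
    # The third argument of str.maketrans lists characters to delete.
    table = str.maketrans("", "", string.punctuation)
    return text.translate(table)
```

Calling `read_text_without_punctuation('text_file.txt')` would then return the file contents as one string. Note that `string.punctuation` only covers ASCII punctuation, so typographic characters such as curly quotes would be left in place and would need to be added to the deletion list separately.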