I have a folder full of opinions stored as .txt files. I would like to read the whole folder and print each text file in a format that lets me read them one by one. How could I approach this task? Also, when I read the whole folder with:

import os
DIR = r"/Users/user/Desktop/OpinionsTXT"
opiniones = [open(os.path.join(DIR, f)).read() for f in os.listdir(DIR)]
print opiniones

this is (part of) the output:

f qu\xe9 suplicio, recordando cuando lo hab\xeda tenido que hacer durante unas 

The texts are full of accents and other Spanish orthographic symbols. How can I print them correctly?

anon
  • possible duplicate of [How to list all files of a directory in Python](http://stackoverflow.com/questions/3207219/how-to-list-all-files-of-a-directory-in-python) – Scott Hunter Sep 17 '14 at 14:10
  • `for f in os.listdir(DIR): print open(os.path.join(DIR, f)).read()` – falsetru Sep 17 '14 at 14:11

1 Answer

What encoding is used for the files? It looks like ISO-8859-1.
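
One rough way to test that guess is sketched below in Python 3. The sample bytes are taken from the output in the question. `guess_encoding` and its candidate list are hypothetical names used only for illustration, not part of any library.

```python
def guess_encoding(raw, candidates=("utf-8", "iso-8859-1")):
    """Return the first candidate encoding that decodes raw without error.

    Hypothetical helper: it only tries the listed candidates in order.
    """
    for name in candidates:
        try:
            raw.decode(name)
        except UnicodeDecodeError:
            continue  # this candidate cannot represent the bytes
        return name
    return None


# Bytes as they appear in the question's output.
sample = b"qu\xe9 suplicio"
print(guess_encoding(sample))
```

ISO-8859-1 accepts any byte sequence, so this check can only rule out UTF-8. It cannot tell ISO-8859-1 apart from similar single-byte encodings such as Windows-1252.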

In Python 2, for example, you can decode the bytes you read with .decode('iso-8859-1'):

import os
DIR = r"/Users/user/Desktop/OpinionsTXT"
opiniones = [open(os.path.join(DIR, f)).read().decode('iso-8859-1') for f in os.listdir(DIR)]
>>> print opiniones[0]   # note that opiniones is a list.
f qué suplicio, recordando cuando lo había tenido que hacer durante unas

Or you could open the file with the codecs module:

import codecs
opiniones = [codecs.open(os.path.join(DIR, f), mode='r', encoding='iso-8859-1').read() for f in os.listdir(DIR)]

The above is for Python 2. In Python 3, you can specify the file encoding when you open the file by passing the encoding parameter to the built-in open(), e.g. open(path, encoding='iso-8859-1').
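
A minimal Python 3 sketch of the whole task might look like the following. It assumes the files really are ISO-8859-1 and that only names ending in .txt should be read. `read_opinions` and `print_opinions` are hypothetical helper names, not something from the question.

```python
import os


def read_opinions(directory, encoding="iso-8859-1"):
    """Yield (filename, text) for each .txt file in directory.

    The default encoding is an assumption based on the sample output.
    """
    for name in sorted(os.listdir(directory)):
        if not name.endswith(".txt"):
            continue  # skip anything that is not a text file
        path = os.path.join(directory, name)
        with open(path, encoding=encoding) as handle:
            yield name, handle.read()


def print_opinions(directory, encoding="iso-8859-1"):
    """Print each file's text under a header line, one file at a time."""
    for name, text in read_opinions(directory, encoding):
        print("=== {} ===".format(name))
        print(text)
        print()
```

Calling `print_opinions(DIR)` with the `DIR` from the question would print each opinion under its own header. Because each text is printed as a string rather than as part of a list, the accented characters appear as written instead of as escape sequences.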

mhawke