
I am trying to read all the files in a directory, but because their names contain spaces and accents, I get errors. (I have already read many posts on SO but cannot find an answer.)

This returns a list of files:

files = [y for x in os.walk(".") for y in glob(os.path.join(x[0], '*.pdf'))]

But when I try to open them one by one:

for file in files:
    with open(file,"r") as f:
        pass  # process f here

I get this kind of error (I obfuscated the letters because the names are confidential):

IOError: [Errno 22] invalid mode ('r') or filename: '.\abcd?efgh (hijk? lmnop).pdf'

I believe the issue is caused by the accents, but since it is Python that gives me the file names, I don't understand why they are not compatible with "open()".

How can I fix this?

Regards

jww
  • Did you try it with `os.walk(u'.')`? – Nick is tired Sep 03 '18 at 07:33
  • You're the man! It worked, thank you so much. – phil12345678910 Sep 03 '18 at 07:39
  • What platform are you on? If it's not Windows, this could be a sign of a deeper problem with your filesystems or mount tables that you should fix or you might see other problems later. – abarnert Sep 03 '18 at 07:50
  • Also, why are you using `glob` on the results of `walk`? Why not `file for root, dirs, files in os.walk(u'.') for file in files if os.path.splitext(file)[1] == '.pdf'`? – abarnert Sep 03 '18 at 07:51
  • *"... caused by the accents"* - I believe they are called *[diacritics](https://en.wikipedia.org/wiki/Diacritic)* (assuming more than just the accent is giving you trouble). – jww Sep 03 '18 at 10:03

1 Answer


I do this now:

files = [y for x in os.walk(u'.') for y in glob(os.path.join(x[0], '*.'+extension))]

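Note the use of u'.' instead of ".". On Python 2, os.walk returns names of the same type as its argument, so a unicode starting path yields unicode file names that keep their accented characters.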
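For reference, here is a minimal sketch of the same idea wrapped in functions. The names find_by_extension and read_all are hypothetical helpers, not part of any library, and the choice to open the files in binary mode ("rb") is my assumption, based on PDFs being binary files. Both the starting path and the glob pattern are unicode, so the returned names should stay unicode on Python 2. On Python 3 the u prefix is accepted and has no effect.

```python
import os
from glob import glob


def find_by_extension(root=u'.', extension=u'pdf'):
    """Hypothetical helper: list files under root that end in .extension."""
    # Unicode root and unicode pattern, so Python 2 returns unicode names
    # instead of byte strings with the accented characters replaced.
    pattern = u'*.' + extension
    return [path
            for dirpath, _dirnames, _filenames in os.walk(root)
            for path in glob(os.path.join(dirpath, pattern))]


def read_all(paths):
    """Hypothetical helper: open each path and return its size in bytes."""
    sizes = {}
    for path in paths:
        # Assumption: PDFs are binary, so "rb" is used rather than "r".
        with open(path, 'rb') as f:
            sizes[path] = len(f.read())
    return sizes
```

Calling read_all(find_by_extension(u'.', u'pdf')) would then walk the current directory and open each matching file in turn.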

Nick is tired
  • You also want `u'*.'`. And you probably also want `extension` to be a `unicode` rather than a `str`. – abarnert Sep 03 '18 at 07:50