In Python, when I use readlines() to read the contents of a .txt file created in PowerShell, I get a list of strings full of characters I do not understand.

I have tried a for loop that goes through the list of lines and prints :D when type(line) is str, and I have tried detecting a substring within the lines.

Right now this is what I have:

from pprint import pprint
with open(f, "r") as file:
    lines = file.readlines()
    pprint(lines)

My expectation was for it to print the lines of my text file like this:

['line 1',
 'line 2',
 'line 3']

but instead it printed this (shortened for readability):

['ÿþ:\x00)\x00 \x00>\x00 \x00V\x00a\x00l\x00u\x00e\x00 '
 '\x00d\x00o\x00e\x00s\x00 \x00n\x00o\x00t\x00 \x00m\x00a\x00t\x00c\x00h\x00 '
 '\x00f\x00o\x00r\x00 '
 '\x00P\x00a\x00s\x00s\x00w\x00o\x00r\x00d\x00E\x00x\x00p\x00i\x00r\x00a\x00t\x00i\x00o\x00n\x00 '
 '\x00:\x00 '
 '\x00']

I created the text file using Out-File in a PowerShell script. Could that have something to do with my output?

  • I googled "ÿþ" and found that it's the byte order mark in UTF-16. So I would look at questions like [this](https://stackoverflow.com/questions/13590749/reading-unicode-file-data-with-bom-chars-in-python) –  Jun 19 '19 at 17:25
  • You need to open your file using the proper encoding (utf-16 it seems). `.readlines` is working fine – juanpa.arrivillaga Jun 19 '19 at 17:27
  • So how do I go about decoding the file? I tried file.decode() and line.decode(), but neither function exists. – Ben Morrison Jun 19 '19 at 17:34
  • https://docs.python.org/3/howto/unicode.html#reading-and-writing-unicode-data `with open(f, encoding='utf-16') as file:` – tgikal Jun 19 '19 at 17:39
  • Alright, that last comment worked. What should I do with the question now? (Thank you guys for helping me! I am new to SO since the last time my post got downvoted and I didn't want to ask any more stupid questions) – Ben Morrison Jun 19 '19 at 17:41
  • You can (actually, you *should*) use the `-Encoding` parameter for `Out-File`. This way you determine the output file encoding explicitly instead of relying on defaults. When you open the file with Python, make sure you specify the same encoding and you're good. Your trouble originated from relying on defaults in both cases, which is not a good idea. – Tomalak Jun 19 '19 at 19:54
  • @Tomalak I had no clue how to do anything in Powershell so I was just happy I got an output file, I didn't even think about encoding lol! That was my first time using Powershell (and hopefully my last) so I had no clue what I was doing. – Ben Morrison Jul 01 '19 at 20:27
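
To put the comments above together, here is a minimal sketch that reproduces the symptom and the fix. It assumes the file is UTF-16 little-endian with a byte order mark, which is what the `ÿþ` prefix in the question suggests. The sample text, the temporary file name, and the use of `latin-1` as a stand-in for the platform default encoding are all made up for illustration.

```python
import os
import tempfile

# Hypothetical sample text standing in for the PowerShell output.
sample = "line 1\nline 2\nline 3\n"

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "out.txt")  # hypothetical file name

    # Simulate the file from the question: UTF-16 little-endian with a BOM.
    with open(path, "wb") as fh:
        fh.write(b"\xff\xfe" + sample.encode("utf-16-le"))

    # Wrong: a single-byte encoding (latin-1 here, as an assumed stand-in
    # for the platform default) maps the BOM bytes to two characters and
    # leaves a '\x00' after every ASCII character.
    with open(path, "r", encoding="latin-1") as fh:
        garbled = fh.readlines()

    # Right: 'utf-16' reads the BOM, picks the byte order, and drops the BOM.
    with open(path, "r", encoding="utf-16") as fh:
        lines = fh.readlines()

print(ascii(garbled[0]))  # escaped view of the garbled first line
print(lines)              # ['line 1\n', 'line 2\n', 'line 3\n']
```

Note that `readlines()` keeps the trailing newline on each line; `line.rstrip("\n")` removes it. If the PowerShell side is changed to write UTF-8 via `Out-File -Encoding utf8`, opening the file with `encoding='utf-8-sig'` should work whether or not a BOM is present.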
