
I have a PowerShell script which does many things, and ultimately writes a variable with an integer value to a text file. Below is a simplified example:

$theValue = 531231245
$theValue | Out-File .\Test.txt

I have also tried to add the ToString() method:

$theValue = 531231245
$theValue.ToString() | Out-File .\Test.txt

It produces a text file, and when I double click on it, there are no surprises. I see theValue in both cases in the text file, clearly as numerical values.

However, when I then try to read it in Python, it produces a strange result:

with open("Test.txt", 'r') as FID: 
    theText = FID.read()
print(theText)

Then the output is:

Output : ÿþ5 3 1 2 3 1 2 4 5

This is actually the least weird output, as I've received some strange strings that looked like bytes encoding. I tried decode, readlines and many other things.

I don't understand why I can't properly read the simple string from the text file. Any ideas?

geekygeek
  • Seems like an encoding issue; try using `Out-File -Encoding utf8 ....` in your PowerShell script. You don't need the `.ToString()`, btw. – Santiago Squarzon May 04 '22 at 23:34
  • updated answer to strip unicode character ÿþ from the file text. This will work for other encoding issues. – Captain Caveman May 04 '22 at 23:47
  • Can you post the bytes value of the file? `print(open("Test.txt", "rb").read())`. – tdelaney May 04 '22 at 23:48
  • On some systems the Windows shell doesn't use `UTF-8` but a different encoding, e.g. `latin1`, `cp1250`, or something else, and this can cause problems. You may need to pass the correct `encoding` to `open()`, or you may search for information on how to use the registry in Windows to set `UTF-8` in the shell, e.g. [Using UTF-8 Encoding (CHCP 65001) in Command Prompt / Windows Powershell (Windows 10) - Stack Overflow](https://stackoverflow.com/questions/57131654/using-utf-8-encoding-chcp-65001-in-command-prompt-windows-powershell-window) – furas May 05 '22 at 00:30

1 Answer

  • In Windows PowerShell, the Out-File cmdlet produces UTF-16LE ("Unicode") files by default, as does its effective alias, `>`.

    • PowerShell (Core) 7+, by contrast, fortunately now consistently defaults to BOM-less UTF-8.
  • Thus, you have two options:

    • Use Out-File's / Set-Content's -Encoding parameter to produce a file in the character encoding that Python recognizes by default.

    • Use the open() function's encoding parameter to match the encoding produced by PowerShell; for Windows PowerShell:

      with open("Test.txt", 'r', encoding='utf-16le') as FID:
        theText = FID.read()
      print(theText)
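As a minimal sketch of the second option, the snippet below first writes a stand-in file and then reads it back. It assumes the file holds what Windows PowerShell's Out-File writes by default for this value: a UTF-16LE byte-order mark (BOM), the digits, and a CRLF newline. The temporary path is only for illustration. Note that Python's `utf-16` codec consumes the BOM, whereas `utf-16-le` leaves it in the string as U+FEFF.

```python
import os
import tempfile

# Assumption: these bytes mimic Windows PowerShell's default Out-File output
# for the value 531231245 (UTF-16LE BOM, the digits, then CRLF).
raw = b"\xff\xfe" + "531231245\r\n".encode("utf-16-le")

path = os.path.join(tempfile.mkdtemp(), "Test.txt")
with open(path, "wb") as f:
    f.write(raw)

# Decoding the bytes with a single-byte code page reproduces the symptom:
# the BOM shows up as two stray characters and every digit is followed by NUL.
misread = raw.decode("cp1252")

# 'utf-16' detects and removes the BOM.
with open(path, "r", encoding="utf-16") as f:
    with_bom_handling = f.read()

# 'utf-16-le' decodes the BOM as an ordinary character (U+FEFF).
with open(path, "r", encoding="utf-16-le") as f:
    without_bom_handling = f.read()

print(repr(misread))
print(repr(with_bom_handling))
print(repr(without_bom_handling))
```

If the BOM is unwanted in the resulting string, `encoding='utf-16'` is likely the simpler choice for files that start with one.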
      
mklement0
  • Thanks! I also added the line `theText = theText.strip().encode("ascii", "ignore").decode()` to clean up Unicode characters and whitespace. – geekygeek May 05 '22 at 16:18
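The `encode("ascii", "ignore")` approach in the comment above drops every non-ASCII character, which is fine for a string of digits. A narrower alternative, sketched here under the assumption that the only unwanted characters are a leading BOM (U+FEFF) and surrounding whitespace, is to strip just those before converting. The helper name `parse_int_text` is hypothetical.

```python
def parse_int_text(text: str) -> int:
    """Hypothetical helper: parse an integer from text that may start with a BOM."""
    # Remove a leading BOM first (str.strip() does not treat U+FEFF as
    # whitespace), then trim the newline and any other whitespace.
    cleaned = text.lstrip("\ufeff").strip()
    return int(cleaned)


value = parse_int_text("\ufeff531231245\n")
print(value)
```

Unlike the ignore-errors approach, this raises a `ValueError` if the file contains anything other than the expected number, which may make encoding mistakes easier to notice.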