
I am learning Python, and I am having trouble saving the output of a small function to a file. My Python script is the following:

#!/usr/local/bin/python

import subprocess
import codecs

airport = '/System/Library/PrivateFrameworks/Apple80211.framework/Versions/Current/Resources/airport'

def getAirportInfo():
    arguments = [airport, "--scan" , "--xml"]
    execute = subprocess.Popen(arguments, stdout=subprocess.PIPE)
    out, err = execute.communicate()
    print out
    return out

airportInfo = getAirportInfo()

outFile = codecs.open('wifi-data.txt', 'w')
outFile.write(airportInfo)
outFile.close()

I guess that this would only work on a Mac, as it references some PrivateFrameworks.

Anyway, the code almost works as it should. The print statement prints a huge XML file that I'd like to store in a file for future processing. And here the problems start. In the version above, the script executes without any errors. However, when I try to open the file, I get an error message along the lines of an error with the UTF-8 encoding. If I ignore this, the file opens and most things look fine, except for a couple of things:

  • Some SSIDs have non-ASCII characters, like ä, ö and ü. When printed on the screen, they are correctly displayed as \xc3\xa4 and so on. When I open the file, they are displayed incorrectly, as the usual random garbage.

  • Some of the XML values look like this when printed on screen: Data("\x00\x11WLAN-0024FE056185\x01\x08\x82\x84\x8b\x96\x0c\ … x10D\x00\x01\x02") but like this when read from the file: //8AAAAAAAAAAAAAAAAAAA==
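
One possible explanation for the second point, which cannot be confirmed from the output shown here: XML property lists store binary values as base64 text inside a <data> element, so the same bytes can look like \x escapes in one view and like a base64 string in another. The sketch below assumes Python 3 and the standard plistlib module; the key name "IE" and the payload are made up for illustration and are not taken from the real scan output.

```python
import base64
import plistlib

# Hypothetical stand-in for one binary field of the scan result.
payload = b"\x00\x11WLAN"

# Serialising to an XML plist stores bytes in a <data> element as base64.
xml = plistlib.dumps({"IE": payload})
print(xml.decode("utf-8"))

# Parsing the XML gives the original bytes back.
restored = plistlib.loads(xml)["IE"]
print(restored == payload)
```

If that is what is happening, the base64 text in the file is not corruption, and a plist parser should recover the raw bytes from it.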

I thought it could be an encoding error (the umlauts have problems, the error message says something about the UTF-8 encoding being messed up, and the text contains \x-style escapes), so I looked here for possible solutions. However, no matter what I do, there are still errors:

  • Adding an additional argument 'utf-8' to codecs.open yields a UnicodeDecodeError: 'ascii' codec can't decode byte 0x9a in position 24227: ordinal not in range(128) and the generated file is empty.

  • Explicitly encoding to UTF-8 with outFile.write(airportInfo.encode('utf-8')) before saving results in the same error.

  • Not being an expert, I thought I might be doing the exact opposite of what was needed, so I tried decoding it instead, but I got a UnicodeDecodeError: 'utf8' codec can't decode byte 0x8a in position 8980: invalid start byte
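
The two error messages above can be reproduced on a small sample. In Python 2, calling .encode() on a byte string first decodes it implicitly as ASCII, which is where the 'ascii' codec error comes from; the 'invalid start byte' error means the data is not entirely valid UTF-8. The sketch below assumes Python 3, where the decoding steps have to be written out explicitly, and the sample bytes are invented for illustration.

```python
# Hypothetical sample: "caf", valid UTF-8 for "ä", a space, then a stray 0x8a.
raw = b"caf\xc3\xa4 \x8a"

# The valid prefix decodes without trouble.
prefix = raw[:5].decode("utf-8")
print(prefix)

# Decoding as ASCII fails on the first non-ASCII byte. This is the step
# Python 2 performs implicitly when .encode() is called on a byte string.
try:
    raw.decode("ascii")
    ascii_error = None
except UnicodeDecodeError as exc:
    ascii_error = str(exc)
print(ascii_error)

# Decoding as UTF-8 fails too, because 0x8a cannot start a UTF-8 sequence.
try:
    raw.decode("utf-8")
    utf8_error = None
except UnicodeDecodeError as exc:
    utf8_error = str(exc)
print(utf8_error)
```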

The only thing that worked (unsurprisingly) was to write the repr() of the string to the file, but that is not what I need, and I can't make a nice .plist out of a file full of escape sequences.

So please, can somebody help me? What am I missing? If it helps, the type that gets saved in airportInfo is str (as in type(airportInfo) == str) and not u

Machavity
daniel f.
    Python 2 and 3 differ somewhat in their handling of unicode. Which are you using? Additionally, what happens if you don't use codecs, and just write the data to a file? – Roland Smith Dec 19 '12 at 18:25

2 Answers


You don't need to re-encode text that is already encoded. Just write it to a file as-is; it should just work.

In [1]: t = 'äïöú'

In [2]: with open('test.txt', 'w') as f:
   ...:     f.write(t)
   ...:

Additionally, you can make getAirportInfo simpler by using subprocess.check_output(). Also, mixed-case names should only be used for classes, not functions. See PEP 8.

import subprocess

def get_airport_info():
    args = ['/System/Library/PrivateFrameworks/Apple80211.framework/Versions/Current/Resources/airport', 
            '--scan', '--xml']
    return subprocess.check_output(args)

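airport_info = get_airport_info()
# check_output returns the raw bytes of the command's output,
# so write them unchanged in binary mode.
with open('wifi-data.txt', 'wb') as outf:
    outf.write(airport_info)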
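
Since the airport tool only exists on a Mac, here is a minimal sketch of the same idea that can be run anywhere. It assumes Python 3; save_command_output and the stand-in command are hypothetical names introduced for illustration, and the stand-in simply emits a few bytes that are not valid UTF-8.

```python
import os
import subprocess
import sys
import tempfile


def save_command_output(command, path):
    """Run command and write its stdout to path without re-encoding it."""
    data = subprocess.check_output(command)  # bytes in Python 3
    with open(path, "wb") as out_file:
        out_file.write(data)
    return data


# Stand-in command (an assumption, not the airport call): a child Python
# process that writes some non-UTF-8 bytes to its stdout.
demo_command = [
    sys.executable,
    "-c",
    "import sys; sys.stdout.buffer.write(b'SSID \\xc3\\xa4 \\x8a')",
]
demo_path = os.path.join(tempfile.mkdtemp(), "wifi-data.txt")
written = save_command_output(demo_command, demo_path)

# Reading the file back in binary mode returns exactly what was captured.
with open(demo_path, "rb") as in_file:
    print(in_file.read() == written)
```

To use it for the original problem, the airport argument list would replace demo_command. Nothing is decoded or encoded along the way, so bytes that are not valid UTF-8 cannot raise an error.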
Roland Smith
  • I tested a simple case, like your first lines, and it worked as expected, but not with the Wi-Fi scanning script. Unfortunately, I am in another location right now (different Wi-Fi names) and I can't test or replicate the problem. Anyway, many thanks for the style guide; it is always good to learn properly from the start. – daniel f. Dec 20 '12 at 22:59

I should have read this before my original answer: What is the difference between encode/decode?

I always get confused between string and unicode conversion. On my mac, import sys; sys.getfilesystemencoding() suggests that subprocess returns a 'utf-8' string - so I don't know why airportInfo.encode('utf-8') fails. Is it possible to do airportInfo.encode('utf-8', 'ignore') and throw out the invalid characters?
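
On the 'ignore' idea: error handlers do their work when bytes are decoded into text, so a sketch of dropping or replacing invalid bytes would look roughly like the following. It assumes Python 3, and the sample bytes are invented. As far as I know, in Python 2 passing 'ignore' to .encode() on a byte string would not help, because the implicit ASCII decode that happens first is still strict.

```python
# Hypothetical sample: an SSID with "ä" followed by a stray 0x8a byte.
raw = b"WLAN-\xc3\xa4 \x8a"

# 'ignore' silently drops bytes that are not valid UTF-8.
ignored = raw.decode("utf-8", "ignore")

# 'replace' substitutes U+FFFD so the damage stays visible.
replaced = raw.decode("utf-8", "replace")

print(repr(ignored))
print(repr(replaced))
```

Either handler loses information, so this only makes sense if the invalid bytes are not needed later.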

Also - have you tried writing your file as wb: outFile = codecs.open('wifi-data.txt', 'wb') - doesn't 'w' open an ascii file?
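
For comparison, this is roughly how the two modes behave with the built-in open() in Python 3 (an assumption; the question appears to use Python 2, where the rules differ). Text mode takes str and encodes it with the given encoding, while binary mode takes bytes and writes them unchanged. The file name and sample text are placeholders.

```python
import os
import tempfile

text = "WLAN-\u00e4"  # "WLAN-ä" as a text string
path = os.path.join(tempfile.mkdtemp(), "wifi-data.txt")

# Text mode: the str is encoded to UTF-8 on the way out.
with open(path, "w", encoding="utf-8") as out_file:
    out_file.write(text)

# Binary mode: read back the exact bytes that ended up on disk.
with open(path, "rb") as in_file:
    on_disk = in_file.read()
print(on_disk)

# Text mode refuses bytes instead of guessing an encoding.
try:
    with open(path, "w", encoding="utf-8") as out_file:
        out_file.write(on_disk)
    text_mode_rejects_bytes = False
except TypeError:
    text_mode_rejects_bytes = True
print(text_mode_rejects_bytes)
```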

Regarding your text editor: it may handle unicode characters differently. If it's reading a unicode text file as ASCII, the unicode characters may appear as a garbled mess. You might try giving the file an .xml extension; depending on your text editor, it may then be read correctly as unicode.

Community
Adam Morris