I am completely new to Python, but I found a package I need to use and am testing it. The package in question is pywurfl.
I have written a simple script, based on the package's example, that reads User-Agent (UA) strings from one column of a tab-separated text file. There are a very large number of UAs, and some may contain non-ASCII characters. The file containing the UAs was produced by a Perl script whose output was redirected with the shell's ">" operator, for example perl somescript.pl > outfile.txt.
However, when I run the following code on that file, I get an error:
#!/usr/bin/python
import fileinput
import sys
from wurfl import devices
from pywurfl.algorithms import LevenshteinDistance

search_algorithm = LevenshteinDistance()
for line in fileinput.input():
    line = line.rstrip("\r\n")  # equivalent of Perl's chomp
    H = line.split('\t')
    if H[27] == 'Mobile':
        user_agent = H[23].decode('utf8')
        device = devices.select_ua(user_agent, search=search_algorithm)
        fields = (user_agent, device.devid, device.devua, device.fall_back,
                  device.actual_device_root, device.brand_name,
                  device.marketing_name, device.model_name, device.device_os,
                  device.device_os_version, device.mobile_browser,
                  device.mobile_browser_version, device.model_extra_info,
                  device.pointing_method, device.has_qwerty_keyboard,
                  device.is_tablet, device.has_cellular_radio,
                  device.max_data_rate, device.wifi, device.dual_orientation,
                  device.physical_screen_height, device.physical_screen_width,
                  device.resolution_height, device.resolution_width,
                  device.full_flash_support, device.built_in_camera,
                  device.built_in_recorder, device.receiver, device.sender,
                  device.can_assign_phone_number, device.is_wireless_device,
                  device.sms_enabled)
        sys.stdout.write("\t".join(["%s"] * len(fields)) % fields + "\n")
    else:
        # do something else
        pass
Here H[23] is the column that contains the UA string, but I get an error like this:
UnicodeDecodeError: 'utf8' codec can't decode byte 0xa9 in position 0: unexpected code byte
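A likely explanation: in UTF-8, bytes 0x80 to 0xBF can only continue a multi-byte sequence, so 0xa9 cannot legally appear at position 0. In Latin-1 (and Windows-1252), 0xa9 is the copyright sign, which suggests that at least some rows were not written as UTF-8. The following is a minimal sketch of a decoder that tries UTF-8 first and falls back to Latin-1. The fallback is an assumption about how the Perl script wrote the non-UTF-8 rows, and decode_ua is a hypothetical helper name, not part of pywurfl.

```python
def decode_ua(raw):
    """Decode a UA byte string: UTF-8 if valid, otherwise Latin-1.

    Hypothetical helper; the Latin-1 fallback is an assumption about
    how the non-UTF-8 rows in the input file were produced.
    """
    try:
        return raw.decode('utf-8')
    except UnicodeDecodeError:
        # Latin-1 maps every byte 0x00-0xFF to a code point, so this never fails.
        return raw.decode('latin-1')


# 0xa9 at position 0 is invalid UTF-8 (a continuation byte with no lead byte) ...
try:
    b'\xa9 Example'.decode('utf-8')
    utf8_ok = True
except UnicodeDecodeError:
    utf8_ok = False

# ... but under the fallback it decodes to the copyright sign, U+00A9.
sample = decode_ua(b'\xa9 Example')
```

In the script above, this would replace H[23].decode('utf8') with decode_ua(H[23]). Note that a Latin-1 fallback never raises, so it will also silently mis-decode rows that are in some other encoding; it hides the problem rather than diagnosing it.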
When I replaced 'utf8' with 'latin1', I got the following error instead:
sys.stdout.write(................) # with the .... as in the code
UnicodeEncodeError: 'ascii' codec can't encode character u'\xa9' in position 0: ordinal not in range(128)
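This second error comes from the output side rather than the input. Once user_agent is a unicode object, the % formatting produces unicode, and in Python 2 writing unicode to sys.stdout while it is redirected to a file or pipe encodes it with the default ASCII codec, which cannot represent u'\xa9'. One common remedy is to encode explicitly, for example by wrapping the byte stream in a UTF-8 writer from the standard codecs module. The sketch uses io.BytesIO as a stand-in for the real output stream so that it runs on its own. In a real script the wrapped stream would presumably be sys.stdout on Python 2 or sys.stdout.buffer on Python 3; treat that as an assumption about your environment.

```python
import codecs
import io

# Stand-in for the byte stream behind stdout
# (sys.stdout on Python 2, sys.stdout.buffer on Python 3).
byte_stream = io.BytesIO()

# StreamWriter that encodes every unicode string it receives as UTF-8.
out = codecs.getwriter('utf-8')(byte_stream)

# A tab-separated row mixing a non-ASCII UA with a non-string value.
row = u'\t'.join([u'\xa9 Example', u'generic', u'%s' % True])
out.write(row + u'\n')

written = byte_stream.getvalue()
```

In the original Python 2 script, the equivalent would be one line before the loop, sys.stdout = codecs.getwriter('utf-8')(sys.stdout). Alternatives are calling .encode('utf-8') on the formatted string before passing it to sys.stdout.write, or setting the PYTHONIOENCODING=utf-8 environment variable.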
Am I doing something wrong here? I need to convert the UA string to Unicode because the package expects Unicode input. I am not well versed in Unicode, especially in Python. How should I handle this error? For instance, how can I find out which UA string is causing it, so that I can ask a more informed question?
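To locate the offending rows, one approach is to read the file as raw bytes, attempt the UTF-8 decode on each UA field, and report the line number along with the repr of any bytes that fail. The following is a minimal sketch, assuming the same tab-separated layout as the script above, with the UA in column 23 counted from 0. find_undecodable is a hypothetical helper name, and the script takes the input file name as its first argument.

```python
import sys


def find_undecodable(lines, column=23, encoding='utf-8'):
    """Yield (line_number, raw_value, error) for rows whose UA column fails to decode.

    Hypothetical helper; `lines` is any iterable of byte strings,
    e.g. a file opened in binary mode.
    """
    for lineno, raw_line in enumerate(lines, start=1):
        fields = raw_line.rstrip(b'\r\n').split(b'\t')
        if len(fields) <= column:
            continue  # row too short to contain a UA column
        raw_ua = fields[column]
        try:
            raw_ua.decode(encoding)
        except UnicodeDecodeError as exc:
            yield lineno, raw_ua, exc


if __name__ == '__main__' and len(sys.argv) > 1:
    with open(sys.argv[1], 'rb') as f:
        for lineno, raw_ua, exc in find_undecodable(f):
            # repr() shows the exact bytes, e.g. '\xa9' instead of a mangled glyph
            sys.stderr.write('line %d: %r (%s)\n' % (lineno, raw_ua, exc))
```

Running it as, say, python find_bad_uas.py outfile.txt (the script name is hypothetical) lists each failing row on stderr. Seeing the raw bytes of a few bad rows usually shows which encoding the Perl script actually produced.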