I am using vobject in Python and trying to parse the vCard located here:

http://www.mayerbrown.com/people/vCard.aspx?Attorney=1150

To do this, I do the following:

    import urllib
    import vobject

    vcard = urllib.urlopen("http://www.mayerbrown.com/people/vCard.aspx?Attorney=1150").read()
    vcard_object = vobject.readOne(vcard)

Whenever I do this, I get the following error:

    Traceback (most recent call last):
      File "<pyshell#86>", line 1, in <module>
        vobject.readOne(urllib.urlopen("http://www.mayerbrown.com/people/vCard.aspx?Attorney=1150").read())
      File "C:\Python27\lib\site-packages\vobject-0.8.1c-py2.7.egg\vobject\base.py", line 1078, in readOne
        ignoreUnreadable, allowQP).next()
      File "C:\Python27\lib\site-packages\vobject-0.8.1c-py2.7.egg\vobject\base.py", line 1031, in readComponents
        vline = textLineToContentLine(line, n)
      File "C:\Python27\lib\site-packages\vobject-0.8.1c-py2.7.egg\vobject\base.py", line 888, in textLineToContentLine
        return ContentLine(*parseLine(text, n), **{'encoded':True, 'lineNumber' : n})
      File "C:\Python27\lib\site-packages\vobject-0.8.1c-py2.7.egg\vobject\base.py", line 262, in __init__
        self.value = str(self.value).decode('quoted-printable')
    UnicodeEncodeError: 'ascii' codec can't encode character u'\xfc' in position 29: ordinal not in range(128)

I have tried a number of variations on this, such as converting the vCard to unicode and using various encodings, but I always get the same or a very similar error message.

Any ideas on how to fix this?

AndroidLearner
Neil Aggarwal
  • how did you solve it? For me the `vcard = vcard.decode('utf-8')` is not helping. I also deal with en/de coding in base64 because of transportation issues. – andilabs Apr 08 '14 at 14:07
  • hey. Sorry, I just saw this. I ended up solving it by creating my own hack that sits atop the library. Its not pretty. In fact its ugly. Let me know if you are still struggling with this, and I can look through my code and remember what I did. It had to do with trying to create the object, if it failed with a unicode error, trying to find the portion of the vcard that caused the error, eliminating it, and trying again (repeat until you get an object). The result is, you end up losing some data from some ppl, but at least I get some data. – Neil Aggarwal May 21 '14 at 20:59
  • http://stackoverflow.com/questions/14249288/change-quoted-printable-encoding-to-utf-8 – andilabs May 21 '14 at 21:25

3 Answers


It's failing on line 13 of the vCard because the ADR property is incorrectly marked as being encoded in the "quoted-printable" encoding while its value contains a raw ü. In quoted-printable text, the ü character should be encoded as =FC, which is why vobject is throwing the error.
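
As a rough illustration of what a properly escaped value would look like, here is a minimal sketch using the standard-library `quopri` module. It is written for Python 3 (the question uses Python 2), and the variable names are made up for the example. Note that the escape depends on the byte encoding: =FC is the Latin-1 form of ü, while the UTF-8 form takes two bytes.

```python
import quopri

raw_char = "ü"

# Quoted-printable operates on bytes, so the escape depends on the charset.
latin1_qp = quopri.encodestring(raw_char.encode("latin-1"))
utf8_qp = quopri.encodestring(raw_char.encode("utf-8"))

print(latin1_qp)  # b'=FC'
print(utf8_qp)    # b'=C3=BC'

# Decoding the escaped form recovers the original character.
decoded = quopri.decodestring(latin1_qp).decode("latin-1")
print(decoded == raw_char)  # True
```

A value containing a raw ü, as in the ADR line here, is therefore not valid quoted-printable even though the property claims to be.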

Michael
  • JUST USE: `quoted-printable` http://stackoverflow.com/questions/14249288/change-quoted-printable-encoding-to-utf-8 – andilabs Apr 08 '14 at 16:07

The file is downloaded as a UTF-8 encoded string (I think), but the library tries to interpret it as ASCII.

Try adding the following line after urlopen:

    vcard = vcard.decode('utf-8')
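
To show what the decode step does and does not change, here is a small Python 3 sketch. The sample bytes and the `is_ascii_safe` helper are hypothetical stand-ins, not the real vCard or part of vobject. Decoding produces a text string that still contains ü, so any later implicit ASCII conversion (such as the `str(self.value)` call in the traceback under Python 2) can still fail, which may be why this step alone does not help in every case.

```python
# Stand-in for the bytes returned by urlopen(...).read(); sample data only.
raw_bytes = "ADR:;;Königsallee;Düsseldorf".encode("utf-8")

# Decode the downloaded bytes into a text string.
text = raw_bytes.decode("utf-8")


def is_ascii_safe(s):
    """Return True if the string survives an ASCII encode (hypothetical helper)."""
    try:
        s.encode("ascii")
        return True
    except UnicodeEncodeError:
        return False


print("ü" in text)          # True: the character is preserved by decoding
print(is_ascii_safe(text))  # False: an ASCII conversion would still raise
```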
cleg

The vobject library's readOne method is pretty awkward.

To avoid problems, I decided to persist the vCards in my database as quoted-printable data, which readOne accepts.

Assuming some_vcard is a UTF-8 encoded string:

    quopried_vcard = quopri.encodestring(some_vcard)

The quopried_vcard gets persisted, and when needed, just:

    vobj = vobject.readOne(quopried_vcard)

Then, to get the decoded data back, e.g. for the fn field in the vCard:

    quopri.decodestring(vobj.fn.value)
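
The encode/decode round trip can be checked on its own, without vobject. The following is a minimal Python 3 sketch with a made-up sample vCard; it only demonstrates that `quopri` yields pure-ASCII data and that decoding restores the original text, not how readOne treats the result.

```python
import quopri

# Hypothetical sample vCard, encoded as UTF-8 bytes.
some_vcard = "BEGIN:VCARD\nVERSION:3.0\nFN:Jörg Müller\nEND:VCARD\n".encode("utf-8")

# Escape every non-ASCII byte so the stored form is plain ASCII.
quopried_vcard = quopri.encodestring(some_vcard)
print(quopried_vcard.decode("ascii"))

# Decoding reverses the escaping and returns the original UTF-8 bytes.
restored = quopri.decodestring(quopried_vcard)
print(restored.decode("utf-8"))
```

One caveat with this approach: `quopri` also escapes literal `=` characters as `=3D`, so a vCard that contains parameters such as `ENCODING=...` would be altered by encoding the whole card.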

Maybe somebody can handle UTF-8 with readOne better. If so, I would love to see it.

andilabs