
I am teaching myself Python and pymarc for a school project. I have a sample MARC file and am trying to read it with this simple code:

from pymarc import *

reader = MARCReader(open('dump.mrc', 'rb'), to_unicode=True)

for record in reader:
    print(record)

The for loop just prints each record so I can check that I am getting the correct data. However, I get this error:

UnicodeDecodeError: 'ascii' codec can't decode byte 0xc2 in position 0: ordinal not in range(128)

I've looked online but could not find an answer to my problem. What does this error mean and how can I go about fixing it? Thanks in advance.

user3554599

1 Answer


In Python 2, you can change the interpreter's default encoding from ASCII to UTF-8 and print each record as a dictionary instead.
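To see what the error itself means, here is a minimal Python 3 sketch (the code in this thread is Python 2, but the codec behaviour is the same). Byte 0xc2 is above 0x7f, so it is not ASCII; it is typically the first byte of a two-byte UTF-8 character. The copyright sign is used here only as an example of such a character.

```python
# "\u00a9" is the copyright sign; in UTF-8 it is the two bytes C2 A9.
data = "\u00a9".encode("utf-8")

message = ""
try:
    data.decode("ascii")  # ASCII only defines bytes 0x00-0x7F
except UnicodeDecodeError as exc:
    message = str(exc)

print(message)
# 'ascii' codec can't decode byte 0xc2 in position 0: ordinal not in range(128)

text = data.decode("utf-8")  # the correct codec decodes the same bytes fine
print(text == "\u00a9")  # True
```

In Python 2 the same ASCII decode happens implicitly whenever a byte string is mixed with a unicode string, and that implicit codec is what sys.setdefaultencoding changes.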

Try:

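#!/usr/bin/env python
# -*- coding: utf-8 -*-

import sys

# Python 2 only: switch the implicit str <-> unicode conversion codec
# from ASCII to UTF-8. reload() is needed because site.py deletes
# sys.setdefaultencoding at startup. Neither call exists in Python 3.
reload(sys)
sys.setdefaultencoding('utf-8')

from pymarc import MARCReader

# force_utf8=True decodes every record as UTF-8, even when the record's
# leader says it is MARC-8.
with open('dump.mrc', 'rb') as fh:
    reader = MARCReader(fh, to_unicode=True, force_utf8=True)
    for record in reader:
        print record.as_dict()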
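force_utf8=True matters because, as far as I understand pymarc, it otherwise chooses the decoding from leader position 09 of each record (blank means MARC-8, 'a' means Unicode). To see what your records actually declare, here is a stdlib-only sketch. It assumes standard ISO 2709 records with the record length in leader positions 00-04, and leader_coding_schemes is a hypothetical helper, not part of pymarc.

```python
LEADER_LEN = 24  # every MARC 21 leader is exactly 24 bytes


def leader_coding_schemes(raw):
    """Return leader position 09 of each record in a raw MARC byte string.

    ' ' (blank) means the record claims MARC-8; 'a' means UCS/Unicode.
    Assumes standard ISO 2709 records whose length is in leader/00-04.
    """
    schemes = []
    pos = 0
    while pos + LEADER_LEN <= len(raw):
        leader = raw[pos:pos + LEADER_LEN].decode("ascii")
        length = int(leader[0:5])  # total record length in bytes
        if length < LEADER_LEN:
            raise ValueError("bad record length at offset %d" % pos)
        schemes.append(leader[9])
        pos += length  # jump to the start of the next record
    return schemes


# Usage, assuming dump.mrc from the question:
# with open("dump.mrc", "rb") as fh:
#     print(set(leader_coding_schemes(fh.read())))
```

If this reports ' ' for records whose bytes are really UTF-8, that mismatch is the situation force_utf8=True is meant for.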

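Notes:

  1. If you still get the UnicodeDecodeError, set to_unicode=False and drop force_utf8=True; pymarc then leaves the field data undecoded instead of converting it to unicode.

  2. Also check whether dump.mrc is actually UTF-8 encoded. Try: $ chardet dump.mrc (recent releases of the chardet package install this command as chardetect).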
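chardet only guesses an encoding. A stricter check for UTF-8 specifically is to decode the whole file in strict mode; the sketch below does that with the standard library only. first_invalid_utf8_offset is a hypothetical helper name.

```python
def first_invalid_utf8_offset(raw):
    """Return None if raw is valid UTF-8, else the offset of the first bad byte."""
    try:
        raw.decode("utf-8")  # strict mode raises on any invalid sequence
    except UnicodeDecodeError as exc:
        return exc.start  # index of the first byte that could not be decoded
    return None


# Usage, assuming dump.mrc from the question:
# with open("dump.mrc", "rb") as fh:
#     print(first_invalid_utf8_offset(fh.read()))
```

A failure proves the file is not (entirely) UTF-8. A pass does not prove it is UTF-8, because pure ASCII data is also valid UTF-8.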