I am writing a script that reads some values from a .csv file and writes them to another .csv file.

import csv
import json

header = ["Title", "Authors", "Year", "Abstract", "Keywords"]

fields_number = int(input("Enter the number of fields you want to get: "))

field_names = list()
field_values = list()
for i in range(0, fields_number):
    field_name = input("Enter the field name: ")
    field_names.append(field_name)

try:
    with open(filename) as csvfile:
        rowsreader = csv.DictReader(csvfile)
        for row in rowsreader:
            print(row)
            json_row = '{'
            for i in range(0, len(field_names)):
                field = field_names[i]
                json_row += '"{}":"{}"'.format(header[i], row[field])
                json_row += ',' if (i < len(field_names) - 1) else '}'
            field_values.append(json.loads(json_row))
except IOError:
    print("Could not open csv file: {}.".format(filename))

I am getting the following output:

Traceback (most recent call last):
  File "slr_helper.py", line 58, in <module>
    main()
  File "slr_helper.py", line 37, in main
    json_row += '"{}":"{}"'.format(header[i], row[field])
KeyError: 'Authors'

The beginning of the csv file has the following values:

Authors,Author Ids,Title,Year,Source title,Volume,Issue,Art. No.,Page start,Page end,Page count,Cited by,DOI,Link,Abstract,Author Keywords,Index Keywords,Sponsors,Publisher,Conference name,Conference date,Conference location,Conference code,Document Type,Access Type,Source,EID
"AlHogail A., AlShahrani M.","51060982200;57202888364;","Building consumer trust to improve Internet of Things (IoT) technology adoption",2019,

But the code is printing this when reading the csv file:

OrderedDict([('\ufeffAuthors', 'AlHogail A., AlShahrani M.'), ('Author Ids', '51060982200;57202888364;'),...

I would like to know how to get rid of the '\ufeff' at the start of the first key, since it is causing the error I am getting.

martineau
Dalton Cézane
  • Have you tried using pandas? It'll be done in three lines. – Tim Gottgetreu Sep 27 '18 at 22:38
  • @TimGottgetreu pandas is like a sledgehammer here. – juanpa.arrivillaga Sep 27 '18 at 22:40
  • What are you doing with all that JSON stuff? Why are you manually building JSON? It looks like you are trying to serialize the dict to a JSON Object string, but you can do that with `json.dumps(row)`, and you then immediately do `json.loads(json_row)`??? This doesn't make any sense. What, **exactly**, are you trying to do here? – juanpa.arrivillaga Sep 27 '18 at 22:42
  • Anyway, `'\ufeff'` is the byte-order mark, i.e. the BOM. So that implies you should be opening your file using `encoding='utf16'` – juanpa.arrivillaga Sep 27 '18 at 22:44
  • It's not exactly the same context (web scraping rather than a CSV file), but this question may be a duplicate of [this previous question](https://stackoverflow.com/questions/17912307/u-ufeff-in-python-string). The second answer seems like it will solve the problem for you. – Blckknght Sep 28 '18 at 00:55
  • @juan, I will try opening the file with `encoding='utf16'` as soon as possible and then comment here. – Dalton Cézane Sep 28 '18 at 01:31
  • I tried using `utf16`, but I received an error. Then I tried with `encoding='utf-8-sig'` and it worked. @Sasha, please write this up as an answer and I will accept it. Thank you all. – Dalton Cézane Sep 28 '18 at 13:35

1 Answer

As juanpa.arrivillaga pointed out, \ufeff is the byte order mark (BOM). It sits at the very beginning of the file, which the UTF-8 format permits.

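When the file is decoded as 'utf-8' (the default on many systems, although Python 3 actually falls back to the platform's preferred encoding when none is given), the BOM is not treated differently from other code points and is read as if it were part of the text contents. We need to specify the encoding as 'utf-8-sig' to change that: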

with open(filename, encoding='utf-8-sig') as csvfile:
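To see the difference in isolation, here is a minimal sketch that decodes the same bytes both ways and asks `csv.DictReader` for the first column name. The sample bytes and the helper name `first_header` are made up for illustration; they are not from the original script.

```python
import csv
import io

# Hypothetical sample: a UTF-8 BOM followed by a small two-column CSV.
raw = b'\xef\xbb\xbfAuthors,Title\r\n"AlHogail A., AlShahrani M.",Some title\r\n'


def first_header(data: bytes, encoding: str) -> str:
    """Decode the bytes and return the first column name DictReader sees."""
    text = io.StringIO(data.decode(encoding), newline="")
    reader = csv.DictReader(text)
    return reader.fieldnames[0]


# With plain 'utf-8' the BOM stays glued to the first key.
print(repr(first_header(raw, "utf-8")))      # '\ufeffAuthors'
# With 'utf-8-sig' the BOM is stripped while decoding.
print(repr(first_header(raw, "utf-8-sig")))  # 'Authors'
```

With the first variant, `row['Authors']` raises the `KeyError` from the question, because the key that actually exists is `'\ufeffAuthors'`.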

By the way, if you are on Linux you can run file ${filename} in the terminal; it will print details about the encoding.
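If the `file` command is not available, a similar check can be sketched in Python by comparing the first bytes of the file with `codecs.BOM_UTF8`. The function name `starts_with_utf8_bom` and the throwaway `sample.csv` are assumptions made for this example.

```python
import codecs
import os
import tempfile


def starts_with_utf8_bom(path):
    """Return True if the file at `path` begins with the UTF-8 BOM bytes."""
    with open(path, "rb") as f:
        return f.read(len(codecs.BOM_UTF8)) == codecs.BOM_UTF8


# Demo on a throwaway file; writing with 'utf-8-sig' emits a BOM first.
with tempfile.TemporaryDirectory() as tmp:
    sample = os.path.join(tmp, "sample.csv")
    with open(sample, "w", encoding="utf-8-sig") as f:
        f.write("Authors,Title\n")
    print(starts_with_utf8_bom(sample))  # True
```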

Sasha Tsukanov
  • As a side note (I can see you are working with JSON format here), it is illegal to add a BOM to JSON https://tools.ietf.org/html/rfc7159#section-8.1 – Sasha Tsukanov Sep 28 '18 at 14:20
  • Following the recommendation of @juanpa.arrivillaga , I also changed the JSON part (just using dictionary now). – Dalton Cézane Sep 28 '18 at 14:33
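For reference, a minimal sketch of what "just using a dictionary" could look like, combined with the `utf-8-sig` fix. The function name `read_fields` is hypothetical, and it assumes the same pairing as the question's loop: the i-th entry of `header` names the output key for the i-th requested CSV column.

```python
import csv


def read_fields(path, field_names, header):
    """Collect the requested CSV columns as plain dicts keyed by `header`."""
    field_values = []
    # 'utf-8-sig' strips a leading BOM if present; newline='' is what the
    # csv module recommends for files passed to its readers.
    with open(path, encoding="utf-8-sig", newline="") as csvfile:
        for row in csv.DictReader(csvfile):
            # Pair each output key with the CSV column the user asked for.
            field_values.append(
                {out_key: row[src] for out_key, src in zip(header, field_names)}
            )
    return field_values
```

Called as `read_fields(filename, field_names, header)`, this replaces the manual string building and the `json.loads` round trip, and values containing quotes no longer break the parsing.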