1

I am trying to do a simple script that converts a CSV file to JSON. The issue I'm running into is it's returning extra characters before the csv Order column. I'm new to Python so sorry if I missed anything on the info below. My resources and script are:

CSV

Order,Business_Unit,Sold_To,Ship_To,Customer_PO,Quantity_Ordered,UoM,Item_Number,Extended_Price,P4210_Version
1,M30,4242,4242,Line1,5,EA,210,,ZJDE0001
2,M30,4242,4242,Line2,6,EA,TPL0001,10,ZJDE0001

Python Script

import csv, json

csvFilePath = "DemoExcel.csv"
jsonFilePath = "DemoJson.json"

#Read the CSV and add the data to a dictionary...

data = {}
with open(csvFilePath) as csvFile:
  csvReader = csv.DictReader(csvFile)
  for csvRow in csvReader:
    BusinessUnit = csvRow["Order"]
    data[BusinessUnit] = csvRow

#Write data to a JSON file...

with open(jsonFilePath, "w") as jsonFile:
  jsonFile.write(json.dumps(data, indent=4))

At first it wasn't able to run it successfully. So I did a print(data) and saw the CSV is being read as:

{'1': OrderedDict([('Order', '1'), ('Business_Unit', 'M30'), ('Sold_To', '4242'), ('Ship_To', '4242'), ('Customer_PO', 'Line1'), ('Quantity_Ordered', '5'), ('UoM', 'EA'), ('Item_Number', '210'), ('Extended_Price', ''), ('P4210_Version', 'ZJDE0001')]), '2': OrderedDict([('Order', '2'), ('Business_Unit', 'M30'), ('Sold_To', '4242'), ('Ship_To', '4242'), ('Customer_PO', 'Line2'), ('Quantity_Ordered', '6'), ('UoM', 'EA'), ('Item_Number', 'TPL0001'), ('Extended_Price', '10'), ('P4210_Version', 'ZJDE0001')]), '3': OrderedDict([('Order', '3'), ('Business_Unit', '30'), ('Sold_To', '4242'), ('Ship_To', '4242'), ('Customer_PO', 'Bell Media'), ('Quantity_Ordered', '209'), ('UoM', 'EA'), ('Item_Number', '210'), ('Extended_Price', '23456'), ('P4210_Version', 'ZJDE0002')]), '4': OrderedDict([('Order', '4'), ('Business_Unit', '30'), ('Sold_To', '4242'), ('Ship_To', '4242'), ('Customer_PO', 'AT&T'), ('Quantity_Ordered', '3'), ('UoM', 'M'), ('Item_Number', '210'), ('Extended_Price', ''), ('P4210_Version', 'ZJDE0002')])}

I noticed Order was being shown as Order instead of Order. So I changed my python to include Order

import csv, json

csvFilePath = "DemoExcel.csv"
jsonFilePath = "DemoJson.json"

#Read the CSV and add the data to a dictionary...

data = {}
with open(csvFilePath) as csvFile:
  csvReader = csv.DictReader(csvFile)
  for csvRow in csvReader:
    Order = csvRow["Order"]
    data[Order] = csvRow

print(data)

#Write data to a JSON file...
#"w" argument is to indicate it's being written to...
with open(jsonFilePath, "w") as jsonFile:
  jsonFile.write(json.dumps(data, indent=4))

Now it's creating the JSON file successfully but Order is being returned as

{
    "1": {
        "\u00ef\u00bb\u00bfOrder": "1",
        "Business_Unit": "M30",
        "Sold_To": "4242",
        "Ship_To": "4242",
        "Customer_PO": "Line1",
        "Quantity_Ordered": "5",
        "UoM": "EA",
        "Item_Number": "210",
        "Extended_Price": "",
        "P4210_Version": "ZJDE0001"
    },
    "2": {
        "\u00ef\u00bb\u00bfOrder": "2",
        "Business_Unit": "M30",
        "Sold_To": "4242",
        "Ship_To": "4242",
        "Customer_PO": "Line2",
        "Quantity_Ordered": "6",
        "UoM": "EA",
        "Item_Number": "TPL0001",
        "Extended_Price": "10",
        "P4210_Version": "ZJDE0001"
    },
    "3": {
        "\u00ef\u00bb\u00bfOrder": "3",
        "Business_Unit": "30",
        "Sold_To": "4242",
        "Ship_To": "4242",
        "Customer_PO": "Bell Media",
        "Quantity_Ordered": "209",
        "UoM": "EA",
        "Item_Number": "210",
        "Extended_Price": "23456",
        "P4210_Version": "ZJDE0002"
    },
    "4": {
        "\u00ef\u00bb\u00bfOrder": "4",
        "Business_Unit": "30",
        "Sold_To": "4242",
        "Ship_To": "4242",
        "Customer_PO": "AT&T",
        "Quantity_Ordered": "3",
        "UoM": "M",
        "Item_Number": "210",
        "Extended_Price": "",
        "P4210_Version": "ZJDE0002"
    }
}

Is there a way to have it just return Order instead of \u00ef\u00bb\u00bfOrder? I am using a CSV made by saving as .csv in Excel. When I open the CSV in Sublime Text Editor I don't see any of the extra characters.

I can't figure out how to have it only return Order for the name : key pair.

martineau
  • 119,623
  • 25
  • 170
  • 301
Flip7607
  • 59
  • 1
  • 1
  • 6
  • You could change `"Order"` to `"Order"` before saving. just an FYI, `json.dump(jsonFile, indent=4)` should be enough unless it failed because of the encoding – Buckeye14Guy Jan 02 '20 at 19:39
  • what character set is your input .csv file in? If it is not simple ASCII you are going to have to either clean it up on input... – F1Rumors Jan 02 '20 at 19:40
  • Does this answer your question? [Python read csv - BOM embedded into the first key](https://stackoverflow.com/questions/40310042/python-read-csv-bom-embedded-into-the-first-key) – snwflk Jan 02 '20 at 19:42
  • For future reference: Sublime Text should show "UTF-8 with BOM" instead of "UTF-8" in the bottom right corner. The BOM is not displayed as characters, but you can see them if you open the file in a hex editor. – snwflk Jan 02 '20 at 19:45

1 Answers1

1

Based off the CSV variable name, this data is coming from Excel, meaning that the file was saved with the encoding utf-8-sig and has the BOM (Byte order mark) prepended. When you open a file in Python without specifying an encoding, it will assume an encoding (utf-8 in Python 3, ascii in Python 2) and interpret the BOM as any other bytes in the file.

To fix this, you just need to tell Python the proper encoding

Python 3:

...

with open(csvFilePath, encoding='utf-8-sig') as csvFile:

...

Python 2:

import codecs

...

with codecs.open(csvFilePath, encoding='utf-8-sig') as csvFile:

...
mmmkay
  • 696
  • 1
  • 7
  • 12