1

I am having quite the issue a program. I am working with the woocommere api to export daily orders from a website so I can automate the entry of them to our internal order system. The goal is to take csv that is produced from the api and format the columns in an easy way to grab with an rpg file that is already written. My issue is that in column R & S have nested dictionaries in them and I haven't been able to access those keys in the cell and then append them onto the csv. Hope this makes sense and someone can help. I can provide more clarification if necessary.

'''

import csv

#open a new input file
with open('buckorders.csv', 'r') as csv_file:
    csv_reader = csv.DictReader(csv_file)

    #open new output file
    with open('buck_reformatted_orders.csv', 'w') as new_file:
        #create column headers
        fieldnames = ['id', 'status', 'date_created', 'total', 'line_items', 'billing', 'shipping', 'payment_method', 'payment_method_title']
        csv_writer = csv.DictWriter(new_file, fieldnames=fieldnames)

        #write out the fieldnames
        csv_writer.writeheader()

        for line in csv_reader:

            billing_list = line['billing']
            print(type(billing_list))
            print(billing_list)

            #delete all unneeded columns
            del line['meta_data']
            del line['refunds']
            del line['shipping_tax']
            del line['order_key']
            del line['date_completed_gmt']
            del line['discount_total']
            del line['date_paid_gmt']
            del line['prices_include_tax']
            del line['tax_lines']
            del line['parent_id']
            del line['currency_symbol']
            del line['created_via']
            del line['_links']
            del line['date_modified']
            del line['discount_tax']
            del line['total_tax']
            del line['version']
            del line['date_paid']
            del line['customer_ip_address']
            del line['cart_hash']
            del line['customer_user_agent']
            del line['transaction_id']
            del line['date_completed']
            del line['fee_lines']
            del line['cart_tax']
            del line['currency']
            del line['customer_id']
            del line['customer_note']
            del line['shipping_total']
            del line['date_modified_gmt']
            del line['date_created_gmt']
            del line['shipping_lines']
            del line['coupon_lines']
            del line['number']

            csv_writer.writerow(line)

''' Link to google sheet for csv: buckorders.csv

martineau
  • 119,623
  • 25
  • 170
  • 301

1 Answers1

1

Good question, and really good work so far

I see the big issue of trying to parse the nested dicts inside the billing and shipping columns. I also see that a restructure of your process will help you, especially with all those delete line[col_name] lines.

Instead of starting with the row from your reader and trying to strip away columns you don't want and add new columns, I recommend "building up" to your final row from nothing, only adding the columns you do want. This is more than just a suggestion, it's almost a necessity if you're going to use the DictWriter and add columns from the nested dicts on the fly.

I'll address the nested dicts further down, for now I want to give you intro to the bigger idea:

import csv

# Keys from DictReader to keep; omitted billing and shipping, don't copy those columns as they are
fieldnames = ['id', 'status', 'date_created', 'total',
              'line_items', 'payment_method', 'payment_method_title']

# Accumulate your built-up rows
new_rows = []

with open('buckorders.csv', newline='') as csv_file:
    csv_reader = csv.DictReader(csv_file)
    for row in csv_reader:
        new_row = {k: row[k] for k in fieldnames}  # build the base for your new row

        # Handle nested dicts here... adding to that base row

        new_rows.append(new_row)


with open('buck_reformatted_orders.csv', 'w', newline='') as csv_file:
    writer = csv.DictWriter(csv_file, fieldnames=new_rows[0].keys())
    writer.writeheader()
    writer.writerows(new_rows)

Run that, and you'll get the input CSV minus all the columns you don't want. When I run it, I get these columns:

1: id
2: status
3: date_created
4: total
5: line_items
6: payment_method
7: payment_method_title

The key, for me, is not using pre-defined fieldnames for the fieldnames= parameter when creating the DictWriter. You're going to be adding a lot of fields from those nested dicts, and it would be a lot of work to try and pre-define all of those and pass those to the DictWriter up front.

As for the nested dicts themselves... I downloaded your sample CSV from Google Sheets, and if I'm seeing exactly what you're seeing then you might have some problems, but I cannot say for sure.

What I'm seeing is a Python dict represented literally as a string; or I'm seeing JSON but with all the double quotes converted to single quotes.

You called it "a nested dict" and it's easier to think of it being a nested dict, so I'll go that route...

I checked this SO, Convert a String representation of a Dictionary to a dictionary? for reference and I suggest you do the same (and remember to upvote if it's helpful!). As suggested in that helpful post, I'm going to use ast.literal_eval to convert each cell back to a Python dict.

From there, I can use the standard .items() method on the dict to iterate over a pair of keys and values (k, v). I build a new column name by prefixing the key with shipping_, and add the new col_name and its value to the new row:

...
for row in csv_reader:
    new_row = {k: row[k] for k in fieldnames}

    fixed_shipping = row['shipping'].replace("'", '"')
    shipping_dict = json.loads(fixed_shipping)
    for k,v in shipping_dict.items():
        col_name = f'shipping_{k}'
        new_row[col_name] = v
    
    ...
...

Everything else remains the same.

When I run that against your input, I get these headers in my reformatted CSV:

1: id
2: status
3: date_created
4: total
5: line_items
6: billing
7: shipping
8: payment_method
9: payment_method_title
10: shipping_first_name
11: shipping_last_name
12: shipping_company
13: shipping_address_1
14: shipping_address_2
15: shipping_city
16: shipping_state
17: shipping_postcode
18: shipping_country
19: shipping_phone

I'll leave duplicating that for billing, and adding both the row iterator of the csv_reader, to you.

Zach Young
  • 10,137
  • 4
  • 32
  • 53
  • Zach, thank you very much for taking the time to explain a more efficient way to do this. I can see that writing the fieldnames to the new csv file works and that I can access the dictionary keys in that final for loop. I will start working on running through the shipping and billing column dictionaries and assigning those values to new columns. If I get stuck again, I will reach out if you are available. Again, thank you so much, I greatly appreciate it. – John Pattwell Jan 19 '22 at 15:57