1

I need to save data from csv file to django models. The data comes from external api so I have no control on its structure. In my schema, I allowed all fields to be nullable.

This is my script

   text = f"{path}/report.csv"
        df = pd.read_csv(text)
        row_iter = df.iterrows()

        for index, row in row_iter:
            rows = {key.replace("-", "_"): row.pop(key) for key in row.keys()}
            # print(f"rows {rows}")
            # default_values = {
            #     "amazon-order-id",merchant-order-id,purchase-date,last-updated-date,order-status,fulfillment-channel,sales-channel,order-channel,ship-service-level,product-name,sku,asin,item-status,quantity,currency,item-price,item-tax,shipping-price,shipping-tax,gift-wrap-price,gift-wrap-tax,item-promotion-discount,ship-promotion-discount,ship-city,ship-state,ship-postal-code,ship-country,promotion-ids,is-business-order,purchase-order-number,price-designation,is-iba,order-invoice-type
            # }

            sb, created = Order.objects.update_or_create(
                sellingpartner_customer=c,
                amazon_order_id=rows["amazon_order_id"],
                sku=rows["sku"],
                asin=rows["asin"],
                defaults={**rows},
            )

However, since some of the csv fields has empty values, pandas will replace it with NaN value, this is where django returns an error

   django.db.utils.OperationalError: (1054, "Unknown column 'NaN' in 'field list'")

I tried replace empty values as empty string("")

   df.fillna("", inplace=True)

But django will return an error for fields that are not charfields

   django.core.exceptions.ValidationError: ['“” value must be a decimal number.']

My question is, how do you handle empty values from csv file in pandas so for example if the field type is boolean, pandas will just replace empty value with boolean False for django boolean type fields, 0 for empty decimal types, and just blank for empty charfields, etc ?

ira
  • 534
  • 1
  • 5
  • 17
  • I'm usually going the other way around (getting stuff out of Django into Pandas)... but if you replace your `nan` fields with `None`, does Django accept them for the nullable fields? – ifly6 Apr 13 '22 at 16:21
  • Sure, I'll formally answer. – ifly6 Apr 13 '22 at 16:45

1 Answers1

0

Nullable Django models won't take np.nan or other Pandas-compatible not-a-number objects. It expects taking None as in stock Python. When you have nan values, before you save them to Django, just replace them with None to avoid the validation error.


Edit. The straight-forward implementation of replacing your np.nan objects with None would be:

df.replace(np.nan, None)

This does not act "in-place". Pass inplace=True as another parameter or otherwise re-assign the output to df.

ifly6
  • 5,003
  • 2
  • 24
  • 47