
Python novice here.

I'm trying to pull data from an API and insert it into an existing SQL Server table for use in a BI tool.

The original result from the API is XML in a format that is terribly unfriendly for a novice.

I have managed to parse this (likely not in the most Pythonic way; open to suggestions) into .csv format (three separate files, due to the nested XML). Now that I have these in .csv form, I am trying to write them to my SQL Server tables, one table per .csv, but I have hit a snag. I am using the code from this answer, and all seems okay apart from the leading comma that gets created in the column-names section of the query. Can anyone help me remove that leading comma?

This is the code as currently written:

import json
import requests
import pandas as pd
import csv
from datetime import date, timedelta

url = "https://**myAPI_URL.com/Transaction"

paramHeader = '{"Version": "1.0"'
paramHeader += ', "FromDate":"2020-05-01 00:00"'
paramHeader += ', "ToDate": "2020-05-31 00:00"'
paramHeader += ', "MerchantOrgID": null'
paramHeader += ', "CardholderOrgID": null'
paramHeader += ', "CardNumber": null'
paramHeader += ', "DriverID": null'
paramHeader += ', "VehicleID": null'
paramHeader += ', "BillingGroupID": null'
paramHeader += ', "BillingCycleID": null'
paramHeader += ', "EntryMethodID": null'
paramHeader += ', "CardTypeID": null'
paramHeader += ', "TranTypeID": null'
paramHeader += ', "TaxExemptOnly": null}'

headers = {'APIKey': '**myAPI_KEY**'
    , 'content-type': 'application/json'
    , 'Accept': 'application/json'
    , 'parameters': paramHeader}

response = requests.get(url, data='', headers=headers)
if response.status_code == 200:
    r = json.loads(response.content.decode('utf-8'))

    cleanData = pd.json_normalize(r)

    transactionDetails = pd.json_normalize(data=r, record_path='Details', meta=['ID'])

    taxes = pd.json_normalize(data=r, record_path=['Details', 'Taxes'],
                                 meta=['ID'])

    cleanData.to_csv('**filePath**/mainTransactions.csv')
    transactionDetails.to_csv('**filePath**/transactionsDetails.csv')
    taxes.to_csv('**filePath**/transactionsTaxes.csv')

    import pyodbc
    connection = pyodbc.connect('DRIVER={ODBC Driver 17 for SQL Server};SERVER=**serverIP**;PORT=1433;DATABASE=myDBName;UID=myUserID;PWD=myPWord;')

    with open('**filePath**/transactionsDetails.csv', 'r') as f:
        reader = csv.reader(f)
        columns = next(reader)
        query = 'insert into transactionDetails({0}) values ({1})'
        query = query.format(','.join(columns), ','.join('?' * (int(len(columns)))))
        print(query)   #for debug purposes
        cursor = connection.cursor()
        for data in reader:
            cursor.execute(query, data)
        cursor.commit()

This code results in the following error:

insert into transactionDetails(,RowNumber,RawProductCode,RawUnitPrice,RawAmount,Quantity,ResolvedProductID,ProductCategoryID,ProductCategory,IsFuel,ProductName,ProductCode,IsTaxProduct,ResolvedUnitPrice,ResolvedAmount,Taxes,ID) values (?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?)
Traceback (most recent call last):
  File "**workingDirectory**/myProject.py", line 85, in <module>
    cursor.execute(query, data)
pyodbc.ProgrammingError: ('42000', "[42000] [Microsoft][ODBC Driver 17 for SQL Server][SQL Server]Incorrect syntax near ','. (102) (SQLExecDirectW)")

Process finished with exit code 1

Manually executing the same query (using all 1's as test data) after removing the leading comma results in a successful write to DB.

myTableName> insert into transactionDetails(RowNumber,RawProductCode,RawUnitPrice,RawAmount,Quantity,ResolvedProductID,ProductCategoryID,ProductCategory,IsFuel,ProductName,ProductCode,IsTaxProduct,ResolvedUnitPrice,ResolvedAmount,Taxes,ID) values (1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1)
[2020-05-28 20:08:55] 1 row affected in 188 ms

Thanks!

Ben Wilkens
  • It looks like you have an extra comma before your `RowNumber` compared to your "all 1's" example. Is that a mistake or is that the error? – Joseph Glover May 29 '20 at 01:03
  • I see now that that was your actual question: how to get rid of that comma. The end of your question makes it seem like you were having an issue with the data, not the prepended comma. Try to end your question with a question! – Joseph Glover May 29 '20 at 01:25

2 Answers


The leading comma is inserted because the first element of your columns is an empty string: pandas' to_csv writes the DataFrame index as an unnamed first column by default, so the header row starts with an empty field. If this is consistent, you can account for it by slicing the columns:

# Take the slice starting at index 1, dropping the empty first column.
# Also, no need for int(len(...)): len() already returns an integer.
query = query.format(','.join(columns[1:]), ','.join('?' * len(columns[1:])))

An even simpler approach is to perform the slice once, when first reading the columns:

columns = next(reader)[1:]
query = 'insert into transactionDetails({0}) values ({1})'
query = query.format(','.join(columns), ','.join('?' * len(columns)))
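Alternatively, you can avoid the empty column at the source. The unnamed first column is the DataFrame index, which pandas' to_csv writes by default; passing index=False when creating the CSVs removes it entirely. A minimal sketch, applied to one of the question's to_csv calls:

# Skip writing the DataFrame index so the CSV header has no unnamed
# first column and no slicing is needed afterwards.
transactionDetails.to_csv('**filePath**/transactionsDetails.csv', index=False)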

Some additional Python tips

Don't build your paramHeader by concatenating a bunch of strings; it's messy and error-prone (a single typo breaks the JSON). The content is supposed to be a JSON object, so simply make a dictionary and use the json module to output a properly formatted JSON string:

>>> import json
>>> my_json = {
... "this": 123,
... "that": "string"
... }
>>> json.dumps(my_json)
'{"this": 123, "that": "string"}'

dumps stands for "dump string".
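Applied to the question's parameters, a sketch could look like this (same keys and values as the original concatenated string; Python's None serializes to JSON null):

import json

# Build the parameters as a plain dictionary...
params = {
    "Version": "1.0",
    "FromDate": "2020-05-01 00:00",
    "ToDate": "2020-05-31 00:00",
    "MerchantOrgID": None,
    "CardholderOrgID": None,
    "CardNumber": None,
    "DriverID": None,
    "VehicleID": None,
    "BillingGroupID": None,
    "BillingCycleID": None,
    "EntryMethodID": None,
    "CardTypeID": None,
    "TranTypeID": None,
    "TaxExemptOnly": None,
}

# ...and let json.dumps produce the properly formatted JSON string.
paramHeader = json.dumps(params)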

Joseph Glover
    Thank you sir. I'll keep this in mind, much cleaner. I was more focused on the data being returned but this will definitely make future modifications to the call easier and cleaner. – Ben Wilkens Jun 16 '20 at 20:09

This was resolved by removing the unnamed index column from the .csv that my pandas DataFrame created, using the snippet found here:

with open('**filePath**/transactionsDetails.csv', 'r') as source:
    rdr = csv.reader(source)
    # newline='' stops csv.writer from inserting blank rows on Windows
    with open('**filePath**/transactionsDetails2.csv', 'w', newline='') as result:
        wtr = csv.writer(result)
        for r in rdr:
            wtr.writerow((r[1], r[3], r[4], r[5], r[6], r[7], r[8], r[9], r[10],
                          r[11], r[12], r[13], r[14], r[15], r[16]))

Full working code below:

import json
import requests
import pandas as pd
import csv
from datetime import date, timedelta

url = "https://**myAPI**.com/Transaction"

paramHeader = '{"Version": "1.0"'
paramHeader += ', "FromDate":"2020-05-01 00:00"'
paramHeader += ', "ToDate": "2020-05-31 00:00"'
paramHeader += ', "MerchantOrgID": null'
paramHeader += ', "CardholderOrgID": null'
paramHeader += ', "CardNumber": null'
paramHeader += ', "DriverID": null'
paramHeader += ', "VehicleID": null'
paramHeader += ', "BillingGroupID": null'
paramHeader += ', "BillingCycleID": null'
paramHeader += ', "EntryMethodID": null'
paramHeader += ', "CardTypeID": null'
paramHeader += ', "TranTypeID": null'
paramHeader += ', "TaxExemptOnly": null}'

headers = {'APIKey': '**myAPIKey**'
    , 'content-type': 'application/json'
    , 'Accept': 'application/json'
    , 'parameters': paramHeader}

response = requests.get(url, data='', headers=headers)

if response.status_code == 200:
    r = json.loads(response.content.decode('utf-8'))

    cleanData = pd.json_normalize(r)
    transactionDetails = pd.json_normalize(data=r, record_path='Details', meta=['ID'])
    #print(transactionDetails)
    taxes = pd.json_normalize(data=r, record_path=['Details', 'Taxes'],
                                 meta=['ID'])

    cleanData.to_csv('**filePath**/mainTransactions.csv')
    transactionDetails.to_csv('**filePath**/transactionsDetails.csv')
    taxes.to_csv('**filePath**/transactionsTaxes.csv')

    import pyodbc
    connection = pyodbc.connect('DRIVER={ODBC Driver 17 for SQL Server};SERVER=**serverIP**;PORT=1433;DATABASE=myDBName;UID=myUsername;PWD=myPword;')

    with open('**filePath**/transactionsDetails.csv', 'r') as source:
        rdr = csv.reader(source)
        # newline='' stops csv.writer from inserting blank rows on Windows
        with open('**filePath**/transactionsDetails2.csv', 'w', newline='') as result:
            wtr = csv.writer(result)
            for r in rdr:
                wtr.writerow((r[1], r[3], r[4], r[5], r[6], r[7], r[8], r[9], r[10],
                              r[11], r[12], r[13], r[14], r[15], r[16]))

    with open('**filePath**/transactionsDetails2.csv', 'r') as f:
        reader = csv.reader(f)
        columns = next(reader)
        query = 'insert into transactionDetails({0}) values ({1})'
        query = query.format(','.join(columns), ','.join('?' * len(columns)))
        print(query)
        cursor = connection.cursor()
        for data in reader:
            cursor.execute(query, data)
        cursor.commit()
    connection.close()

I'm curious if anyone has a cleaner way to do this?
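For what it's worth, pandas can also write a DataFrame to SQL Server directly through SQLAlchemy, skipping the intermediate CSVs entirely. A minimal sketch, assuming the sqlalchemy package is installed and reusing the placeholder credentials from above:

import sqlalchemy

# Connection string built from the same placeholders as the pyodbc example
engine = sqlalchemy.create_engine(
    'mssql+pyodbc://myUsername:myPword@**serverIP**:1433/myDBName'
    '?driver=ODBC+Driver+17+for+SQL+Server'
)

# index=False keeps the unnamed index column out of the table
transactionDetails.to_sql('transactionDetails', engine,
                          if_exists='append', index=False)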

Ben Wilkens