
I have a file with several dictionaries. Below is just part of the file. I need to convert this into a csv file and eventually load it into a database. I am having issues converting it to csv.

{"transaction_type": "new", "policynum": 4994949}
{"transaction_type": "renewal", "policynum": 3848848}
{"transaction_type": "cancel", "policynum": 49494949,  "cancel_table": 
[{"cancel_cd": "AU", "cancel_type": "online"}, {"cancel_cd": "AA", "cancel_type": "online"}]}

I have tried to implement the code below, but the cancel_table key is not parsed into the csv file properly.

import ast
import csv

with open('***\\Python\\test', 'r') as in_f, open('***\\Python\\test.csv', 'w') as out_f:
    data = in_f.readlines()

    writer = csv.DictWriter(out_f, fieldnames=['transaction_type', 'policynum', 'cancel_table'], extrasaction='ignore')
    writer.writeheader()  # For writing header

    for row in data:
        dict_row = ast.literal_eval(row)  # because row has a dict string
        writer.writerow(dict_row)

Below is the result I am getting, with the cancel_table key not parsed into the csv file properly. I need help getting cancel_type and the different cancel_cds as separate columns, or the cancel_cds concatenated in one column with a comma delimiter (just a thought). Sorry if it's a loaded question.

transaction_type,policynum,cancel_table
new,4994949,
old,3848848,
cancel,49494949,"[{'cancel_type': 'online', 'cancel_cd': 'OL'}, {'cancel_type': 'Online', 'cancel_cd': 'BR'}, {'cancel_type': 'online', 'cancel_cd': 'AU'}]"

1 Answer

Assuming the rows in cancel_table always contain both cancel_cd and cancel_type, you can use the following code to get the cancel_cds and cancel_types as separate columns:

import ast
import csv

with open('Python/test', 'r') as in_f, open('Python/test.csv', 'w') as out_f:
    data = in_f.readlines()
    writer = csv.DictWriter(
        out_f,
        fieldnames=[
            'transaction_type', 'policynum', 'cancel_cds', 'cancel_types'
        ],
        extrasaction='ignore')
    writer.writeheader()

    for row in data:
        dict_row = ast.literal_eval(row)
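        # Only cancel transactions carry a cancel_table, so flatten it when present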
        if 'cancel_table' in dict_row:
            cancel_table = dict_row['cancel_table']
            cancel_cds, cancel_types = [], []
            for cancel_row in cancel_table:
                cancel_cds.append(cancel_row['cancel_cd'])
                cancel_types.append(cancel_row['cancel_type'])
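            # Join the collected values so each list fits in a single csv column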
            dict_row['cancel_cds'] = ','.join(cancel_cds)
            dict_row['cancel_types'] = ','.join(cancel_types)
        writer.writerow(dict_row)

Make sure you are not using a comma as the column separator for your csv, otherwise the comma-joined cancel_cd and cancel_type values can end up split into separate columns when the file is parsed or loaded later.
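For example, a different separator such as the pipe character can be passed through csv.DictWriter's delimiter parameter. Below is a minimal sketch with a single hard-coded row taken from the sample data, just to show the output format:

import csv

with open('Python/test.csv', 'w', newline='') as out_f:  # newline='' avoids extra blank lines on Windows
    # delimiter='|' keeps the comma-joined codes/types inside a single column
    writer = csv.DictWriter(
        out_f,
        fieldnames=['transaction_type', 'policynum', 'cancel_cds', 'cancel_types'],
        delimiter='|',
        extrasaction='ignore')
    writer.writeheader()
    writer.writerow({
        'transaction_type': 'cancel',
        'policynum': 49494949,
        'cancel_cds': 'AU,AA',
        'cancel_types': 'online,online'
    })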

  • Thanks Jonathan. Yes, cancel_table always has cancel_cd and cancel_type. I set the delimiter to pipe (|). By any chance, do you have a better idea to accommodate multiple cancel_cds in the csv file without concatenation? Also, how can I get to a point where I can work with files using Python? I understand it's practice and experience, but do you have any tips? – freetoz Oct 16 '19 at 21:45
  • If you know the maximum number of rows in `cancel_table`, you could reserve some columns to store those values. Otherwise you could store them in a separate file and link them to your primary key, which I assume is `policynum` (see the sketch after these comments). For working with files in Python, I recommend reading documentation about related functions and file parsing libraries (for example [pandas](https://pandas.pydata.org/)). Learning more about string manipulation and different data structures (like lists, dicts, etc.) is also useful. – Jonathan Feenstra Oct 17 '19 at 08:12
  • I have a new key in the input file, "cancel_reason", which has a string value with \n in the middle of the string. This breaks the string onto a new line when the script is executed. Is there a way to ignore the \n so the string value doesn't break? Also, the input file has blank/empty and junk lines which need to be skipped. Can we filter on the key "transaction_type" for certain types so I can skip the empty and junk lines? Or is there another way? – freetoz Nov 04 '19 at 18:42
  • If with 'ignoring' you mean removing `\n` from the string, just use `dict_row['cancel_reason'].replace('\n', '')`. As for skipping blank lines, see ["how to skip blank line while reading CSV file using python"](https://stackoverflow.com/questions/18890688/how-to-skip-blank-line-while-reading-csv-file-using-python). If you have any more questions, please post them separately. – Jonathan Feenstra Nov 04 '19 at 21:01
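Putting the suggestions from these comments together, here is a minimal sketch of the separate-file approach: it skips blank and unparsable lines, keeps only known transaction types, replaces embedded newlines in cancel_reason, and writes the cancel_table rows to their own pipe-delimited file linked back to the main rows by policynum. The output file names and the set of accepted transaction types are assumptions, and it assumes each dict literal sits on a single line of the input, as in the original code.

import ast
import csv

# Transaction types taken from the sample data; adjust to the full set in the real file.
VALID_TYPES = {'new', 'renewal', 'cancel'}

with open('Python/test', 'r') as in_f, \
        open('Python/policies.csv', 'w', newline='') as pol_f, \
        open('Python/cancel_table.csv', 'w', newline='') as cancel_f:

    policy_writer = csv.DictWriter(
        pol_f,
        fieldnames=['transaction_type', 'policynum', 'cancel_reason'],
        delimiter='|',
        extrasaction='ignore')
    policy_writer.writeheader()

    # One row per cancel_table entry, linked back to its policy by policynum.
    cancel_writer = csv.DictWriter(
        cancel_f,
        fieldnames=['policynum', 'cancel_cd', 'cancel_type'],
        delimiter='|')
    cancel_writer.writeheader()

    for line in in_f:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        try:
            dict_row = ast.literal_eval(line)
        except (ValueError, SyntaxError):
            continue  # skip junk lines that are not valid dict literals
        if not isinstance(dict_row, dict) or dict_row.get('transaction_type') not in VALID_TYPES:
            continue  # skip anything that is not a dict with an expected transaction_type

        # Replace embedded newlines so the value stays on one csv line.
        if 'cancel_reason' in dict_row:
            dict_row['cancel_reason'] = dict_row['cancel_reason'].replace('\n', ' ')

        policy_writer.writerow(dict_row)

        for cancel_row in dict_row.get('cancel_table', []):
            cancel_writer.writerow({
                'policynum': dict_row['policynum'],
                'cancel_cd': cancel_row['cancel_cd'],
                'cancel_type': cancel_row['cancel_type']
            })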