Here is the CSV file that I am working with:
`"A","B","C","D","E","F","G","H","I","J"
"88",18,1,"<Req TID=""34"" ReqType=""MS""><IISO /><CID>2</CID><MemID>0000</MemID><MemPass /><RequestData><S>[REMOVED]</S><Na /><La /><Card>[REMOVED]</Card><Address /><HPhone /><Mail /></ReqData></Req>","<Response T=""3"" RequestType=""MS""><MS><Memb><PrivateMembers /><Ob>0-12-af</Ob><Locator /></Memb><S>[REMOVED]</S><CNum>[REMOVED]</CNum><FName /><LaName /><Address /><HPhone /><Email /><IISO /><MemID /><MemPass /><T /><CID /><T /></MS></Response>",0-JAN-10 12.00.02 AM,27-JUN-15 12.00.00 AM,"26",667,0
"22",22,1,"<Req TID=""45"" ReqType=""MS""><IISO /><CID>4</CID><MemID>0000</MemID><MemPass /><RequestData><S>[REMOVED]</S><Na /><La /><Card>[REMOVED]</Card><Address /><HPhone /><Mail /></ReqData></Req>","<Response T=""10"" RequestType=""MS""><MS><Memb><PrivateMembers /><Ob>0-12-af</Ob><Locator /></Memb><S>[REMOVED]</S><CNum>[REMOVED]</CNum><FName /><LaName /><Address /><HPhone /><Email /><IISO /><MemID /><MemPass /><T /><CID /><T /></MS></Response>",0-JAN-22 12.00.02 AM,27-JUN-22 12.00.00 AM,"26",667,0
"32",22,1,"<Req TID=""15"" ReqType=""MS""><IISO /><CID>45</CID><MemID>0000</MemID><MemPass /><RequestData><S>[REMOVED]</S><Na /><La /><Card>[REMOVED]</Card><Address /><HPhone /><Mail /></ReqData></Req>","<Response T=""10"" RequestType=""MS""><MS><Memb><PrivateMembers /><Ob>0-12-af</Ob><Locator /></Memb><S>[REMOVED]</S><CNum>[REMOVED]</CNum><FName /><LaName /><Address /><HPhone /><Email /><IISO /><MemID /><MemPass /><T /><CID /><T /></MS></Response>",0-JAN-20 12.00.02 AM,27-JUN-34 12.00.00 AM,"26",667,0`
The below function is annotated. Briefly,the function get_clientresponses_two
reads the above CSV, selects column E's data instances (XML data).There are two two generator functions to parse the XML data in **column E **in order to convert the XML tags and their text into a Python dictionary. Specifically, the flatten_dict()
function returns an iterable sequence of (key, value) pairs. One can turn this to a list of pairs with list(flatten_dict(root))
.
The output, as it is written so far is generate a dictionary. Then, def allocate_and_write_data_
then takes those and creates two different collections. One is a set that is updated using the keys from flatten_dict(
) . This is to ensure that the element tags from the XML are included in the headers (along with their corresponding values) in the newly written CSV. The code is written such to maintain the integrity of the headers (no duplicates) and allow for new element tags to be converted into headers (along with their values). Moreover, the headers and values that already exist should be flexible enough to be updated with new instances (again - unique, as well). In addition all the other rows are intended to be stored and updated. I then convert the headers into a list and ensure that any missing data instances are accounted for (with a ' ') using the list comprehension data
import csv
from collections import OrderedDict
from xml.etree.ElementTree import ParseError
import collections
from __future__ import print_function
def get_clientresponses_2(filename = 's.csv'):
with open(filename, 'rU') as infile:
reader = csv.DictReader(infile) # read the file as a dictionary for each row ({header : value})
data = {}
for row in reader:
for header, value in row.items():
try:
data[header].append(value)
except KeyError:
data[header] = [value]
client_responses = data['E'] #returns a list
for client_response in client_responses:
xml_string = (''.join(client_response))
xml_string = xml_string.replace('&', '')
try:
root = ElementTree.XML(xml_string)
print(root)
except ET.ParseError:
print("catastrophic failure")
continue
def allocate_and_write_2(get_clientresponses_2_gen):
with open(filename, 'r') as infile:
reader = csv.DictReader(infile) # read the file as a dictionary for each row ({header : value})
header = set()
results = []
# data = {} # this is not needed for the purpose of this organization
for row in reader:
for get_clientresponses_2 in get_clientresponses_2_gen:
xml_data = get_clientresponses_2()
row.update(xml_data) # just for XML data
results.append(row) # everything else
header.update(row.keys()) # can't forget headers
# print(row) # returns dictionary of key values pairs (headers : values)
# print(results) # returns list wrapper for dictionary
# print(headers) #returns set of all headers
headers_list = list(header)
# print(headers_list) #list form of set
with open('csv_output.csv', 'wt') as f:
writer = csv.writer(f)
writer.writerow(headers_list)
for row in results:
data = [row.get(x, '') for x in headers_list]
writer.writerow(data)
# writer.writerows(zip(headers_list, data))
The output is the following:
C,HPhone,Locator,IISO,E,S,FName,LaName,J,D,MemID,ResponseRequestType,T,Email,I,Ob,G,MemPass,Address,A,PrivateMembers,H,CNum,ResponseT,CID,B,F
1,,,,"<Response T=""3"" RequestType=""MS""><MS><Memb><PrivateMembers /><Ob>0-12-af</Ob><Locator /></Memb><S>[REMOVED]</S><CNum>[REMOVED]</CNum><FName /><LaName /><Address /><HPhone /><Email /><IISO /><MemID /><MemPass /><T /><CID /><T /></MS></Response>",[REMOVED],,,0,"<Req TID=""34"" ReqType=""MS""><IISO /><CID>2</CID><MemID>0000</MemID><MemPass /><RequestData><S>[REMOVED]</S><Na /><La /><Card>[REMOVED]</Card><Address /><HPhone /><Mail /></ReqData></Req>",,MS,,,667,0-12-af,27-JUN-15 12.00.00 AM,,,88,,26,[REMOVED],10,,18,0-JAN-10 12.00.02 AM
1,,,,"<Response T=""10"" RequestType=""MS""><MS><Memb><PrivateMembers /><Ob>0-12-af</Ob><Locator /></Memb><S>[REMOVED]</S><CNum>[REMOVED]</CNum><FName /><LaName /><Address /><HPhone /><Email /><IISO /><MemID /><MemPass /><T /><CID /><T /></MS></Response>",[REMOVED],,,0,"<Req TID=""45"" ReqType=""MS""><IISO /><CID>4</CID><MemID>0000</MemID><MemPass /><RequestData><S>[REMOVED]</S><Na /><La /><Card>[REMOVED]</Card><Address /><HPhone /><Mail /></ReqData></Req>",,MS,,,667,0-12-af,27-JUN-22 12.00.00 AM,,,22,,26,[REMOVED],10,,22,0-JAN-22 12.00.02 AM
1,,,,"<Response T=""10"" RequestType=""MS""><MS><Memb><PrivateMembers /><Ob>0-12-af</Ob><Locator /></Memb><S>[REMOVED]</S><CNum>[REMOVED]</CNum><FName /><LaName /><Address /><HPhone /><Email /><IISO /><MemID /><MemPass /><T /><CID /><T /></MS></Response>",[REMOVED],,,0,"<Req TID=""15"" ReqType=""MS""><IISO /><CID>45</CID><MemID>0000</MemID><MemPass /><RequestData><S>[REMOVED]</S><Na /><La /><Card>[REMOVED]</Card><Address /><HPhone /><Mail /></ReqData></Req>",,MS,,,667,0-12-af,27-JUN-34 12.00.00 AM,,,32,,26,[REMOVED],10,,22,0-JAN-20 12.00.02 AM
However, I receive the following error when I have tried to call the 'get_clientresponses_two'
within the 'allocate_and_write'
:
<ipython-input-91-cfd866a1c0b6> in allocate_and_write_2(get_clientresponses_2_gen)
37 # data = {} # this is not needed for the purpose of this organization
38 for row in reader:
---> 39 for get_clientresponses_2 in get_clientresponses_2_gen:
40 xml_data = get_clientresponses_2()
41 row.update(xml_data) # just for XML data
TypeError: 'function' object is not iterable
Based on my understanding of generators and other posts on this forum I know it is due this problem. I would like to iterate through the generators output by passing in the first function's output, get_clientresponses_two
's output while simultaneously implementing the other function. I would like guidance and feedback with specifically how to best rectify this.