0

Here is the CSV file that I am working with:

`"A","B","C","D","E","F","G","H","I","J"

"88",18,1,"<Req TID=""34"" ReqType=""MS""><IISO /><CID>2</CID><MemID>0000</MemID><MemPass /><RequestData><S>[REMOVED]</S><Na /><La /><Card>[REMOVED]</Card><Address /><HPhone /><Mail /></ReqData></Req>","<Response T=""3"" RequestType=""MS""><MS><Memb><PrivateMembers /><Ob>0-12-af</Ob><Locator /></Memb><S>[REMOVED]</S><CNum>[REMOVED]</CNum><FName /><LaName /><Address /><HPhone /><Email /><IISO /><MemID /><MemPass /><T /><CID /><T /></MS></Response>",0-JAN-10 12.00.02 AM,27-JUN-15 12.00.00 AM,"26",667,0
"22",22,1,"<Req TID=""45"" ReqType=""MS""><IISO /><CID>4</CID><MemID>0000</MemID><MemPass /><RequestData><S>[REMOVED]</S><Na /><La /><Card>[REMOVED]</Card><Address /><HPhone /><Mail /></ReqData></Req>","<Response T=""10"" RequestType=""MS""><MS><Memb><PrivateMembers /><Ob>0-12-af</Ob><Locator /></Memb><S>[REMOVED]</S><CNum>[REMOVED]</CNum><FName /><LaName /><Address /><HPhone /><Email /><IISO /><MemID /><MemPass /><T /><CID /><T /></MS></Response>",0-JAN-22 12.00.02 AM,27-JUN-22 12.00.00 AM,"26",667,0
"32",22,1,"<Req TID=""15"" ReqType=""MS""><IISO /><CID>45</CID><MemID>0000</MemID><MemPass /><RequestData><S>[REMOVED]</S><Na /><La /><Card>[REMOVED]</Card><Address /><HPhone /><Mail /></ReqData></Req>","<Response T=""10"" RequestType=""MS""><MS><Memb><PrivateMembers /><Ob>0-12-af</Ob><Locator /></Memb><S>[REMOVED]</S><CNum>[REMOVED]</CNum><FName /><LaName /><Address /><HPhone /><Email /><IISO /><MemID /><MemPass /><T /><CID /><T /></MS></Response>",0-JAN-20 12.00.02 AM,27-JUN-34 12.00.00 AM,"26",667,0`

The below function is annotated. Briefly,the function get_clientresponses_two reads the above CSV, selects column E's data instances (XML data).There are two two generator functions to parse the XML data in **column E **in order to convert the XML tags and their text into a Python dictionary. Specifically, the flatten_dict() function returns an iterable sequence of (key, value) pairs. One can turn this to a list of pairs with list(flatten_dict(root)).

The output, as it is written so far is generate a dictionary. Then, def allocate_and_write_data_ then takes those and creates two different collections. One is a set that is updated using the keys from flatten_dict() . This is to ensure that the element tags from the XML are included in the headers (along with their corresponding values) in the newly written CSV. The code is written such to maintain the integrity of the headers (no duplicates) and allow for new element tags to be converted into headers (along with their values). Moreover, the headers and values that already exist should be flexible enough to be updated with new instances (again - unique, as well). In addition all the other rows are intended to be stored and updated. I then convert the headers into a list and ensure that any missing data instances are accounted for (with a ' ') using the list comprehension data

import csv
from collections import OrderedDict
from xml.etree.ElementTree import ParseError
import collections 
from __future__ import print_function

def get_clientresponses_2(filename = 's.csv'):

    with open(filename, 'rU') as infile:
        reader = csv.DictReader(infile)         # read the file as a dictionary for each row ({header : value})
        data = {}
        for row in reader:
            for header, value in row.items():
                try:
                    data[header].append(value)
                except KeyError:
                    data[header] = [value]

        client_responses = data['E'] #returns a list
        for client_response in client_responses:
            xml_string = (''.join(client_response))
            xml_string = xml_string.replace('&amp;', '')
            try:
                root = ElementTree.XML(xml_string)
                print(root) 
            except ET.ParseError:
                print("catastrophic failure")
                continue

def allocate_and_write_2(get_clientresponses_2_gen):

    with open(filename, 'r') as infile:
        reader = csv.DictReader(infile)         # read the file as a dictionary for each row ({header : value})
        header = set()
        results = []
        #     data = {} # this is not needed for the purpose of this organization
        for row in reader:
            for get_clientresponses_2 in get_clientresponses_2_gen:
                xml_data = get_clientresponses_2()
                row.update(xml_data)        # just for XML data
                results.append(row)         # everything else
                header.update(row.keys())  # can't forget headers

    #     print(row) # returns dictionary of key values pairs (headers : values)
    #     print(results) # returns list wrapper for dictionary
    #     print(headers) #returns set of all headers
        headers_list = list(header)
    #     print(headers_list) #list form of set

        with open('csv_output.csv', 'wt') as f:
            writer = csv.writer(f)
            writer.writerow(headers_list)
            for row in results:
                data = [row.get(x, '') for x in headers_list]
                writer.writerow(data)
    #             writer.writerows(zip(headers_list, data))

The output is the following:

C,HPhone,Locator,IISO,E,S,FName,LaName,J,D,MemID,ResponseRequestType,T,Email,I,Ob,G,MemPass,Address,A,PrivateMembers,H,CNum,ResponseT,CID,B,F
1,,,,"<Response T=""3"" RequestType=""MS""><MS><Memb><PrivateMembers /><Ob>0-12-af</Ob><Locator /></Memb><S>[REMOVED]</S><CNum>[REMOVED]</CNum><FName /><LaName /><Address /><HPhone /><Email /><IISO /><MemID /><MemPass /><T /><CID /><T /></MS></Response>",[REMOVED],,,0,"<Req TID=""34"" ReqType=""MS""><IISO /><CID>2</CID><MemID>0000</MemID><MemPass /><RequestData><S>[REMOVED]</S><Na /><La /><Card>[REMOVED]</Card><Address /><HPhone /><Mail /></ReqData></Req>",,MS,,,667,0-12-af,27-JUN-15 12.00.00 AM,,,88,,26,[REMOVED],10,,18,0-JAN-10 12.00.02 AM
1,,,,"<Response T=""10"" RequestType=""MS""><MS><Memb><PrivateMembers /><Ob>0-12-af</Ob><Locator /></Memb><S>[REMOVED]</S><CNum>[REMOVED]</CNum><FName /><LaName /><Address /><HPhone /><Email /><IISO /><MemID /><MemPass /><T /><CID /><T /></MS></Response>",[REMOVED],,,0,"<Req TID=""45"" ReqType=""MS""><IISO /><CID>4</CID><MemID>0000</MemID><MemPass /><RequestData><S>[REMOVED]</S><Na /><La /><Card>[REMOVED]</Card><Address /><HPhone /><Mail /></ReqData></Req>",,MS,,,667,0-12-af,27-JUN-22 12.00.00 AM,,,22,,26,[REMOVED],10,,22,0-JAN-22 12.00.02 AM
1,,,,"<Response T=""10"" RequestType=""MS""><MS><Memb><PrivateMembers /><Ob>0-12-af</Ob><Locator /></Memb><S>[REMOVED]</S><CNum>[REMOVED]</CNum><FName /><LaName /><Address /><HPhone /><Email /><IISO /><MemID /><MemPass /><T /><CID /><T /></MS></Response>",[REMOVED],,,0,"<Req TID=""15"" ReqType=""MS""><IISO /><CID>45</CID><MemID>0000</MemID><MemPass /><RequestData><S>[REMOVED]</S><Na /><La /><Card>[REMOVED]</Card><Address /><HPhone /><Mail /></ReqData></Req>",,MS,,,667,0-12-af,27-JUN-34 12.00.00 AM,,,32,,26,[REMOVED],10,,22,0-JAN-20 12.00.02 AM

However, I receive the following error when I have tried to call the 'get_clientresponses_two' within the 'allocate_and_write':

<ipython-input-91-cfd866a1c0b6> in allocate_and_write_2(get_clientresponses_2_gen)
     37         #     data = {} # this is not needed for the purpose of this organization
     38         for row in reader:
---> 39             for get_clientresponses_2 in get_clientresponses_2_gen:
     40                 xml_data = get_clientresponses_2()
     41                 row.update(xml_data)        # just for XML data

TypeError: 'function' object is not iterable

Based on my understanding of generators and other posts on this forum I know it is due this problem. I would like to iterate through the generators output by passing in the first function's output, get_clientresponses_two's output while simultaneously implementing the other function. I would like guidance and feedback with specifically how to best rectify this.

ahlusar1989
  • 349
  • 5
  • 16
  • This is very to follow, I think. All the details about CSV and XML seem irrelevant. Can you show reduce this to a minimal example showing exactly what the problem is? At the very least, please show the code where the error is occurring. Also, it would be helpful to alway refer to the functions by their names. I don't know what you mean by "the first function," the "other function," and so on. – saulspatz Aug 24 '15 at 18:29
  • @saulpatz I have made edits, as per your latter request. However, I believe that it is important to explain the codes functional purpose along - with the input and output (in order to clarify the purpose of the specific collections and goal. If you feel that there are other more clarifying edits to be made, feel please free to do so. This is an open forum. – ahlusar1989 Aug 24 '15 at 18:37
  • @ahlusar1989 Can you please post a valid code, the above does not seem to be a working code, lots of syntax errors. Like - `return dict(flatten_dict(root) ####retu` – Anand S Kumar Aug 24 '15 at 18:51
  • @ahlusar1989 Anand beat me to it by a second. I was going to ask about the very same line. Also, I think the error is probably in the code for get_client_responses, not in the call, so please post the traceback. – saulspatz Aug 24 '15 at 18:57
  • @AnandSKumar Changes made – ahlusar1989 Aug 24 '15 at 19:11
  • The error message says you are trying to iterate over a function object. It would be more helpful to include the actual traceback with the error message. – dsh Aug 24 '15 at 19:20
  • How are you calling the function - what is `get_clientresponses_2_gen` ? – Anand S Kumar Aug 24 '15 at 19:24
  • @dsh That is the traceback. I am using Jupyter notebook as my IDE – ahlusar1989 Aug 24 '15 at 19:35
  • @AnandSKumar It is to account for that the generator function that is returned needs to be iterated through - at least to my understanding. – ahlusar1989 Aug 24 '15 at 19:36

2 Answers2

0

According to comments, it seems like get_clientresponses_2_gen is a generator function, if so , then this is not how you iterate through generator functions -

for get_clientresponses_2 in get_clientresponses_2_gen:
    xml_data = get_clientresponses_2()

To iterate through generator function and put each value in xml_data , use -

for xml_data in get_clientresponses_2_gen():

(Replace this with the two lines given above).

Anand S Kumar
  • 88,551
  • 18
  • 188
  • 176
0

Thanks to @AnandSKumar for his guidance:

It was indeed due to how I was utilizing the iterator constructor in the context of the generator function. I substituted my original script with Anand's suggestion:

for xml_data in get_clientresponses_2():
     xml_dat =  dict(flatten_dict(xml_data))

However, I also had to modify get_clientresponses_2 by returning the root of each XML tree and pass it to allocate_and_write()

I kept them as two mutually exclusive functions in order to prevent any side effects.

You need to simply call allocate_and_write() within

if __name__ == "__main__":
    main()

My reasons for doing so are detailed here

Below are both of the 2 function suites:

import csv
from collections import OrderedDict
from xml.etree.ElementTree import ParseError
import collections 
from __future__ import print_function


def get_clientresponses_2(filename = 's.csv'):

    with open(filename, 'rU') as infile:
        reader = csv.DictReader(infile)         # read the file as a dictionary for each row ({header : value})
        data = {}
        for row in reader:
            for header, value in row.items():
                try:
                    data[header].append(value)
                except KeyError:
                    data[header] = [value]

        client_responses = data['E'] #returns a list
        for client_response in client_responses:
            xml_string = (''.join(client_response))
            xml_string = xml_string.replace('&amp;', '')
            try:
                root = ElementTree.XML(xml_string)
#                 print(root)
                return root
            except ET.ParseError:
                print("catastrophic failure")
                continue

def allocate_and_write_2():

    with open(filename, 'r') as infile:
        reader = csv.DictReader(infile)         # read the file as a dictionary for each row ({header : value})
        header = set()
        results = []
        #     data = {} # this is not needed for the purpose of this organization
        for row in reader:
            for xml_data in get_clientresponses_2():
                xml_dat =  dict(flatten_dict(xml_data))
                row.update(xml_dat)        # just for XML data
                results.append(row)         # everything else
                header.update(row.keys())  # can't forget headers

        #     print(row) # returns dictionary of key values pairs (headers : values)
        #     print(results) # returns list wrapper for dictionary
        #     print(headers) #returns set of all headers
            headers_list = list(header)
        #     print(headers_list) #list form of set

            with open('csv_output.csv', 'wt') as f:
                writer = csv.writer(f)
                writer.writerow(headers_list)
                for row in results:
                    data = [row.get(x, '') for x in headers_list]
#                     print(data)
                    writer.writerow(data)
    #             writer.writerows(zip(headers_list, data))
Community
  • 1
  • 1
ahlusar1989
  • 349
  • 5
  • 16