0

I'm new to Python (2.7.9). When I export a gzipped file to a CSV using:

import csv
import gzip

myData = gzip.open('file.gz.DONE', 'rb')
myFile = open('output.csv', 'wb')
with myFile:
    writer = csv.writer(myFile)
    writer.writerows(myData)
print("Writing complete")

The CSV is written with a comma between every character, e.g.:

S,V,R,","2,1,4,0,",",2,0,1,6,1,1,3,8,0,4,",",5,0,5,0,1,3,4,2,0,6,4,7,3,6,4,",",",",2,0,0,0,5,6,5,9,2,9,6,7,4,",",2,0,0,7,2,4,5,2,3,5,",",0,0,0,2,","
I,V,E,",",",",",",E,N,",",N,/,A,",",0,4,2,1,4,4,9,3,7,0,",":,I,R,_,",",N,/,A,",",U,N,A,N,S,W,",",",",",",",","
"
S,V,R,",",4,7,3,3,5,5,",",2,0,5,7,",",5,0,5,0,1,4,5,0,1,6,4,8,6,3,7,",",",",2,0,0,0,5,5,3,9,2,9,2,8,0,",",2,0,4,4,1,0,8,3,7,8,",",0,0,0,2,","
I,V,E,",",",",",",E,N,",",N,/,A,",",0,4,4,7,3,3,5,4,5,5,",",,:,I,R,_,",",N,/,A,",",U,N,A,N,S,W,",",",",",",",","

How do I get rid of the extra commas so that the data is exported with the correct fields? E.g.:

SVR,2144370,20161804,50501342364,,565929674,2007245235,0002,1,PPDAP,PPLUS,DEACTIVE,,,EN,N/A,214370,:IR_,N/A,,,,,
SVR,473455,208082557,14501648637,,2000553929280,2044108378,0002,1,3G,CODAP,INACTIVE,,,EN,N/A,35455,:IR_,N/A,,,,,

pippy5
  • 1
    Sounds like a type mismatch. Maybe writerows() expects an iterable but myData is actually a string? – mr nick Dec 19 '17 at 01:31
  • Thanks mr nick, but that did not work: writer.writerows() takes exactly one argument – pippy5 Dec 19 '17 at 01:45
  • what if you pass `myData` as a `list` (e.g. `writer.writerows([myData])`) as suggested here: [Why does csvwriter.writerow() put a comma after each character?](https://stackoverflow.com/a/1816897/1248974) – chickity china chinese chicken Dec 19 '17 at 01:47
  • Nope, I tried both "writer.writerows[myData]" and "writer.writerows([myData])" – pippy5 Dec 19 '17 at 02:29

3 Answers

0

You are only opening the gzip file. I think you are expecting the opened file to act automatically like an iterator, which it does. However, each item it yields is a line of text, that is, a single string. writerows() expects an iterable in which each item is a sequence of field values to be joined with commas. A string is itself a sequence of characters, so when each item is a string, every character is written as a separate field, which is the result you got.
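
The difference can be reproduced without any gzip file. The following is a minimal sketch, written for Python 3 and using an in-memory buffer; `rows_to_csv` is a helper name made up for this illustration, not something from the question.

```python
import csv
import io


def rows_to_csv(rows):
    """Write rows with csv.writer and return the resulting text."""
    buf = io.StringIO()
    csv.writer(buf).writerows(rows)
    return buf.getvalue()


# Each "row" is a plain string, so every character becomes its own field.
per_character = rows_to_csv(["SVR,21"])

# Each row is a list of fields, which is what writerows() expects.
per_field = rows_to_csv([["SVR", "21"]])

print(repr(per_character))
print(repr(per_field))
```

The first call shows the same pattern as the broken output in the question, including the quoted `","` where the original comma was treated as a field of its own.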

Since you didn't mention what the lines in the gzip data actually contain, I can't guess how to split each line into a list of sensible fields. But assuming a function called 'split_line' that is appropriate to that data, you could do:

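import csv
import gzip

# split_line is a placeholder: it must turn one line into a list of field values
with gzip.open('file.gz.DONE', 'rb') as gzip_f:
  data = [split_line(l) for l in gzip_f]
  with open('output.csv', 'wb') as myFile:
    writer = csv.writer(myFile)
    writer.writerows(data)
    print("Writing complete")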

Of course, at this point it makes sense to write row by row and to combine the two with statements into one.
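
A row-by-row version might look like the sketch below. It is written for Python 3, and both `gzip_lines_to_csv` and the body of `split_line` are assumptions made for illustration: the split shown only suits plain comma-separated lines with no quoted fields.

```python
import csv
import gzip


def split_line(line):
    # Hypothetical helper: assumes plain comma-separated fields, no quoting.
    return line.rstrip("\r\n").split(",")


def gzip_lines_to_csv(src_path, dst_path):
    """Stream a gzipped text file into a CSV file; return the row count."""
    count = 0
    # One with statement manages both files.
    with gzip.open(src_path, "rt", newline="") as src, \
         open(dst_path, "w", newline="") as dst:
        writer = csv.writer(dst)
        for line in src:
            # Only one line is held in memory at a time.
            writer.writerow(split_line(line))
            count += 1
    return count
```

Calling `gzip_lines_to_csv('file.gz.DONE', 'output.csv')` would then replace the list comprehension and the nested with blocks.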

See https://docs.python.org/2/library/csv.html

Samantha Atkins
  • Thanks Samantha. The gzip data contains 14,000 lines, each of which contains 21 comma-separated fields, e.g. SVR,370,2011143804,05047364,,2056599674,200724525,0002,1,G,PPPCODAP,3LUS,DIVE,,,EN,N/A,0421449370,DS:IR_SMS,N/A,UW,,,, Your suggestion came back with "NameError: name 'split_line' is not defined" ?? – pippy5 Dec 19 '17 at 02:50
  • 1
    Of course it did. I said that "assuming a function called split_line..". If your gzip data is already comma separated then why not just write it out directly instead of fooling around with the csv writer? – Samantha Atkins Dec 19 '17 at 02:53
  • csv means comma separated values. It is what you already have. See new answer below. – Samantha Atkins Dec 19 '17 at 03:03
0

I think it's simply because gzip.open() gives you a file-like object that yields one string per line, whereas csv.writer's writerows() needs an iterable of rows, each row being a sequence of fields (for example a list of lists of strings).

But I don't understand why you want to use the csv module. It looks like you only want to extract the content of the gzip file and save it uncompressed in an output file. You could do that like this:

import gzip

input_file_name = 'file.gz.DONE'
output_file_name = 'output.csv'

with gzip.open(input_file_name, 'rt') as input_file:
    with open(output_file_name, 'wt') as output_file:
        for line in input_file:
            output_file.write(line)

print("Writing complete")

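If you want to use the csv module because you're not sure your input data is properly formatted (and you want an error message right away), you could then do the following. Note that the newline argument used here needs Python 3; Python 2's built-in open() and gzip.open() do not accept it.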

import gzip
import csv

input_file_name = 'file.gz.DONE'
output_file_name = 'output.csv'

with gzip.open(input_file_name, 'rt', newline='') as input_file:
    reader_csv = csv.reader(input_file)
    with open(output_file_name, 'wt', newline='') as output_file:
        writer_csv = csv.writer(output_file)
        writer_csv.writerows(reader_csv)

print("Writing complete")
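
One caveat: csv.reader does not check that every row has the same number of fields, so a malformed line can pass through silently. If an early warning is the goal, an explicit count is needed. The sketch below is written for Python 3; `find_bad_rows` and its `expected_fields` parameter are names invented for this example, and the sample lines are made up.

```python
import csv


def find_bad_rows(lines, expected_fields):
    """Return (row_number, field_count) for rows with the wrong field count."""
    bad = []
    for row_number, row in enumerate(csv.reader(lines), start=1):
        if len(row) != expected_fields:
            bad.append((row_number, len(row)))
    return bad


# Made-up sample data: the second row is missing one field.
sample = ["SVR,1,,EN\n", "SVR,2,EN\n", "SVR,3,,N/A\n"]
print(find_bad_rows(sample, expected_fields=4))
```

Because csv.reader accepts any iterable of lines, the file object returned by gzip.open(input_file_name, 'rt', newline='') could be passed in place of `sample`.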

Is that what you were trying to do? It's difficult to guess because we don't have the input file to look at.

If it's not what you want, could you clarify what you are after?

EvensF
0

Since I now know that the gzipped file already contains comma-separated values, it simplifies to this:

import gzip

with gzip.open('file.gz.DONE', 'rb') as gzip_f, open('output.csv', 'wb') as myFile:
  myFile.write(gzip_f.read())

In other words, it is just a roundabout gunzip to another file.
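
For large files, read() pulls the whole decompressed content into memory at once. A possible variant, sketched here with the standard library's shutil.copyfileobj, copies in chunks instead; the function name `gunzip_to` is made up for this example.

```python
import gzip
import shutil


def gunzip_to(src_path, dst_path):
    """Decompress src_path into dst_path, copying chunk by chunk."""
    with gzip.open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        # copyfileobj reads and writes in blocks rather than all at once.
        shutil.copyfileobj(src, dst)
```

With the file names from the question, the call would be `gunzip_to('file.gz.DONE', 'output.csv')`.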

Samantha Atkins