I have multiple archives, each containing multiple .tsv files. I'm trying to extract some of these files and concatenate them into a new .tsv file. I'm stuck, though: the only solution I came up with merges them, but it writes a literal \t instead of an actual tab. This is what I've tried so far:

    for numbers in list:
       file_name = numbers
       zip_ref = zipfile.ZipFile(archive_name, 'r')
       file_to_concat = zip_ref.read(file_name)
       model_file.write(str(file_to_concat))

At the end, model_file is full of literal '\t' sequences instead of tabs. I guess I should use 'import csv' somewhere? I have no idea how to do it, though.

Gimv13
  • You might like the pandas library for this. – Aaron Oct 09 '17 at 22:20
  • What do you mean by "tabbing properly"? `\t` is [a proper tab](https://stackoverflow.com/questions/4488570/how-do-i-write-a-tab-in-python) – Peter Wood Oct 10 '17 at 06:37
  • what I mean is that it writes "\t" in the file instead of tabbing – Gimv13 Oct 10 '17 at 07:24
  • So you want to open all archive files in a folder (what extensions?) and for each archive file, open it, extract all the `.tsv` files and output a single file with the contents of the `.tsv` files merged together? – Martin Evans Oct 12 '17 at 16:45

1 Answer

The following takes every .zip file in the current folder, reads each file inside it as TSV, and writes all the rows to a combined tab-delimited output file named after the .zip file:

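    import csv
    import glob
    import io
    import os
    import zipfile

    for zip_filename in glob.glob('*.zip'):
        # Name the output after the archive, e.g. data.zip -> data.csv
        csv_filename = "{}.csv".format(os.path.splitext(os.path.basename(zip_filename))[0])
        print("{} -> {}".format(zip_filename, csv_filename))

        # newline='' lets the csv module control the line endings itself
        with zipfile.ZipFile(zip_filename, 'r') as zip_ref, open(csv_filename, 'w', newline='') as f_csv:
            csv_writer = csv.writer(f_csv, delimiter='\t')

            for zip_member in zip_ref.namelist():
                print("  {}".format(zip_member))
                # ZipFile.open() yields bytes; wrap it as text (UTF-8 is assumed here)
                with io.TextIOWrapper(zip_ref.open(zip_member), encoding='utf-8', newline='') as f_zip:
                    csv_writer.writerows(csv.reader(f_zip, delimiter='\t'))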
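
If only some members are needed, as in the question, a smaller variant may be enough. The literal `\t` most likely comes from calling `str()` on the bytes returned by `zip_ref.read()`, which on Python 3 produces the bytes repr rather than the text. The sketch below avoids that by copying the raw bytes. The helper name `concat_members` and its parameters `member_names` and `out_path` are made up for illustration; they are not part of `zipfile`.

```python
import shutil
import zipfile


def concat_members(archive_name, member_names, out_path):
    """Hypothetical helper: append the chosen archive members to out_path."""
    # 'ab' appends in binary mode, so several archives can feed one output file
    with zipfile.ZipFile(archive_name, 'r') as zip_ref, open(out_path, 'ab') as out:
        for name in member_names:
            with zip_ref.open(name) as member:
                # Copy raw bytes; real tab characters pass through untouched
                shutil.copyfileobj(member, out)


# Why the original wrote "\t": on Python 3, str() of bytes returns its repr
print(str(b'a\tb'))             # b'a\tb'  (a backslash and a 't', no tab)
print(b'a\tb'.decode('utf-8'))  # a and b separated by a real tab
```

Calling `concat_members(archive_name, ['first.tsv', 'second.tsv'], 'model.tsv')` once per archive would build up the merged file (the member names here are placeholders). This copies bytes verbatim, so it assumes each member ends with a newline and that repeated header rows are acceptable; if not, the `csv`-based loop above is the safer route.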
Martin Evans