
I already have a CSV file created from a list using csv.writer. I want to append another list, created through a for loop, to that CSV file column-wise.

The first code to create a CSV file is as follows:

import csv

with open("output.csv", "wb") as f:  # "wb" is the Python 2 convention for csv files
    writer = csv.writer(f)
    for row in zip(master_lst):  # zip() here wraps each element in a 1-tuple
        writer.writerow(row)

I created the CSV file using the list master_lst and the output is as follows:

read
ACACCUGGGCUCUCCGGGUACC
ACGGCUACCUUCACUGCCACCC
AGGCAGUGUGGUUAGCUGGUUG

Then I create another list (ind_lst) through a for loop, and its contents have to be appended column-wise to the CSV file created in the previous step. I used the following code:

with open("output.csv", "ab") as f:
    writer = csv.writer(f)
    for row in zip(ind_lst):
        writer.writerow(row)

The output I obtained is as follows:

read
ACACCUGGGCUCUCCGGGUACC
ACGGCUACCUUCACUGCCACCC
AGGCAGUGUGGUUAGCUGGUUG
sample1
3
3
1
sample2
4
4
1
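(This row-wise result is actually what `zip` over a single list produces: each element gets wrapped in a one-tuple, so `writerow` receives one field per row. A quick sketch:)

```python
ind_lst = ['sample1', '3', '3', '1']

# zip() over a single iterable yields 1-tuples, so each value
# becomes its own single-field row when passed to writerow()
rows = list(zip(ind_lst))
print(rows)  # [('sample1',), ('3',), ('3',), ('1',)]
```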

However, I need the output column-wise, as follows:

read                         sample1     sample2
ACACCUGGGCUCUCCGGGUACC         3            4
ACGGCUACCUUCACUGCCACCC         3            4
AGGCAGUGUGGUUAGCUGGUUG         1            1

I looked for solutions, but could only find ones for appending row-wise ("append new row to old csv file python"), whereas I need to append column-wise.

I also tried writer.writerows instead of writer.writerow, but I get this error:

_csv.Error: sequence expected

The output is as follows:

read
ACACCUGGGCUCUCCGGGUACC
ACGGCUACCUUCACUGCCACCC
AGGCAGUGUGGUUAGCUGGUUG
s                        a   m   p  l  e 1

As you can see, it spreads the characters of the list's first element across separate cells and then terminates with an error. I am a beginner in Python, so if anyone could help solve this issue that would be awesome.
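(A minimal reproduction of that behaviour: `writerows` expects a list of rows, so a bare string is split into one-character fields, and a bare number is rejected outright; the exact error message differs between Python 2 and 3.)

```python
import csv
import io

buf = io.StringIO()
writer = csv.writer(buf)

# a bare string is treated as a row, i.e. a sequence of 1-character fields
writer.writerows(['sample1'])
print(buf.getvalue().strip())  # s,a,m,p,l,e,1

# a bare number is not a sequence at all, so writerows() raises csv.Error
try:
    writer.writerows([3])
except csv.Error as exc:
    print(exc)  # "sequence expected" (Python 2) / "iterable expected" (Python 3)
```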

EDIT:

The master_lst is created using the following code:

import sys

with open(sys.argv[1], "r") as infile:
    lines = infile.readlines()[1:]  # skip the header line

master_lst = ["read"]
for line in lines:
    fields = line.strip().split(',')
    master_lst.append(fields[3])  # keep the fourth field

The ind_lst is created using the following code:

for file in files:
    ind_lst = []
    if file.endswith('.fa'):
        first_field = file.split(".")[0]
        ind_lst.append(first_field)
        individual_dict = {}
        with open(file) as fasta:
            for line in fasta:
                line = line.strip()
                if line == '':
                    continue
                if line.startswith('>'):
                    header = line.lstrip('>')
                    individual_dict[header] = ''
                else:
                    individual_dict[header] += line
    for i in master_lst[1:]:
        # default to 0 when the read is absent from this file's dictionary
        a = individual_dict.get(i, 0)
        ind_lst.append(a)
rex
  • Please [edit] your question and add code defining the `master_lst` and, more importantly, the `for` loop that creates the `ind_lst`. – martineau Feb 12 '17 at 15:43
  • @martineau: Hi martineau, I have updated the info that you requested – rex Feb 12 '17 at 19:05
  • @martineau: I was looking in the answers section and missed your comment above. Sorry that I didn't explain my issue clearly: an ind_lst is created for each file with the .fa extension in the current working directory, and its contents have to be appended column-wise to the output CSV file. So all the .fa files are read, the ind_lsts are created and appended column-wise. In the example I provided, there are 2 .fa files used to create ind_lst through the for loop. Please let me know if you have any questions or if my question is still not clear – rex Feb 12 '17 at 19:23
  • rex: Good move. However, I got tired of waiting for you to do it and answered your question as best I could from what was in it at the time. You could probably simplify the part of my answer that rearranges the `ind_lst` by creating `ind_lst` as a list of lists, with the nth element of each sublist coming from each of the fasta files. If you can't figure out how to do that, it's time to ask another question. – martineau Feb 12 '17 at 19:26
  • rex: Thanks. [My answer](http://stackoverflow.com/questions/21680473/how-can-i-open-multiple-files-files-unknown-beforehand-using-with-open-stat) to an unrelated question might help you create the `ind_lst` from multiple fasta files all open and being read at the same time (hint-hint). – martineau Feb 12 '17 at 19:35
  • @martineau When I use smaller files, I can create the read counts in a reasonable time, but when the size of the master_lst increases, the execution time slows down drastically. I don't know how to use the multiprocessing module. I have posted a question regarding this here: http://stackoverflow.com/questions/42602306/how-to-use-multiprocessing-module-to-iterate-a-list-and-match-it-with-a-key-in-d Could you please suggest a solution? – rex Mar 05 '17 at 12:06
  • Yes. See the [latest revision](http://stackoverflow.com/a/42613205/355230) to my answer to your [follow-on question](http://stackoverflow.com/questions/42602306/how-to-use-multiprocessing-module-to-iterate-a-list-and-match-it-with-a-key-in-d). – martineau Mar 10 '17 at 21:55

2 Answers


You need to combine the data from read, sample1 and sample2 into a row.

Assuming those are iterables, you could use zip to combine them:

for row in zip(read, sample1, sample2):
    writer.writerow(row)
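For instance, with the data from the question (Python 3 shown, writing to an in-memory buffer just for the demonstration):

```python
import csv
import io

read = ['read', 'ACACCUGGGCUCUCCGGGUACC', 'ACGGCUACCUUCACUGCCACCC',
        'AGGCAGUGUGGUUAGCUGGUUG']
sample1 = ['sample1', 3, 3, 1]
sample2 = ['sample2', 4, 4, 1]

buf = io.StringIO()
writer = csv.writer(buf)
for row in zip(read, sample1, sample2):  # one tuple per output row
    writer.writerow(row)

print(buf.getvalue())
# read,sample1,sample2
# ACACCUGGGCUCUCCGGGUACC,3,4
# ACGGCUACCUUCACUGCCACCC,3,4
# AGGCAGUGUGGUUAGCUGGUUG,1,1
```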
Roland Smith
  • sample1 and sample2 are not iterable, I guess. They are lists created through a for loop; the list ind_lst should become empty after each iteration of the for loop in order to accommodate the new list to be created and appended. Is there any other solution to this issue? – rex Feb 12 '17 at 14:38
  • @rex No. You will *have* to make an iterable for a row if you want multiple columns. – Roland Smith Feb 12 '17 at 15:45

You're actually trying to append several columns to the existing file, even though the data for these new columns is all stored in a single list. It might be better to arrange the data in ind_lst differently, but since you haven't shown how it's built, the code below works with the format in your question.

Modifying CSV files in place is tricky, since they're really just text files, so it's much easier to create a new file with the merged data, delete the original, and then rename the new file to match it (you've now been warned).

import csv
from itertools import izip  # Python 2
import os
import tempfile

master_lst = [
    'read',
    'ACACCUGGGCUCUCCGGGUACC',
    'ACGGCUACCUUCACUGCCACCC',
    'AGGCAGUGUGGUUAGCUGGUUG'
]

ind_lst = [
    'sample1',
    '3',
    '3',
    '1',
    'sample2',
    '4',
    '4',
    '1'
]

csv_filename = 'output.csv'

def grouper(n, iterable):
    's -> (s0,s1,...sn-1), (sn,sn+1,...s2n-1), (s2n,s2n+1,...s3n-1), ...'
    return izip(*[iter(iterable)]*n)

# first create file to update
with open(csv_filename, 'wb') as f:
    writer = csv.writer(f)
    writer.writerows(((row,) for row in master_lst))

# Rearrange ind_lst so it's a list of pairs of values.
# The number of resulting pairs should be equal to length of the master_lst.
# Result for example data:  [('sample1', 'sample2'), ('3', '4'), ('3', '4'), ('1', '1')]
new_cols = zip(*grouper(len(master_lst), ind_lst))
assert len(new_cols) == len(master_lst)

# dir='.' keeps the temp file on the same filesystem so os.rename() works;
# delete=False because the file is renamed away before the with-block exits
with open(csv_filename, 'rb') as fin, tempfile.NamedTemporaryFile(
        'r+b', dir='.', delete=False) as temp_file:
    reader = csv.reader(fin)
    writer = csv.writer(temp_file)
    nc = iter(new_cols)
    for row in reader:
        row.extend(next(nc))  # add new columns to each row
        writer.writerow(row)
    else:  # for loop completed, replace original file with temp file
        fin.close()
        os.remove(csv_filename)
        temp_file.flush()  # flush the internal file buffer
        os.fsync(temp_file.fileno())  # force writing of all data in temp file to disk
        os.rename(temp_file.name, csv_filename)

print('done')

Contents of file after creation followed by update:

read,sample1,sample2
ACACCUGGGCUCUCCGGGUACC,3,4
ACGGCUACCUUCACUGCCACCC,3,4
AGGCAGUGUGGUUAGCUGGUUG,1,1
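(A note for Python 3 readers: `itertools.izip` is gone, since plain `zip` is already lazy, and CSV files should be opened in text mode with `newline=''`. A minimal sketch of the same rearrangement under those assumptions:)

```python
import csv

master_lst = ['read', 'ACACCUGGGCUCUCCGGGUACC',
              'ACGGCUACCUUCACUGCCACCC', 'AGGCAGUGUGGUUAGCUGGUUG']
ind_lst = ['sample1', '3', '3', '1', 'sample2', '4', '4', '1']

def grouper(n, iterable):
    # chop the flat ind_lst into per-sample chunks of length n
    return zip(*[iter(iterable)] * n)

# transpose the chunks into per-read column values:
# [('sample1', 'sample2'), ('3', '4'), ('3', '4'), ('1', '1')]
new_cols = list(zip(*grouper(len(master_lst), ind_lst)))

with open('output.csv', 'w', newline='') as f:
    writer = csv.writer(f)
    for value, extras in zip(master_lst, new_cols):
        writer.writerow([value, *extras])
```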
martineau