1

I am writing the output of my code to a .csv file. There are three directories, each containing 50 files. I want to write the output for each directory's files in a separate column, like this:

       group1  group2 group3
file1 1445      89    87
file2 1225     100    47
file3 650      120    67
file4 230      140    97

I have the following code to do so:

from collections import Counter
import glob
import os

out= open( 'output.csv','a')
out.write (';''group-1')
out.write (';''group-2')
out.write (';''group-3')
out.write('\n')
i = 1

while i<=50:
 out.write( "file-%d" %i )
 out.write('\n')
 i+=1
i=1
path = 'group/group-*-files/*.txt'

files=sorted(glob.glob(path))
c=Counter()

for filename in files:

 for line in open(filename,'r'):

    c.update(line.split())

 for item in c.items():
  oi=("{}\t{}".format(*item))  
  out_array = oi.split()

  if out_array[0]=='00000000':

   out.write(out_array[1])
   out.write('\n')
  c.clear()

The problem I have not been able to solve is that the results are all written in the first column, starting after the row for file 50:

file48
file49
file50
1445
1225
.. 

I want to write the first 50 results under the group1 column, the next 50 under group2, and the last 50 under group3.

The final output should look like this:

      group1   group2 group3
file1 145      89     87
file2 850      100    47
file3 650      120    67
file4 230      140    97
hassan
  • You should really be using [`with`](https://stackoverflow.com/q/9282967/3901060) when you open files. – FamousJameous Jun 29 '17 at 14:04
  • @FamousJameous2 can `with` solve the problem of writing into the next column – hassan Jun 29 '17 at 14:08
  • It is not possible to edit a file inplace, you can only write lines to the end. Therefore open the files, store what you want as one line, and write it to `out`. Further information: [here](https://stackoverflow.com/questions/5453267/is-it-possible-to-modify-lines-in-a-file-in-place) – P. Siehr Jun 29 '17 at 14:09
  • `with` makes sure, that the files are closed at the end of code execution - see [here](https://docs.python.org/2/tutorial/inputoutput.html#reading-and-writing-files) – P. Siehr Jun 29 '17 at 14:10
  • is this the file opening problem? is there no way I can write data after first column? – hassan Jun 29 '17 at 14:12
  • No, it is not a problem with opening files. That was just a remark from @FamousJameous // In general you can't "edit" a line inplace. (see comments above). – P. Siehr Jun 29 '17 at 14:15
  • I am getting the output but the issue is all outputs are in one column, how can I start writing in second column – hassan Jun 29 '17 at 14:18
  • what these guys are saying is that you cannot write column by column, only line by line. therefore, you must generate the entire row you want to be written line by line, write it to the file, and continue. – Jeff Gong Jun 29 '17 at 14:20
  • Thanks @Jeff, my confusion is over.. – hassan Jun 29 '17 at 14:21

3 Answers

0

This is how I would rewrite your code. The changes I made are:

  • Use the with statement when opening files to make sure they get closed
  • Use the csv module to make writing the csv file easier
  • Write the whole line at once by building one line at a time before writing it to the file.

Since I don't really know what is in your files, this isn't thoroughly tested.

import csv
from collections import Counter
import glob
import os

with open( 'output.csv','a') as out:
    writer =csv.writer(out, delimiter='\t')
    writer.writerow(['']+['group{}'.format(i) for i in range(1, 4)])
    path = 'group/group-*-files/*.txt'

    files=sorted(glob.glob(path))
    c=Counter()
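    for i, filename in enumerate(files, start=1):
        # Build the whole row before writing it; it needs its own name so the
        # loop variable `line` below does not overwrite it.
        row = ['file-{}'.format(i)]
        with open(filename) as infile:
            for line in infile:
                c.update(line.split())
        for key, count in c.items():
            if key == '00000000':
                row.append(count)
        writer.writerow(row)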
        c.clear()
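
To get one column per group, the counts for all three directories have to be collected before any row is written. The following is a minimal sketch of that idea, not a tested drop-in replacement. It assumes Python 3, directory names matching `group/group-*-files`, and the same number of files in every directory. The helper names `count_token` and `write_table` are hypothetical and do not come from the original code.

```python
import csv
import glob
import os
from collections import Counter


def count_token(filename, token='00000000'):
    """Return how often `token` occurs as a whitespace-separated word."""
    counts = Counter()
    with open(filename) as infile:
        for line in infile:
            counts.update(line.split())
    return counts[token]  # a Counter returns 0 for a missing key


def write_table(root='group', out_path='output.csv', token='00000000'):
    """Write one row per file position and one column per group directory."""
    group_dirs = sorted(glob.glob(os.path.join(root, 'group-*-files')))
    # One list of counts per directory, in sorted file-name order.
    columns = [
        [count_token(f, token)
         for f in sorted(glob.glob(os.path.join(d, '*.txt')))]
        for d in group_dirs
    ]
    with open(out_path, 'w', newline='') as out:
        writer = csv.writer(out, delimiter='\t')
        header = ['group{}'.format(n) for n in range(1, len(columns) + 1)]
        writer.writerow([''] + header)
        # zip(*columns) turns the per-directory lists into per-row tuples.
        for i, row in enumerate(zip(*columns), start=1):
            writer.writerow(['file-{}'.format(i)] + list(row))
```

Calling `write_table()` from the directory that contains `group/` would produce `output.csv`. Note that `zip` stops at the shortest column, so a directory with fewer files silently truncates the table.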
FamousJameous
0

Your indentation is wrong in at least one place. First you write all the file names with this loop:

...
while i<=50:
 out.write( "file-%d" %i )
 out.write('\n')                # replace \n with the column delimiter
 i+=1

Only then do you start processing the files. Delete the line i=1, and indent all the remaining code to the same level as out.write so that it runs inside the while loop:

from collections import Counter
import glob
import os

out= open( 'output.csv','a')  # mode 'a' appends - do you really want to append to an existing file?
out.write('file;group-1;group-2;group-3\n') # You forgot column 1 (the file name); \n ends the header row
# out.write (';''group-1')
# out.write (';''group-2')
# out.write (';''group-3')
# out.write('\n')
i = 1
while i<=50:
 out.write( "file-%d" %i )
 # out.write('\n')
 out.write(';')    # Insert the column delimiter instead of a newline
 # i=1  Delete this: resetting i inside the loop would make it run forever
 # The following code must run inside the while loop, so indent it to the
 # same level as the previous lines
 path = 'group/group-*-files/*.txt'

 files=sorted(glob.glob(path))
 c=Counter()

 # Assumption: every group has exactly 50 files, so after sorting, the i-th
 # file of each group is 50 entries apart
 for filename in files[i-1::50]:

  for line in open(filename,'r'):

     c.update(line.split())

  for item in c.items():
   oi=("{}\t{}".format(*item))
   out_array = oi.split()

   if out_array[0]=='00000000':

    out.write(out_array[1])
    # out.write('\n') - You don't want new lines here, only a new column for every group
    out.write(';')

  c.clear()        # Clear after the loop over the items, not inside it
 out.write('\n')   # New line - new record, once per file number
 i+=1
out.close()
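
Because a file can only be written line by line, each output row needs the i-th result from every group. One way to sketch this is to collect one count per file in sorted order and then regroup the flat list into rows. The example below assumes the list is ordered group by group and that every group has the same number of entries. `to_rows` and `format_rows` are hypothetical helpers, and the numbers are the toy values from the question rather than real counts.

```python
def to_rows(values, per_group):
    """Regroup a flat, group-ordered list into rows with one value per group."""
    if per_group <= 0 or len(values) % per_group != 0:
        raise ValueError('values must split evenly into groups of per_group')
    # values[i::per_group] picks the i-th entry of every group.
    return [values[i::per_group] for i in range(per_group)]


def format_rows(rows, delimiter=';'):
    """Render a header plus one delimiter-separated line per row."""
    n_groups = len(rows[0]) if rows else 0
    header = ['file'] + ['group-{}'.format(g) for g in range(1, n_groups + 1)]
    lines = [delimiter.join(header)]
    for i, row in enumerate(rows, start=1):
        cells = ['file-{}'.format(i)] + [str(v) for v in row]
        lines.append(delimiter.join(cells))
    return lines


# Toy data: 3 groups of 4 files each, listed group by group.
counts = [1445, 1225, 650, 230, 89, 100, 120, 140, 87, 47, 67, 97]
for text in format_rows(to_rows(counts, per_group=4)):
    print(text)
```

With the real data, `per_group` would be 50 and `counts` would hold the 150 per-file results.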
marc_s
0
 for filename in files:

  for line in open(filename,'r'):

     c.update(line.split())

  for item in c.items():
   oi=("{}\t{}".format(*item))  
   out_array = oi.split()
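
For readers following along, the loop above tallies every whitespace-separated token with a `Counter`, formats each (token, count) pair as a tab-separated string, and splits it again. The sketch below reproduces that step on in-memory lines (an assumption standing in for the file contents, which are not shown in the question) and compares it with a direct lookup on the counter.

```python
from collections import Counter

# Stand-in for the lines of one input file (hypothetical data).
lines = ['00000000 11111111 00000000', '11111111 00000000']

c = Counter()
for line in lines:
    c.update(line.split())

# The round trip used above: format each pair, then split it again.
via_loop = None
for item in c.items():
    out_array = "{}\t{}".format(*item).split()
    if out_array[0] == '00000000':
        via_loop = out_array[1]  # the count as a string

# A direct lookup yields the same number as an int, and 0 if the key is absent.
direct = c['00000000']
print(via_loop, direct)
```

The direct lookup avoids iterating over every token and also handles files in which the token never appears.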
toku-sa-n
  • Hi and thanks for the answer. It would be great if you could explain what your code does. It would help the community much more that way to learn from you! – Simas Joneliunas Feb 03 '22 at 04:45