0

*very new to Python I need to merge a large number of txt files from a single directory into one csv file. The text from the files needs to be converted into separate rows and columns (five columns in each file, N number of rows). I used the code from this question:

import os
import csv

dirpath = 'path_of_directory'
output = 'output_file.csv'
with open(output, 'w') as outfile:
    csvout = csv.writer(outfile)
    csvout.writerow(['FileName', 'Content'])

    files = os.listdir(dirpath)

    for filename in files:
        with open(dirpath + '/' + filename) as afile:
            csvout.writerow([filename, afile.read()])
            afile.close()

    outfile.close()

It works for me but it puts all the content from the file into one table cell. I read through a lot of Q&As and couldn't figure out how to modify the code in order to separate the content into different columns and rows.

Luba
  • 21
  • 1
  • 2
    Can you please update the question to include some sample-inputs from the text-files, and expected output? – Hampus Larsson Jun 02 '20 at 14:40
  • Why do you think that all the content is in one single table cell? Could you have used Excel? Do you know that Excel is known its the poor processing of CSV files that it has not written itself on the same system? What contains the resulting csv file when edited as text and what would you want it to contain? – Serge Ballesta Jun 02 '20 at 14:45
  • What is the format of the input files? What is the desired output in the output file? `csv.reader()` can likely help. FYI, with `.writerow([filename,afile.read()])` you've specifically said put filename in the first column and the entire file content in the second column. – Mark Tolonen Jun 02 '20 at 16:25
  • the original files have .tsv extension. Unfortunately, I can't put examples from the files or the resulting .csv here. When I open it in Excel, it says that the cell with all the file content is a merged cell and I can't split it. When I open it in Notepad++, it's all separate but for work reasons, it needs to be separate in Excel as well – Luba Jun 02 '20 at 17:52

1 Answers1

1

You have to read the five words in a line with loop for all the lines. Next, write the five words along the filename as shown below. Hope that helps. I have added pseudo code for the concept. Hope that helps!!!

files = os.listdir(dirpath)

for filename in files:
    with open(dirpath + '/' + filename) as afile:
        #for line in afile.read()
        #     words_in_a_line = split_line_to_get_words(line) 
        #     csvout.writerow([filename, words_in_a_file])

        csvout.writerow([filename, afile.read()]) # delete this line
        afile.close()

outfile.close()

Here the function split_line_to_get_words will return a list of five words from the line. If the lines are separated according to space or comma, use them to get five words from a line.

avijit
  • 805
  • 2
  • 12
  • 21
  • Sorry, I don't really understand what you mean by split_line_to_get_words() – Luba Jun 02 '20 at 17:56
  • By split_line_to_get_words(), I mean separate the words in a line by space or comma as they are in your data. – avijit Jun 02 '20 at 22:26