I have a set of text documents (emails saved as text files). I need to read them and write their contents to a CSV file or a Pandas data frame, with one row per email/text file.

I am new to Python and don't know how to approach this. Please help.

Filename   Content
email1     Content of email 1
email2     Content of email 2
email3     Content of email 3
…          …
…          …
…          …
email n    Content of email n

Edit

I was using the code below:

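import csv
import os

dirpath = 'path'
output = 'output_file.csv'
# newline='' stops the csv module from writing blank lines between rows on Windows
with open(output, 'w', newline='') as outfile:
    csvout = csv.writer(outfile)
    csvout.writerow(['FileName', 'Content'])

    files = os.listdir(dirpath)

    for filename in files:
        # the with block closes each file automatically, so no explicit close() is needed
        with open(os.path.join(dirpath, filename)) as afile:
            csvout.writerow([filename, afile.read()])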
Doubt Dhanabalu

2 Answers

You can start from here:

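import csv  # standard-library CSV module

# Placeholder data (an assumption): replace with the text of your emails
emails = ['Content of email 1', 'Content of email 2']

with open('example.csv', 'w', newline='') as csvfile:  # create a new CSV file
    fieldnames = ['email']  # column names
    writer = csv.DictWriter(csvfile, fieldnames=fieldnames)
    writer.writeheader()  # write the column names as the first row
    for email in emails:
        writer.writerow({'email': email})  # write one row per email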

P.S.

This works with Python 3.6. Good luck!

Andrea Perelli
  • Thanks for the answer. I have a basic question: where do we specify the directory name, as in the code I posted (I have edited my question now)? Thank you. – Doubt Dhanabalu Aug 03 '17 at 18:29
  • For example, the variable 'output' can be either 'example.csv' or 'Desktop/example.csv'. – Andrea Perelli Aug 03 '17 at 19:41
  • I mean the directory containing the input text files. – Doubt Dhanabalu Aug 03 '17 at 23:50
  • You can put it in the variable. For reference, see https://docs.python.org/3.6/library/csv.html; I think you can find all the information there. If you need more, just ask and I will try to help. – Andrea Perelli Aug 04 '17 at 07:20
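
To make the input directory explicit, as asked in the comments above, one option is to pass both paths into a small helper. This is only a minimal sketch, not the answerer's code: the function name emails_to_csv and its input_dir and output_path parameters are hypothetical, and it assumes the emails are UTF-8 .txt files.

```python
import csv
from pathlib import Path


def emails_to_csv(input_dir, output_path):
    """Write one CSV row per .txt file found in input_dir (hypothetical helper)."""
    fieldnames = ['FileName', 'Content']
    # newline='' lets the csv module control line endings itself
    with open(output_path, 'w', newline='', encoding='utf-8') as csvfile:
        writer = csv.DictWriter(csvfile, fieldnames=fieldnames)
        writer.writeheader()
        count = 0
        # sorted() gives a stable row order; glob() alone does not guarantee one
        for path in sorted(Path(input_dir).glob('*.txt')):
            # Assumption: files are UTF-8; undecodable bytes are replaced, not fatal
            content = path.read_text(encoding='utf-8', errors='replace')
            writer.writerow({'FileName': path.stem, 'Content': content.strip()})
            count += 1
    # Number of data rows written (the header is not counted)
    return count
```

The first argument is where the directory of input text files goes and the second is the CSV to create, for example emails_to_csv('path', 'output_file.csv') with the placeholder values from the question.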

The answer provided in "Combine a folder of text files into a CSV with each content in a cell" worked:

import csv
import os
from pathlib import Path

os.chdir('file path')  # directory containing the .txt files
with open('big.csv', 'w', newline='') as out_file:
    csv_out = csv.writer(out_file)
    csv_out.writerow(['FileName', 'Content'])
    for file_name in Path('.').glob('*.txt'):
        # read_text() opens the file, reads it, and closes it again
        csv_out.writerow([str(file_name), file_name.read_text().strip()])
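
Since the question also mentions a Pandas data frame, the same loop can collect the rows in memory instead. A minimal sketch, assuming pandas is installed and the files are UTF-8 .txt files; the helper name emails_to_dataframe and its dirpath parameter are hypothetical and not part of the linked answer.

```python
from pathlib import Path

import pandas as pd


def emails_to_dataframe(dirpath):
    """Return a DataFrame with one row per .txt file in dirpath (hypothetical helper)."""
    records = []
    # sorted() gives a stable row order; glob() alone does not guarantee one
    for path in sorted(Path(dirpath).glob('*.txt')):
        # Assumption: UTF-8 text; undecodable bytes are replaced rather than raising
        text = path.read_text(encoding='utf-8', errors='replace').strip()
        records.append({'FileName': path.name, 'Content': text})
    # Passing columns= keeps both headers even when no files are found
    return pd.DataFrame(records, columns=['FileName', 'Content'])
```

Calling emails_to_dataframe('file path').to_csv('big.csv', index=False) should then produce a comparable two-column CSV, if a file on disk is still needed.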
Doubt Dhanabalu