directory = raw_input("INPUT Folder:")
output = raw_input("OUTPUT Folder:")
txt_files = os.path.join(directory, '*.txt')
for txt_file in glob.glob(txt_files):
    filename = os.path.splitext(os.path.basename(txt_file))[0] + '.csv'
    with open(txt_file, "rb") as input_file, open("book.csv", 'a') as output_file:
        out_csv = csv.writer(output_file)
        lines = input_file.readlines()
        for i in range(0, len(lines)):
            if i==len(lines):
                out_csv.writerow(lines)
            else:
                lines.append(lines[i+1])

I am trying to open the text files in the movie review database and convert them so that the entire content of each text file becomes one row in a CSV. For example, the neg folder of the movie review data contains 1000 files, so my CSV should contain 1000 rows, each row holding the complete text of one file. Please help me. I have tried various approaches, but each one gives some error or other. With this code the error is:

Traceback (most recent call last):
  File "C:\Python27\preprocessing adding adnan.py", line 51, in <module>
    lines.append(lines[i+1])
IndexError: list index out of range

user1805250

3 Answers

This loop needs remodeling:

for i in range(0, len(lines)):
    if i==len(lines):
        out_csv.writerow(lines)
    else:
        lines.append(lines[i+1])

It also makes no sense to append items from lines back into lines; you need another variable. Rewrite the loop like this to avoid the IndexError:

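newlist = []
# Copy into a separate list instead of appending to lines itself.
# Note: indexing with i+1 starts at lines[1], so the first line is left out.
for i in range(0, len(lines)-1):
    newlist.append(lines[i+1])
out_csv.writerow(newlist)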
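
The same copy can be written without an index, which leaves nothing to go out of range. A minimal sketch, assuming `lines` is the list returned by `readlines()`; the sample data here is made up:

```python
# Hypothetical sample standing in for input_file.readlines().
lines = ["first line\n", "second line\n", "third line\n"]

# Index-based copy as in the loop above: it starts at lines[1].
newlist = []
for i in range(0, len(lines) - 1):
    newlist.append(lines[i + 1])

# Slicing builds the same list without any manual index arithmetic.
sliced = lines[1:]

# list(lines) makes a separate copy that keeps every line, including the first.
everything = list(lines)
```

Either form leaves `lines` itself untouched, which is the point of using a second variable.
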
Raiyan

The last lines of your code have several problems:

1- In the loop for i in range(0, n):, i is never equal to n; it goes from 0 to n-1

2- range(0, n) is the same as range(n), so use the shorter one

3- Don't modify a list inside the loop that iterates over it

4- You are appending to lines its own elements. I think you are duplicating the rows (other than the header), like:

lines = lines + lines[1:]

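And I'm not sure why you are doing this!

5- The lines returned by fp.readlines() keep their trailing newlines. The csv writer will quote such values, but each cell then spans more than one physical line, which is probably not what you want in your csv file.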

6- You can simply iterate over a file object, like for line in open(...):
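
Points 1, 2, 5 and 6 are easy to check in an interpreter. A small sketch, using an in-memory `io.StringIO` object as a stand-in for a real review file; the sample text is made up:

```python
import io

# Hypothetical two-line file; io.StringIO stands in for open(...).
sample = io.StringIO(u"great movie\nwould watch again\n")

# Points 1 and 2: range(0, n) equals range(n) and stops at n - 1.
n = 3
indices = list(range(0, n))
same_as_short_form = indices == list(range(n))

# Point 5: readlines() keeps the trailing newline on every line.
raw_lines = sample.readlines()

# Point 6: iterating over the file object yields the same lines one by one.
sample.seek(0)
stripped = [line.strip() for line in sample]
```
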

So the code you should have written, I think, is:

import csv
import glob
import os

directory = raw_input("INPUT Folder:")
output = raw_input("OUTPUT Folder:")
txt_files = os.path.join(directory, '*.txt')
for txt_file in glob.glob(txt_files):
    filename = os.path.splitext(os.path.basename(txt_file))[0] + '.csv'
    # Python 2: open the csv in binary mode ('ab') to avoid blank rows on Windows
    with open(txt_file, "rb") as input_file, open("book.csv", 'ab') as output_file:
        out_csv = csv.writer(output_file)
        row = []
        for line in input_file:
            line = line.strip()  # removes leading and trailing whitespace, including the newline
            if line:  # skip empty lines
                row.append(line)
        out_csv.writerow(row)  # one row per file, one cell per non-empty line
saeedgnu
  • Firstly, thank you so much. I am new to Python. The code runs, but I still have the same problem: the text of one file occupies multiple rows. I want the complete text of one file to occupy only one row. I would be very thankful if you can help me. – user1805250 Nov 08 '13 at 06:26

I have modified the code snippet you posted above. Please try this and let me know if it works:

import csv
import glob
import os

directory = raw_input("INPUT Folder:")
output = raw_input("OUTPUT Folder:")
txt_files = os.path.join(directory, '*.txt')
for txt_file in glob.glob(txt_files):
    filename = os.path.splitext(os.path.basename(txt_file))[0] + '.csv'
    # Python 2: open the csv in binary mode ('ab') to avoid blank rows on Windows
    with open(txt_file, "rb") as input_file, open("book.csv", 'ab') as output_file:
        out_csv = csv.writer(output_file)
        lines = input_file.readlines()
        # strip the newline from every line; each line becomes one cell of the row
        complete_file_content = [line.strip() for line in lines]
        out_csv.writerow(complete_file_content)

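The index out of range error comes from reading the (i+1)th element of the list: whenever i is the last valid index, i+1 points past the end. For example, in a list of 10 elements indexed from 0 to 9, index 10 does not exist. Note that your loop also appends to lines on every pass, so the list keeps growing ahead of i; in this particular code the error is therefore hit when a file contains only a single line, because lines[1] does not exist on the first pass.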
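
The failing access can be reproduced in isolation. The sketch below replays the question's loop on made-up data; `run_original_loop` is a hypothetical helper, not part of the original script. Because the loop appends to the list while it runs, the error only shows up here for a one-line input:

```python
def run_original_loop(lines):
    """Replay the loop from the question on a copy of `lines`."""
    lines = list(lines)
    for i in range(0, len(lines)):  # the range is fixed before the loop starts
        if i == len(lines):
            break  # the original wrote the csv row here; this branch is never reached
        lines.append(lines[i + 1])  # raises IndexError if index i + 1 is missing
    return lines


# Hypothetical single-line file: lines[1] does not exist on the first pass.
try:
    run_original_loop(["only line\n"])
    failed = False
except IndexError:
    failed = True

# With two or more lines the appended items keep the list ahead of i + 1,
# so the loop finishes but leaves duplicated entries behind.
grown = run_original_loop(["a\n", "b\n", "c\n"])
```
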

I found the concise way of joining multiple lines into a single line in this SO question.
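
For the one-row-per-file result the question asks for, the lines can be joined into a single string before writing, so that each file ends up in exactly one cell. A minimal sketch, assuming Python 3 (unlike the Python 2 snippets in this thread) and the platform default text encoding; `file_to_cell` and `folder_to_csv` are hypothetical helper names:

```python
import csv
import glob
import os


def file_to_cell(path):
    """Return the whole text of one file as a single string."""
    with open(path, "r") as handle:
        # split() with no arguments drops newlines and collapses runs of whitespace
        return " ".join(handle.read().split())


def folder_to_csv(directory, csv_path):
    """Write one row per .txt file in `directory`; return the row count."""
    paths = sorted(glob.glob(os.path.join(directory, "*.txt")))
    # newline="" is what the Python 3 csv module documentation asks for
    with open(csv_path, "w", newline="") as output_file:
        writer = csv.writer(output_file)
        for path in paths:
            writer.writerow([file_to_cell(path)])  # one-element list = one cell
    return len(paths)
```

A call such as `folder_to_csv("neg", "book.csv")` would then produce one row for each text file in the neg folder. Opening the output once with "w" instead of appending inside the loop also means that rerunning the script does not pile up duplicate rows.
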

Hope this helps.

Prahalad Deshpande