This is a follow-up to this question: Python: Find keywords in a text file from another text file

I want to write the text produced by line.strip to a CSV (or Excel) file in two columns.

Here is my attempt:

import numpy as np
import pandas as pd
import csv

with open('C:\invoice.txt') as f:
    invoice_data = [line.strip() for line in f if line.strip()]

with open('C:\dict.txt') as f:
    dict_data = set([line.strip() for line in f if line.strip()])

for i in range(0, len(invoice_data), 2):
    if invoice_data[i] in dict_data:
        print(invoice_data[i: i + 2])

with open('C:\\Users\\fam_robo1\\Documents\\sample.csv','w') as csvfile:
    fieldnames = ['keyword','data']
    writer = csv.DictWriter(csvfile, fieldnames=fieldnames)
    writer.writeheader()
    for i in range(0, len(invoice_data), 2):
        writer.writerow ({'keyword':[invoice_data[i]] , 'data':[invoice_data[i+2]] })
    csvfile.close()

Any help would be appreciated.

Full Traceback:

Traceback (most recent call last):
  File "C:\Users\fam_robo1\Documents\keyword.py", line 20, in <module>
    writer.writerow ({'keyword':[invoice_data[i]] , 'data':[invoice_data[i+2]] })
IndexError: list index out of range
jokol
  • Instead of using with open as, create a pandas DataFrame and use df.to_csv – Coding thermodynamist Jul 27 '17 at 12:50
  • I did that before I used this, and I still got the same error. I used the method here: https://stackoverflow.com/a/13437855/6604134 – jokol Jul 27 '17 at 13:16
  • I do not understand why people keep downvoting my questions. Thanks to them, I may get blocked :/ – jokol Jul 27 '17 at 13:19
  • Could you put information about the error in your question, such as the full traceback? Otherwise it can be difficult for us to help you. – Professor_Joykill Jul 27 '17 at 13:26
  • Traceback (most recent call last): File "C:\Users\fam_robo1\Documents\keyword.py", line 20, in writer.writerow ({'keyword':[invoice_data[i]] , 'data':[invoice_data[i+2]] }) IndexError: list index out of range – jokol Jul 27 '17 at 13:27
  • The `i+2` seems to cause the problem (you are accessing nonexistent elements at the end of your invoice_data) – Quickbeam2k1 Jul 27 '17 at 13:32

2 Answers

The clue is in the "list index out of range" message. When writing rows you reference both invoice_data[i] and invoice_data[i+2] (though I am not sure why you put them in lists, since trying to write out a list as a CSV element will probably cause trouble too).
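To illustrate the remark about lists, here is a minimal sketch of what the csv module does when a row value is a list rather than a string. The sample values are hypothetical, and io.StringIO stands in for a real file so the result can be inspected directly.

```python
import csv
import io

# An in-memory buffer stands in for the CSV file.
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=['keyword', 'data'])
writer.writeheader()

# Values wrapped in lists: csv writes str(value), brackets and quotes included.
writer.writerow({'keyword': ['Total'], 'data': ['100']})

# Plain strings: written as-is.
writer.writerow({'keyword': 'Total', 'data': '100'})

rows = buffer.getvalue().splitlines()
for row in rows:
    print(row)
```

The second data row is the one that reads cleanly in a spreadsheet, which is why passing the strings themselves is usually preferable.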

Your for statement can take i right up to len(invoice_data)-1, and for any i of len(invoice_data)-2 or more the index i+2 is outside the permissible index range, so the last iteration always produces the traceback.
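A small sketch of the index arithmetic may help. It assumes (this is a guess at the data layout, not something the question states) that each keyword is immediately followed by its value, so the partner of index i is i+1; the sample list and the pairs helper are hypothetical.

```python
# Hypothetical sample: keyword, value, keyword, value, ...
invoice_data = ['Invoice No', 'A-17', 'Total', '100']


def pairs(items):
    """Return (keyword, value) pairs from a flat list laid out as k, v, k, v, ..."""
    # Stop at len(items) - 1 so that i + 1 is always a valid index,
    # even when the list has an odd number of elements.
    return [(items[i], items[i + 1]) for i in range(0, len(items) - 1, 2)]


# On the last iteration of the original loop i is 2, and 2 + 2 is past the end.
try:
    invoice_data[2 + 2]
except IndexError as exc:
    error_message = str(exc)

print(error_message)
print(pairs(invoice_data))
```

Bounding the range by len(items) - 1 keeps the loop from ever asking for an element beyond the end of the list.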

You ask in a comment why people keep downvoting your questions. I suspect this is because they show little real effort to understand what the error actually is. In the previous question to which you refer you say "I keep on getting the Index Error. Do I need to store it in a table first?" but you don't explain why you think this would help, or indeed even what it is supposed to mean.

I suspect you may be trying to run before you can walk, and while attempting difficult problems shows some spirit, you would do well to look hard at the output you get before trying to recruit the assistance of SO - the messages Python produces mostly have meaning, and if you don't understand them then perhaps you should start by trying to determine what they mean. "What does this error message mean" is usually an acceptable question if you genuinely can't understand it.

As you learn you will doubtless become better able to determine what is going wrong with your programs, but relying on other people will not grow your understanding as quickly as your own efforts to comprehend.

holdenweb
  • The world needs more people like you. Thank you. What you said is true: I am a college student, currently at an internship, working alone on a project. I have worked with C#, C++ and Java, but Python is completely new to me. I thought hacking through it was the best way to learn, but you are right, I am not asking myself the right questions. I thought of creating a table because I expected it would be easy to turn into a CSV (my end goal). What I am doing is creating a program to get data from PDFs and store it in Excel files. – jokol Jul 28 '17 at 06:50
  • As for people downvoting this, I don't understand why they can't just choose to ignore it. I don't even have enough reputation to upvote your answer – jokol Jul 28 '17 at 06:53
  • Don't worry about that. The downvoters are _probably_ concerned with trying to keep question quality high. – holdenweb Jul 28 '17 at 11:08

Special thanks to holdenweb for helping me believe in myself.

I solved this issue with a very simple reordering, but only after learning the basic Python concepts first.

So if any beginner like me is reading this, follow the wisdom shared by holdenweb and go through the basic concepts first, even when you think you can just wing it.

with open('C:\\Users\\fam_robo1\\Documents\\sample.csv','w') as csvfile:
    fieldnames = ['keyword','data']
    writer = csv.DictWriter(csvfile, fieldnames=fieldnames)
    writer.writeheader()

    # Entries come in pairs: a keyword followed by its data.
    for i in range(0, len(invoice_data), 2):
        if invoice_data[i] in dict_data:
            list1 = [invoice_data[i]]
            list2 = [invoice_data[i+1]]
            print(invoice_data[i: i + 2])
            # Each value is written as the text of a nested list, e.g. [['keyword']].
            writer.writerow({'keyword': [list1], 'data': [list2]})
    # No explicit close is needed: the with block closes the file.

Another way, using pandas:

writer = pd.ExcelWriter('pandas_simple.xlsx')
count = 0  # next row to write; starting value assumed, not shown in the original

for i in range(0, len(invoice_data), 2):
    if invoice_data[i] in dict_data:
        list1 = [invoice_data[i]]
        list2 = [invoice_data[i+1]]
        print(invoice_data[i: i + 2])
        df = pd.DataFrame({'keyword': list1, 'information': list2})
        # Write this one-row DataFrame to the next free row of the sheet.
        df.to_excel(writer, sheet_name='Sheet1', startrow=count, header=False, index=False)
        count = count + 1

# Close the Pandas Excel writer and output the Excel file
# (on pandas 2.0 and later, use writer.close() instead).
writer.save()
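For comparison, here is a minimal standard-library sketch (not the code from this answer) that collects the matching pairs first and writes them in one call. The sample invoice_data and dict_data are hypothetical stand-ins for the contents of the two text files, and io.StringIO stands in for the output file.

```python
import csv
import io

# Hypothetical stand-ins for the contents of invoice.txt and dict.txt.
invoice_data = ['Invoice No', 'A-17', 'Customer', 'ACME', 'Total', '100']
dict_data = {'Invoice No', 'Total'}

# Keep only the (keyword, data) pairs whose keyword appears in dict_data.
matches = [
    (invoice_data[i], invoice_data[i + 1])
    for i in range(0, len(invoice_data) - 1, 2)
    if invoice_data[i] in dict_data
]

# With a real file this would be open(path, 'w', newline='') on Python 3.
output = io.StringIO()
writer = csv.writer(output)
writer.writerow(['keyword', 'data'])   # header row
writer.writerows(matches)              # one row per matching pair

csv_text = output.getvalue()
print(csv_text)
```

Separating the filtering from the writing makes it easier to inspect matches before anything reaches the file.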
jokol
  • I'm having trouble understanding how this works. `list1` and `list2` are single-element lists, which would appear to mean that you are trying to write each value as a list with one element that is a list with one element! What do you actually see in your CSV output? – holdenweb Jul 28 '17 at 11:11
  • Haha, yes. I actually put it in a loop and write one element to each row (notice the writer.writerow), so it ends up looking the way I wanted it to, though I actually came up with an even better solution using pandas. – jokol Jul 28 '17 at 11:32