Reading a file with pandas, then using a for loop

Question

I'm trying to read large text files with hundreds of thousands of lines each. To make it faster I'm trying to use pandas. Below is the concept of what I want my code to do, but I'm not really sure how to use for loops with pandas DataFrames. Let me know if it's logical to have a program do this in an attempt to make the runtime shorter. Thanks.

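import pandas as pd

# on_bad_lines='skip' replaces error_bad_lines=False, which was removed in pandas 2.0
df1 = pd.read_csv('FILENAME1', sep=',', on_bad_lines='skip')
df2 = pd.read_csv('FILENAME2', sep=',', on_bad_lines='skip')
for index, row in df1.iterrows():
    for index2, row2 in df2.iterrows():
        # compare values in the rows by position (index is only the row label)
        if row.iloc[1] == row2.iloc[2] and row.iloc[0] == row2.iloc[1]:
            print("this info matches")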

What do you want to achieve? Could you please also post sample data sets (5-7 rows) for each data set in text form, along with the desired output/result set? — MaxU - stand with Ukraine, Apr 17 '16 at 01:20
The point of using a DataFrame is that you don't have to loop. Check out this link on comparing similar DataFrames: http://stackoverflow.com/questions/20225110/comparing-two-dataframes-and-getting-the-differences — Michael, Apr 17 '16 at 01:23
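As the comment above says, the nested loop can usually be replaced by a single join. A minimal sketch with `DataFrame.merge`, assuming the files have no header row and that the intended test is the one the question's loop appears to make (column 0 of the first file equals column 1 of the second, and column 1 of the first equals column 2 of the second). The column names `a0` to `b2` and the sample rows are made up, and `io.StringIO` stands in for the real files:

```python
import io

import pandas as pd

# Made-up stand-ins for FILENAME1 and FILENAME2 (no header row).
file1 = io.StringIO("a,1,x\nb,2,y\nc,3,z\n")
file2 = io.StringIO("p,a,1\nq,b,9\nr,c,3\n")

# Hypothetical column names so the join keys are unambiguous.
df1 = pd.read_csv(file1, header=None, names=["a0", "a1", "a2"])
df2 = pd.read_csv(file2, header=None, names=["b0", "b1", "b2"])

# Inner join: keep only row pairs where a0 == b1 and a1 == b2 (no Python-level loop).
matches = df1.merge(df2, left_on=["a0", "a1"], right_on=["b1", "b2"])
print(matches)
```

An inner merge returns one row per matching pair, so a row of the first file that matches several rows of the second appears several times.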

Sharad · Answer 1 · 2016-04-17T01:34:48.213

In my opinion, if runtime is important and you only have to do the computation shown in your code, please don't use pandas. Pandas will spend extra cycles setting itself up, cleaning the data, etc.
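If you do skip pandas as suggested here, the bigger win comes from not comparing every row with every row. A minimal sketch using the standard-library `csv` module: build a set of the (column 1, column 2) pairs from the second file once, then check each row of the first file against it with a set lookup. The function name `find_matches` is hypothetical, the column positions are taken from the question's loop, and rows that are too short are simply skipped (a rough analogue of skipping bad lines):

```python
import csv
import io


def find_matches(file1, file2):
    """Return rows of file1 whose (col 0, col 1) equals some (col 1, col 2) row of file2."""
    # One pass over the second file: collect the key pairs to look up.
    keys = {(row[1], row[2]) for row in csv.reader(file2) if len(row) >= 3}
    # One pass over the first file: set membership replaces the inner loop.
    return [row for row in csv.reader(file1)
            if len(row) >= 2 and (row[0], row[1]) in keys]


# Made-up sample data; with real files use open('FILENAME1', newline='').
f1 = io.StringIO("a,1,x\nb,2,y\nc,3,z\n")
f2 = io.StringIO("p,a,1\nq,b,9\nr,c,3\n")
for row in find_matches(f1, f2):
    print("this info matches:", row)
```

Note that `csv` yields strings, so values are compared as text; `"1"` and `"1.0"` would not match.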

edited Apr 17 '16 at 01:34

answered Apr 17 '16 at 01:23

Sharad

I can't agree with that, taking into account that the OP is going to work "with hundreds of thousands of lines". Nested loops will most probably be slower than a pandas approach... – MaxU - stand with Ukraine Apr 17 '16 at 01:28
I believe by OP you mean open(). If that is the case, then pandas will also have to open the file and read every line, and additionally do its own work to store the data in the correct format. It may unnecessarily check the validity of each value and transform it if needed. In the end, if the programmer is only going to use one or two items from each row, why waste cycles cleaning up everything else? – Sharad Apr 17 '16 at 01:34
OP means [Original Poster](http://www.acronymfinder.com/Original-Poster-(OP).html) – MaxU - stand with Ukraine Apr 17 '16 at 08:51
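On the point about wasted work in the comments above: `pd.read_csv` accepts `usecols`, so only the needed columns end up in the DataFrame, and `dtype=str` reads values as strings instead of inferring numeric types. A minimal sketch under the same assumptions as the question's loop (no header row; positions 0 and 1 in the first file, 1 and 2 in the second), with made-up data in `io.StringIO`. It uses a semi-join via `MultiIndex.isin`, which returns only the matching rows of the first file. Whether this beats a hand-written `csv` loop on a particular file is something to time rather than assume:

```python
import io

import pandas as pd

# Made-up stand-ins for FILENAME1 and FILENAME2 (no header row).
file1 = io.StringIO("a,1,x\nb,2,y\nc,3,z\n")
file2 = io.StringIO("p,a,1\nq,b,9\nr,c,3\n")

# Keep only the columns the comparison needs; dtype=str skips numeric inference.
df1 = pd.read_csv(file1, header=None, usecols=[0, 1], dtype=str)
df2 = pd.read_csv(file2, header=None, usecols=[1, 2], dtype=str)

# Semi-join: True where a (col 0, col 1) pair of df1 occurs as a (col 1, col 2) pair of df2.
mask = pd.MultiIndex.from_frame(df1).isin(pd.MultiIndex.from_frame(df2))
print(df1[mask])
```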
