
How do you optimize nested for loops? Let's say I have two tables of data with different datetimes. Assume the datetimes are sorted in order. The first table has a datetime column; the second table has a start.datetime and an end.datetime, among other columns.

I can use the first loop for the first table, and the second loop to check whether the element from the first loop falls between start.datetime and end.datetime; if it does, the program writes a row, then moves on to the next element in the first loop. Since I am coding this as plain iteration in Python, I don't think dynamic programming applies (I only know memoization), but is there any way to optimize? I believe my code takes O(n·m) time (effectively O(n^2)) since the loops are nested, and O(n) space.
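Since both tables are sorted, one merge-style pass can replace the nested loops. Below is a minimal sketch of the idea (not a drop-in for the code further down), assuming the lot intervals do not overlap one another; it walks both lists once, giving O(n + m) time instead of O(n·m). The names `event_times`, `lot_starts`, and `lot_ends` are placeholders for the lists used later.

    # Two-pointer merge over two sorted tables: O(n + m) instead of O(n * m).
    # Assumes the lot intervals are sorted and do not overlap each other.
    def match_events(event_times, lot_starts, lot_ends):
        matches = []  # (event index, lot index) pairs
        j = 0
        for i, t in enumerate(event_times):
            # Skip lots that already ended at or before this event.
            while j < len(lot_ends) and lot_ends[j] <= t:
                j += 1
            # The current lot is the only remaining candidate; check its start.
            if j < len(lot_starts) and lot_starts[j] < t:
                matches.append((i, j))
        return matches

Because each index only ever moves forward, the whole scan is linear; if the intervals can overlap, a bisect over `lot_starts` plus a short backward check would be needed instead.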

I also need to ensure duplicates are removed if the user uploads the exact same first table again. The runtime may already be short, but since the resulting CSV holds a lot of data (around 2-3 years' worth), I am planning to optimize it as far as possible.
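One way to handle the duplicates without the pandas pass at the end is to remember which rows have already been written. A rough sketch, assuming every field is normalized to a string so it compares equal to what `csv.reader` reads back (the `seen` set costs extra memory proportional to the file size):

    import csv
    import os

    # Load rows already in the output file so re-running with the same
    # input does not append duplicates.
    seen = set()
    if os.path.exists(outfile) and os.stat(outfile).st_size > 0:
        with open(outfile, newline='') as f:
            seen = {tuple(r) for r in csv.reader(f)}

    # ...then, inside the loops, instead of calling writerow directly:
    row = tuple(str(x) for x in (Start_time[i], Alarm_text[i], Package[i],
                                 Recipe[j], Lot_starttime[j],
                                 Unload_Load[j], Network[j]))
    if row not in seen:
        seen.add(row)
        writer.writerow(row)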

    # Open the output file; write the header row only if the file is empty.
    # Then compare, keeping only rows where Start_time falls strictly
    # between Lot_starttime and Lot_endtime
    # (i indexes the alarm start times, j indexes the lot times).
    with open(outfile, 'a', newline='') as h:
        writer = csv.writer(h)
        if os.stat(outfile).st_size == 0:
            writer.writerow(('Start TimeStamp', 'Alarm_text', 'Package', 'Recipe', 'Lot_starttime', 'UL/L', 'Network'))
        for i in range(len(Start_time)):
            for j in range(len(Lot_starttime)):
                if Lot_starttime[j] < Start_time[i] < Lot_endtime[j]:
                    writer.writerow((Start_time[i], Alarm_text[i], Package[i], Recipe[j], Lot_starttime[j], Unload_Load[j], Network[j]))
    # Remove duplicate rows, keeping only the first occurrence,
    # then sort by time, earliest first.
    df = pd.read_csv(outfile)
    df.drop_duplicates(keep='first', inplace=True)
    df['Start TimeStamp'] = pd.to_datetime(df['Start TimeStamp'])
    df.sort_values(by='Start TimeStamp', inplace=True)
    df.to_csv(outfile, index=False)
    end = time.time()
    print(f'Duration of code for {first_excel}, {sheet_name} is {round(end - start, 2)} seconds.')

merge_data(first_excel,'A','AAA_Performance_Lot(2022)_N1.2.csv','A_AAA.csv')

The output looks like this:

    What is your excel file name?: 2362022

    Duration of code for A.xlsx, AAA is 2.24 seconds.

    ...

    Completed
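For comparison, a pandas-only route (in the spirit of the first comment below) avoids the Python loops entirely. Here is a sketch using `pd.merge_asof`, which pairs each event with the most recent preceding lot start and then filters on the lot end time. The DataFrame construction is illustrative, built from the same lists the loop version uses, and like the two-pointer sketch it matches each event to at most one lot.

    import pandas as pd

    # Vectorized interval matching: merge each event with the latest
    # preceding lot start, then keep events that fall before the lot end.
    events = pd.DataFrame({'Start TimeStamp': pd.to_datetime(Start_time),
                           'Alarm_text': Alarm_text,
                           'Package': Package})
    lots = pd.DataFrame({'Lot_starttime': pd.to_datetime(Lot_starttime),
                         'Lot_endtime': pd.to_datetime(Lot_endtime),
                         'Recipe': Recipe,
                         'UL/L': Unload_Load,
                         'Network': Network})
    merged = pd.merge_asof(events, lots,
                           left_on='Start TimeStamp',
                           right_on='Lot_starttime',
                           allow_exact_matches=False)  # keep the strict '>'
    result = merged[merged['Start TimeStamp'] < merged['Lot_endtime']]
    result = result.drop_duplicates(keep='first')
    result.to_csv(outfile, index=False)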

Thank You.

  • related: https://stackoverflow.com/a/55557758/10197418 - so instead of iterating lists and using csv r/w, why not try a pandas-only solution, using the built-in "vectorized" functionality? – FObersteiner Jun 23 '22 at 09:19
  • You should provide a [minimal reproducible example](https://stackoverflow.com/help/minimal-reproducible-example) – Always Right Never Left Jun 23 '22 at 09:23
