Two files, matching dates to match data, works but very slow

Question

I'm reading from two files. One has the format:

11 Mar 2020 12:34:38
Satellite
  Time of Apogee (UTCG)      Time of Perigee (UTCG)
------------------------    ------------------------
 1 Mar 2020 00:33:46.221     1 Mar 2020 01:24:13.803
 1 Mar 2020 02:14:32.442     1 Mar 2020 03:05:00.106
 1 Mar 2020 03:55:18.659     1 Mar 2020 04:45:46.406
 1 Mar 2020 05:36:04.874     1 Mar 2020 06:26:32.703
 1 Mar 2020 07:16:51.085     1 Mar 2020 08:07:18.996

The other has the format:

11 Mar 2020 12:40:11
Satellite:  LLA Position


       Time (UTCG)          Lat (deg)    Lon (deg)
------------------------    ---------    ---------
 1 Mar 2020 00:00:00.000      -34.134      -97.662
 1 Mar 2020 00:01:00.000      -30.417      -97.086
 1 Mar 2020 00:02:00.000      -26.720      -96.577
 1 Mar 2020 00:03:00.000      -23.048      -96.120
 1 Mar 2020 00:04:00.000      -19.399      -95.707

I'm matching on time and printing the time from the first file together with the lat/long from the second file. The problem is that it takes forever, because these are large files. I can see that I'm running the for loops from the beginning over and over again rather than resuming from where I left off...

How can I speed this up?

import datetime

with open("apFile.txt", "r+") as apFile:
    with open("dates.txt", "r+") as file:
        lines = file.readlines()
        apLines = apFile.readlines()

        margin = datetime.timedelta(seconds=30)
        j = 0
        for apLine in apLines:
            j += 1
            if j <= 6:
                continue
            apLines = apFile.readlines()
            apLine_list = apLine.split()
            ap_day = apLine_list[0]
            ap_month = apLine_list[1]
            ap_year = apLine_list[2]
            ap_time = apLine_list[3]
            apogee_time_datetime = datetime.datetime.strptime(ap_day + " " + ap_month + " " + ap_year + " " + ap_time, "%d %b %Y %H:%M:%S.%f")
            pr_day = apLine_list[4]
            pr_month = apLine_list[5]
            pr_year = apLine_list[6]
            pr_time = apLine_list[7]
            perigee_time_datetime = datetime.datetime.strptime(pr_day + " " + pr_month + " " + pr_year + " " + pr_time, "%d %b %Y %H:%M:%S.%f")
            i = 0
            for line in lines:
                i += 1
                if i <= 6:
                    continue
                line_list = line.split()
                ll_day = line_list[0]
                ll_month = line_list[1]
                ll_year = line_list[2]
                ll_time = line_list[3]
                ll_lat = line_list[4]
                ll_long = line_list[5]
                ll_time_datetime = datetime.datetime.strptime(ll_day + " " + ll_month + " " + ll_year + " " + ll_time, "%d %b %Y %H:%M:%S.%f")
                if apogee_time_datetime - margin <= ll_time_datetime <= apogee_time_datetime + margin:
                    print("Apogee: " + str(apogee_time_datetime) + " Lat: " + ll_lat + " Long: " + ll_long)

The best way to optimize a script is to profile it first and see where it's spending most of its time. See [How can you profile a Python script?](https://stackoverflow.com/questions/582336/how-can-you-profile-a-python-script) — martineau, Mar 11 '20 at 23:19
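As a rough illustration of the profiling suggestion, the sketch below uses the standard-library `cProfile` and `pstats` modules. The `slow_match` function is a hypothetical stand-in for the nested loops in the question, not the asker's code, and `profile_call` is a helper name invented for this example.

```python
import cProfile
import io
import pstats


def slow_match(n):
    """Hypothetical stand-in for the question's nested loops (n * n comparisons)."""
    hits = 0
    for a in range(n):
        for b in range(n):
            if a == b:
                hits += 1
    return hits


def profile_call(func, *args):
    """Run func under cProfile; return its result and a text report."""
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args)
    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    # Sort by cumulative time and keep only the first few entries.
    stats.sort_stats("cumulative").print_stats(5)
    return result, buffer.getvalue()


if __name__ == "__main__":
    _, report = profile_call(slow_match, 500)
    print(report)
```

A whole script can also be profiled from the command line with `python -m cProfile -s cumulative script.py`, where `script.py` is a placeholder for the actual file name.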

score 0 · Answer 1 · answered Mar 12 '20 at 00:27


You can add a break statement to leave the inner loop early, so that it does not run to the end of the file once no further match is possible.
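A minimal sketch of that early exit, assuming the position rows are sorted by time (as the sample in the question suggests) and have already been parsed once into `(time, lat, lon)` tuples. The names `parse_time` and `match_with_break` are made up for this illustration.

```python
import datetime

FMT = "%d %b %Y %H:%M:%S.%f"
MARGIN = datetime.timedelta(seconds=30)


def parse_time(text):
    """Parse a timestamp such as '1 Mar 2020 00:33:46.221'."""
    return datetime.datetime.strptime(text, FMT)


def match_with_break(apogee_times, positions):
    """Return (apogee, lat, lon) for rows within MARGIN of each apogee.

    Assumes positions is a list of (time, lat, lon) sorted by time.
    """
    matches = []
    for apogee in apogee_times:
        for when, lat, lon in positions:
            if when > apogee + MARGIN:
                break  # sorted input: no later row can match this apogee
            if when >= apogee - MARGIN:
                matches.append((apogee, lat, lon))
    return matches


if __name__ == "__main__":
    positions = [
        (parse_time("1 Mar 2020 00:33:00.000"), "-34.134", "-97.662"),
        (parse_time("1 Mar 2020 00:34:00.000"), "-30.417", "-97.086"),
    ]
    for apogee, lat, lon in match_with_break(
        [parse_time("1 Mar 2020 00:33:46.221")], positions
    ):
        print("Apogee: " + str(apogee) + " Lat: " + lat + " Long: " + lon)
```

The break only skips the rows after the match window. Each apogee still scans from the first row, so this roughly halves the work on average rather than removing the repeated scan.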

If your data is sorted, you can use a search algorithm (binary search, for example) to reduce execution time. See: Searching Algorithms
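One way to apply a search algorithm here is binary search with the standard-library `bisect` module. The sketch below assumes the position file is sorted by time, as the sample in the question suggests, and that its timestamps were parsed once into a list. `match_with_bisect` and the parallel `times`/`rows` lists are names chosen for this example.

```python
import bisect
import datetime

FMT = "%d %b %Y %H:%M:%S.%f"
MARGIN = datetime.timedelta(seconds=30)


def parse_time(text):
    """Parse a timestamp such as '1 Mar 2020 00:33:46.221'."""
    return datetime.datetime.strptime(text, FMT)


def match_with_bisect(apogee_times, times, rows):
    """Return (apogee, lat, lon) for rows within MARGIN of each apogee.

    Assumes times is a sorted list of datetimes and rows is a parallel
    list of (lat, lon) pairs, so rows[k] belongs to times[k].
    """
    matches = []
    for apogee in apogee_times:
        # First index with times[k] >= apogee - MARGIN.
        lo = bisect.bisect_left(times, apogee - MARGIN)
        # First index with times[k] > apogee + MARGIN.
        hi = bisect.bisect_right(times, apogee + MARGIN)
        for k in range(lo, hi):
            lat, lon = rows[k]
            matches.append((apogee, lat, lon))
    return matches


if __name__ == "__main__":
    times = [parse_time("1 Mar 2020 00:%02d:00.000" % m) for m in range(30, 37)]
    rows = [("lat%d" % m, "lon%d" % m) for m in range(30, 37)]
    print(match_with_bisect([parse_time("1 Mar 2020 00:33:46.221")], times, rows))
```

Each lookup takes a logarithmic number of comparisons, instead of a scan of the whole position file for every apogee.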


Mustafa Akyazıcı
