import os
import re
files = ['/home/egezer/Desktop/SIESTAstepper_tutorial/Carbon/i1/log', '/home/egezer/Desktop/SIESTAstepper_tutorial/Carbon/i2/log', '/home/egezer/Desktop/SIESTAstepper_tutorial/Carbon/i3/log', '/home/egezer/Desktop/SIESTAstepper_tutorial/Carbon/i3/continue/log', '/home/egezer/Desktop/SIESTAstepper_tutorial/Carbon/i3/continue_2/log', '/home/egezer/Desktop/SIESTAstepper_tutorial/Carbon/i4/log', '/home/egezer/Desktop/SIESTAstepper_tutorial/Carbon/i5/log']
path = "i*"
cwd = "/home/egezer/Desktop/SIESTAstepper_tutorial/Carbon"
log = "log"
cont = "continue"

repath = path.replace("*", "[0-9]+")
for f1 in files:
    for f2 in reversed(files):
        match1 = re.search(f"({cwd}{os.sep}{repath}{os.sep}{cont}_*[0-9]*{os.sep}{log})", f1)
        match2 = re.search(f"({cwd}{os.sep}{repath}{os.sep}{log})", f2)
        match3 = re.search(f"({cwd}{os.sep}({repath}){os.sep}{cont}_*([0-9]*){os.sep}{log})", f1)
        match4 = re.search(f"({cwd}{os.sep}({repath}){os.sep}{cont}_*([0-9]*){os.sep}{log})", f2)

        print(f1, f2,
              "-" if match3 is None else match3[2],
              "-" if match4 is None else match4[2],
              "-" if match3 is None else ("1" if match3[3] == "" else match3[3]),
              "-" if match4 is None else ("1" if match4[3] == "" else match4[3]),
              )

        if (match1 is not None and match2 is not None and
            (re.search(f"{os.sep}i[0-9]+", f1)[0] == re.search(f"{os.sep}i[0-9]+", f2)[0]
             and f1 == match1.groups(0)[0] and f2 == match2.groups(0)[0])) or \
                (match3 is not None and match4 is not None and
                 (match3[2] == match4[2] and
                  int("1" if match3[3] == "" else match3[3]) < int("1" if match4[3] == "" else match4[3]))):
            files.remove(f2)

I am trying to remove /home/egezer/Desktop/SIESTAstepper_tutorial/Carbon/i3/log and /home/egezer/Desktop/SIESTAstepper_tutorial/Carbon/i3/continue/log from the list. I can remove the first one, but I cannot remove the second. The comparison int("1" if match3[3] == "" else match3[3]) < int("1" if match4[3] == "" else match4[3]) does not seem to work as expected. Changing < to > removes /home/egezer/Desktop/SIESTAstepper_tutorial/Carbon/i3/continue_2/log instead, which is not desired. Why is it not working, and what is the correct code?

Eftal Gezer
  • 1
    Confirm that you are actually comparing the numbers you think you should be comparing. – chepner Sep 14 '22 at 17:37
  • 2
    What are the values of `match3[3]` and `match4[3]` when this happens? – Barmar Sep 14 '22 at 17:38
  • 7
    I recommend breaking up that complex `if` condition into multiple statements. It's unreadable as currently written. – Barmar Sep 14 '22 at 17:39
  • @chepner I confirm it by the print output. The related lines in the output are: ``` /home/egezer/Desktop/SIESTAstepper_tutorial/Carbon/i3/continue/log /home/egezer/Desktop/SIESTAstepper_tutorial/Carbon/i3/continue_2/log i3 i3 1 2 /home/egezer/Desktop/SIESTAstepper_tutorial/Carbon/i3/continue/log /home/egezer/Desktop/SIESTAstepper_tutorial/Carbon/i3/continue/log i3 i3 1 1 ``` – Eftal Gezer Sep 14 '22 at 17:40
  • @Barmar It is 1 and 2, respectively. – Eftal Gezer Sep 14 '22 at 17:41
  • @Barmar, I mean, these are "" and 2, respectively but I convert "" to 1. – Eftal Gezer Sep 14 '22 at 17:42
  • And are you expecting the condition to be true or false in this case? – Barmar Sep 14 '22 at 17:45
  • I can't figure out what logic you're trying to implement in this case, it's just too convoluted. – Barmar Sep 14 '22 at 17:46
  • @Barmar I expect it `True`. – Eftal Gezer Sep 14 '22 at 17:46
  • please post a minimal reproducible example: https://stackoverflow.com/help/minimal-reproducible-example – D.L Sep 14 '22 at 17:49
  • @Barmar It is actually remove_nones function of SIESTAstepper. [link](https://github.com/eftalgezer/SIESTAstepper/blob/main/SIESTAstepper/helpers.py) – Eftal Gezer Sep 14 '22 at 17:50
  • @EftalGezer I don't understand it there, either. – Barmar Sep 14 '22 at 17:51
  • What are "energy values" and how does that relate to the filename patterns? – Barmar Sep 14 '22 at 17:53
  • 2
    Removing from a list while you're iterating over it is a bad idea. https://stackoverflow.com/questions/6260089/strange-result-when-removing-item-from-a-list-while-iterating-over-it – Barmar Sep 14 '22 at 17:55
  • @Barmar The energy values come from the SIESTA log files. SIESTA is density functional theory software. Sometimes a calculation is interrupted, e.g. by a power blackout, or the user terminates it to continue later. In this case, the user creates a folder named continue, copies some files into it, and resumes the calculation. The interrupted log files do not contain any energy values and should be eliminated. – Eftal Gezer Sep 14 '22 at 18:05
  • @Barmar You are right, the problem seems to be the "removing from a list while iterating over it" problem. When I copy the for loop to the bottom and run it a second time, I get the desired result. Of course, running the same code twice is not an ideal solution. – Eftal Gezer Sep 14 '22 at 18:13
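
To illustrate the pitfall the comments converge on, here is a simplified sketch. It is not the question's nested loop; the short names are hypothetical stand-ins for the full log paths. It shows how removing an element during iteration makes the loop skip the element that slides into the freed slot, which is consistent with i3/log being removed while i3/continue/log survives.

```python
# Hypothetical short names standing in for the full log paths in the question.
items = ["i3/log", "i3/continue/log", "i3/continue_2/log"]

# Buggy: removing during iteration shifts the later elements left, so the
# element that moves into the freed slot is never visited by the loop.
buggy = list(items)
visited = []
for name in buggy:
    visited.append(name)
    if name != "i3/continue_2/log":
        buggy.remove(name)

print(visited)  # "i3/continue/log" is never visited
print(buggy)    # so it is still in the list

# Safer: build a new list instead of mutating the one being iterated.
kept = [name for name in items if name == "i3/continue_2/log"]
print(kept)
```

Building a new list (or collecting the items to delete and removing them after the loop) avoids the need to run the same loop twice.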

1 Answer


I've picked a different algorithm that uses a single pattern match. The pattern makes the /continue directory optional, and the _# suffix on continue is optional as well.

Once the match is done, I keep track of the current log file for each instance; if a more recent log file turns up, I mark the older file for removal. This code works no matter what order the log files are encountered in (most recent log first or last). I use a simple ranking scheme: i#/log is 0, continue/log is 1, and continue_#/log is #. The higher number determines which log is the most current.

This has the advantage of scanning the list only once, since I use a dictionary to keep track of what I've already found.


import os
import re

files = [
    "/home/egezer/Desktop/SIESTAstepper_tutorial/Carbon/i1/log",
    "/home/egezer/Desktop/SIESTAstepper_tutorial/Carbon/i2/log",
    "/home/egezer/Desktop/SIESTAstepper_tutorial/Carbon/i3/log",
    "/home/egezer/Desktop/SIESTAstepper_tutorial/Carbon/i3/continue/log",
    "/home/egezer/Desktop/SIESTAstepper_tutorial/Carbon/i3/continue_2/log",
    "/home/egezer/Desktop/SIESTAstepper_tutorial/Carbon/i4/log",
    "/home/egezer/Desktop/SIESTAstepper_tutorial/Carbon/i5/log",
]
path = "i*"
cwd = "/home/egezer/Desktop/SIESTAstepper_tutorial/Carbon"
log = "log"
cont = "continue"
repath = path.replace("*", "[0-9]+")

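active_log = {}  # instance name -> (lognumber, filename) of the most recent log seen
to_remove = []   # logs superseded by a more recent one
for filename in files:
    logmatch = re.search(
        f"({cwd}{os.sep}({repath})({os.sep}{cont}(_([0-9]+)){{0,1}}){{0,1}}{os.sep}{log})",
        filename,
    )
    if not logmatch:
        continue
    # Groups: whole path, instance (e.g. "i3"), "/continue[_N]" part, "_N", N
    _, instance, extended, _, increment = logmatch.groups()
    # Rank: i#/log -> 0, continue/log -> 1, continue_N/log -> N
    lognumber = 0
    if extended is not None:
        lognumber = 1 if increment is None else int(increment)
    if instance not in active_log:
        active_log[instance] = (lognumber, filename)
    elif active_log[instance][0] > lognumber:
        # The stored log is newer, so this file is superseded.
        to_remove.append(filename)
    else:
        # This file is newer: retire the stored log and take its place.
        to_remove.append(active_log[instance][1])
        active_log[instance] = (lognumber, filename)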

for filename in to_remove:
    print(f"need to remove {filename}")
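
If a filtered list is wanted rather than a list of files to delete, the same ranking idea can be wrapped in a function. This is only a sketch under a few assumptions: `latest_logs` is a hypothetical name (it is not the `remove_nones` function from SIESTAstepper), paths are assumed to use "/" as in the data above, and the use of `re.escape` and `fullmatch` is my own tightening of the pattern rather than part of the code above.

```python
import re


def latest_logs(files, cwd, path="i*", log="log", cont="continue"):
    """Return files with superseded logs dropped (hypothetical helper)."""
    # Assumption: paths use "/" as the separator, as in the question's data.
    instance_re = re.escape(path).replace(r"\*", "[0-9]+")
    pattern = re.compile(
        f"{re.escape(cwd)}/({instance_re})"
        f"(/{re.escape(cont)}(?:_([0-9]+))?)?"
        f"/{re.escape(log)}"
    )
    best = {}        # instance -> (rank, filename) of the newest log seen
    matched = set()  # every file that looks like a log of some instance
    for filename in files:
        m = pattern.fullmatch(filename)
        if m is None:
            continue  # unrelated paths are left untouched
        matched.add(filename)
        instance, extended, increment = m.groups()
        # Rank: i#/log -> 0, continue/log -> 1, continue_N/log -> N
        if extended is None:
            rank = 0
        else:
            rank = 1 if increment is None else int(increment)
        if instance not in best or rank >= best[instance][0]:
            best[instance] = (rank, filename)
    newest = {filename for _, filename in best.values()}
    # Keep unrelated files and the newest log of each instance, in input order.
    return [f for f in files if f not in matched or f in newest]
```

Calling `latest_logs(files, cwd)` with the list from the question returns a new list and leaves `files` unmodified, so nothing is removed from a list while it is being iterated.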
toppk