I'm trying to write a script to parse the output of a computational chemistry program. I do this by looking for two keywords, which I called k_cutoff and k_energy.

All the output files are inside directories named with numbers: 500, 600, ..., 900, 1000. I noticed that my script sorts all the numbers correctly except 1000. For some reason, 1000 appears in the first position instead of the last.

import os 
import numpy as np 
import re

energy = np.array([])
cutoff = np.array([])

k_cutoff = 'CUT_OFF_ENERGY:' 
k_energy = 'Final energy, E'

for path, directory, files in os.walk(os.getcwd()):
    for file in files:
        if file.endswith('.castep'):
            # Collect every number on lines containing the energy keyword.
            with open(os.path.join(path, file)) as f:
                for line in f:
                    if k_energy in line:
                        energy = np.append(energy, re.findall(r'[+-]?\d+(?:\.\d+)?', line))
        elif file.endswith('.param'):
            # Collect every number on lines containing the cutoff keyword.
            with open(os.path.join(path, file)) as f:
                for line in f:
                    if k_cutoff in line:
                        cutoff = np.append(cutoff, re.findall(r'[+-]?\d+(?:\.\d+)?', line))

results = np.stack((cutoff, energy))
cbornes
  • `os.walk()` doesn't do any sorting at all - it just gives you the files in the same order the operating system gave them. Perhaps the OS keeps the files in sorted order, but if so they'd most likely be in alphabetical order - since "1" is less than "5", "1000" comes before "500"; numerical value plays no part in this. – jasonharper Dec 03 '21 at 18:58
  • That makes sense, but is there a way to overcome this? – cbornes Dec 03 '21 at 19:05
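
One possible way around this, sketched under the assumption that the directories of interest have purely numeric names such as 500 or 1000: `os.walk()` descends into subdirectories in whatever order its `dirs` list holds, so sorting that list in place with a numeric key changes the visiting order. The helper name `walk_numeric` is hypothetical and not part of the original script.

```python
import os

names = ["500", "600", "700", "800", "900", "1000"]

# A plain string sort compares character by character, so "1000" lands first.
print(sorted(names))

# Converting each name to int for the comparison gives numeric order.
print(sorted(names, key=int))


def walk_numeric(top):
    """Like os.walk(top), but visits numerically named directories in numeric order.

    Hypothetical helper: names made only of decimal digits are ordered by value,
    any other names come afterwards in alphabetical order.
    """
    for path, dirs, files in os.walk(top):
        # With the default topdown=True, os.walk honours in-place changes to dirs.
        dirs.sort(key=lambda d: (0, int(d), "") if d.isdecimal() else (1, 0, d))
        yield path, dirs, sorted(files)
```

In the script above, replacing `os.walk(os.getcwd())` with `walk_numeric(os.getcwd())` should make the 1000 directory come last, provided the assumption about the directory names holds.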
