
I am writing an automated workflow to analyse data from some simulations. I have 580 directories, each with its results contained in an .xvg file, which I read, analyse, and then write the results to a file called 'results.txt'. Each script is run using a job array system on a computing facility, which enters each subdirectory with results and runs the scripts in parallel. The scripts in each directory are identical and the results being read are also in the same format.

The majority of the time, the scripts run okay and give results as expected. However, sometimes I get a syntax error and the code stops. The strange part is that this error is inconsistent across multiple runs of the analysis code. For example, sometimes I run the job array and find the error in 3/580 of the results, then I run it again and this time the error is in 5/580. It seems random when I get the error and when I don't. I also haven't been able to reproduce the error when I manually go into one of the failed folders and run the code directly in there, instead of telling the job array to run them all at once.

Here are the relevant parts of the code:

import os 
import re
import numpy as np
import pandas as pd
import fcntl 
from decimal import Decimal, ROUND_UP
import io


# going to read directory name 

def get_dir():
    ''' Read the current working directory and extract the molecule properties'''
    
    cwd = os.getcwd()
    
    dir_name = os.path.basename(cwd)
    
    # Extract the properties
    
    backbone_length = int(re.search(r'(\d+)B', dir_name).group(1))
    side_chain_length = int(re.search(r'(\d+)S', dir_name).group(1))
    side_chain_frequency = int(re.search(r'-(\d+)-', dir_name).group(1))
    rigidity_flexibility = str(re.search(r'-(\w)$', dir_name).group(1))
    
    try:
        grafting = (Decimal(backbone_length/side_chain_frequency).to_integral_value(rounding=ROUND_UP)/backbone_length)
        grafting = float(grafting)
    except ZeroDivisionError:
    
        grafting = 0
    
    return backbone_length, side_chain_length, side_chain_frequency, grafting, rigidity_flexibility


def rsquare(x,y,z): 
    ''' Calculates the square radius of gyration '''
    
    # calculates property 1
    
    return R 


def acyli(x,y,z):
    '''Calculates c'''

    # calculates property 2
    
    return A

def error_acylin(a,b,c,x,y,z):
    
    # calculates error
    
    return df

def error_rsquare(a,b,c,x,y,z):
    """ Calculate the error on the square radius of gyration"""
    
    # calculates error
    
    return df


# reading in the .xvg file 


def filter_comments(filepath):
    with open(filepath, 'r') as f:
        lines = [line.strip() for line in f if not line.startswith(('#', '@'))]
    return '\n'.join(lines)

file_path = "molecule.xvg"
data_without_comments = filter_comments(file_path)
# sep=r"\s+" is the documented equivalent of delim_whitespace=True, which newer pandas deprecates
df = pd.read_csv(io.StringIO(data_without_comments), sep=r"\s+", header=None)

eig1 = df[3].tolist()
eig2 = df[4].tolist()
eig3 = df[5].tolist()

# calculating shape properties 

r_g2 = rsquare(np.average(eig1), np.average(eig2), np.average(eig3))

r_g2_err = error_rsquare(np.std(eig1), np.std(eig2), np.std(eig3), np.average(eig1), 
                         np.average(eig2), np.average(eig3))

acylin = acyli(np.average(eig1), np.average(eig2), np.average(eig3))

acylin_err=error_acylin(np.std(eig1), np.std(eig2), np.std(eig3), np.average(eig1), np.average(eig2), np.average(eig3))

file_path_2 = r"results.txt"


with open(file_path_2, 'w') as file:
    file.write(f"{get_dir()[0]:3d}             {get_dir()[1]:3d}                {get_dir()[2]:3d}                   {np.round(get_dir()[3],2):3f}                       {get_dir()[4]:3s}       {r_g2:3f} +/- {r_g2_err:3f}     {acylin:3f} +/- {acylin_err:3f} \n")

and then the syntax error I get is:

  File "analysis.py", line 108
    file.write(f"{get_dir()[0]:3d}             {get_dir()[1]:3d}                {get_dir()[2]:3d}                   {np.round(get_dir()[3],2):3f}                       {get_dir()[4]:3s}       {r_g2:3f} +/- {r_g2_err:3f}     {acylin:3f} +/- {acylin_err:3f} \n")
                                                                                                                                                                                                                                                                       ^
SyntaxError: invalid syntax

I think it must be something to do with the job array, but if the job fails I find it strange that it's flagged as a syntax error and not as the CPU or other hardware issue I would usually see when a job doesn't submit properly. Has anyone encountered something similar before?

Note that I added the line `assert sys.version_info >= (3, 6)` to the script to check that the Python version is new enough, but no errors were given and the original syntax error still persists. Therefore I don't think the error comes from using f-strings with an old version of Python.
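One caveat with that check: as one of the comments points out, a SyntaxError is raised while the module is being compiled, before any statement in it runs. An `assert` placed in the same file as the f-string therefore cannot fire on an interpreter that fails to parse the file. A minimal sketch of a separate pre-check is shown here. The file name `check_env.py` and both function names are hypothetical, and the script deliberately avoids f-strings so that it should still parse on an older interpreter:

```python
# check_env.py (hypothetical name): run this in the job script before analysis.py.
# It avoids f-strings on purpose so that an older interpreter can still parse it.
import platform
import sys


def describe_interpreter():
    """Return a one-line description of the interpreter running this file."""
    return "host=%s exe=%s version=%s" % (
        platform.node(),
        sys.executable,
        platform.python_version(),
    )


def supports_fstrings(version_info=None):
    """Return True if the given version (default: the running one) is >= 3.6."""
    if version_info is None:
        version_info = sys.version_info
    return tuple(version_info[:2]) >= (3, 6)


if __name__ == "__main__":
    # The scheduler normally captures stderr per array task, so this line
    # records which host and interpreter each task actually used.
    sys.stderr.write(describe_interpreter() + "\n")
    if not supports_fstrings():
        sys.exit("Interpreter too old for f-strings: " + describe_interpreter())
```

Running this with the same `python` command the job script uses for the analysis, and comparing the logged lines of failed and successful tasks, would show whether the failures line up with a particular node or interpreter path. This is only a diagnostic under the assumption that the interpreter differs between tasks; it does not confirm that this is the cause.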

user6277
  • Can't reproduce - the given line does not raise a syntax error when pasted into a Python REPL. Re: _"I think it must be something to do with the job array"_, a syntax error means that the module failed to compile to bytecode. Python never got so far as to run any of your code – Brian61354270 Jul 26 '23 at 16:15
  • You need to upgrade to at least Python 3.6 to use f-strings. – Barmar Jul 26 '23 at 16:17
  • As a first step, I might try calling `get_dir()` once and unpacking the results into explicit variables to use in the `write()`, but that is not really a solution. Just a suggestion of what I might try as a next step. – JonSG Jul 26 '23 at 16:17
  • @Barmar I'm using 3.9 – user6277 Jul 26 '23 at 16:19
  • Re: what Barmar pointed out, can you check if all machines / environments that these jobs are being run on are using the same Python interpreter version, if there's any chance that they can differ? – Brian61354270 Jul 26 '23 at 16:20
  • As I understand the question, the scripts are executed on different nodes in some compute infrastructure, right? Then it does not matter which version of Python you have. Maybe one out of the several nodes still has an old version. – tobias_k Jul 26 '23 at 16:20
  • @Brian61354270 In the facility I use you load the specific version of python that you want within the job script and then the job should only be allocated to relevant nodes. The documentation also only has 3.6 to 3.9 available so no older version should be used. But I do suspect there is some allocation issue now which I need to find a way to identify – user6277 Jul 26 '23 at 16:28
  • Maybe you could try adding `assert sys.version_info >= (3, 6)` to the part of the code that runs on the job array. If that gives an error, it's very direct evidence that you have a node running an older Python version. – slothrop Jul 26 '23 at 16:30
  • @slothrop Thank you for your suggestion. I tried adding the line you suggested and unfortunately I am still getting the same error in some of the folders so I think the version is definitely not the issue – user6277 Jul 26 '23 at 17:09
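
JonSG's suggestion above (call `get_dir()` once and unpack the result) could look roughly like the sketch below. The helper name `format_result_line` is hypothetical and the numbers are placeholders, not real simulation output. The sketch uses `str.format` with the same format specs as the original line, which also parses on interpreters older than 3.6, so it may be worth trying if an old interpreter turns out to be involved:

```python
def format_result_line(props, r_g2, r_g2_err, acylin, acylin_err):
    """Build one results line from a get_dir()-style tuple.

    props is (backbone_length, side_chain_length, side_chain_frequency,
    grafting, rigidity_flexibility), unpacked once instead of calling
    get_dir() five times inside the write() call.
    """
    backbone, side_len, side_freq, grafting, rigidity = props
    # Same widths and spacing as the original f-string, split for readability.
    template = (
        "{:3d}             {:3d}                {:3d}"
        "                   {:3f}                       {:3s}"
        "       {:3f} +/- {:3f}     {:3f} +/- {:3f} \n"
    )
    return template.format(
        backbone, side_len, side_freq, round(grafting, 2), rigidity,
        r_g2, r_g2_err, acylin, acylin_err
    )


# Placeholder values purely for illustration, not real simulation results.
line = format_result_line((100, 5, 2, 0.5, "R"), 1.25, 0.1, 0.3, 0.02)
```

In the original script this would be called as `format_result_line(get_dir(), r_g2, r_g2_err, acylin, acylin_err)` and the returned string passed to `file.write()`. As JonSG says, this tidies the line but is not in itself a fix for the intermittent error.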

0 Answers