I am writing an automated workflow to analyse data from some simulations. I have 580 directories, each with its results in an .xvg file. The script reads that file, performs the analysis, and writes the results to a file called 'results.txt'. Each script is run using a job array on a computing facility, which enters each results subdirectory and runs the scripts in parallel. The scripts in each directory are identical, and the results being read are in the same format. Most of the time, the scripts run fine and give the expected results. However, sometimes I get a syntax error and the script stops. The strange part is that the error is inconsistent across runs of the analysis. For example, one run of the job array fails in 3 of the 580 directories, and the next run fails in 5. Which directories fail seems random. I also haven't been able to reproduce the error when I manually go into one of the failed folders and run the script directly there, instead of having the job array run them all at once.
Here are the relevant parts of the code:
```python
import os
import re
import numpy as np
import pandas as pd
import fcntl
from decimal import Decimal, ROUND_UP
import io

# going to read directory name
def get_dir():
    ''' Read the current working directory and extract the molecule properties'''
    cwd = os.getcwd()
    dir_name = os.path.basename(cwd)
    # Extract the properties
    backbone_length = int(re.search(r'(\d+)B', dir_name).group(1))
    side_chain_length = int(re.search(r'(\d+)S', dir_name).group(1))
    side_chain_frequency = int(re.search(r'-(\d+)-', dir_name).group(1))
    rigidity_flexibility = str(re.search(r'-(\w)$', dir_name).group(1))
    try:
        grafting = (Decimal(backbone_length/side_chain_frequency).to_integral_value(rounding=ROUND_UP)/backbone_length)
        grafting = float(grafting)
    except ZeroDivisionError:
        grafting = 0
    return backbone_length, side_chain_length, side_chain_frequency, grafting, rigidity_flexibility

def rsquare(x,y,z):
    ''' Calculates the square radius of gyration '''
    # calculates property 1
    return R

def acyli(x,y,z):
    '''Calculates c'''
    # calculates property 2
    return A

def error_acylin(a,b,c,x,y,z):
    # calculates error
    return df

def error_rsquare(a,b,c,x,y,z):
    """ Calculate the error on the square radius of gyration"""
    # calculates error
    return df

# reading in the .xvg file
def filter_comments(filepath):
    with open(filepath, 'r') as f:
        lines = [line.strip() for line in f if not line.startswith(('#', '@'))]
    return '\n'.join(lines)

file_path = "molecule.xvg"
data_without_comments = filter_comments(file_path)
# sep=r'\s+' is the non-deprecated equivalent of delim_whitespace=True
df = pd.read_csv(io.StringIO(data_without_comments), sep=r'\s+', header=None)
eig1 = df[3].tolist()
eig2 = df[4].tolist()
eig3 = df[5].tolist()

# calculating shape properties
r_g2 = rsquare(np.average(eig1), np.average(eig2), np.average(eig3))
r_g2_err = error_rsquare(np.std(eig1), np.std(eig2), np.std(eig3), np.average(eig1),
                         np.average(eig2), np.average(eig3))
acylin = acyli(np.average(eig1), np.average(eig2), np.average(eig3))
acylin_err = error_acylin(np.std(eig1), np.std(eig2), np.std(eig3), np.average(eig1), np.average(eig2), np.average(eig3))

file_path_2 = r"results.txt"
with open(file_path_2, 'w') as file:
    file.write(f"{get_dir()[0]:3d} {get_dir()[1]:3d} {get_dir()[2]:3d} {np.round(get_dir()[3],2):3f} {get_dir()[4]:3s} {r_g2:3f} +/- {r_g2_err:3f} {acylin:3f} +/- {acylin_err:3f} \n")
```
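Separately from the error itself, the write line above calls get_dir() five times, and get_dir() depends on os.getcwd(), which makes the parsing hard to test on its own. A minimal sketch, assuming directory names follow the pattern the regular expressions imply (the name 50B-10S-5-F below is hypothetical, as are the function names parse_dir_name and format_results_line), that parses the name once and builds the same line with the same format specifiers:

```python
import re
from decimal import Decimal, ROUND_UP


def parse_dir_name(dir_name):
    """Extract molecule properties from a directory name.

    Uses the same regular expressions as get_dir(), but takes the name as an
    argument instead of reading os.getcwd(), so it can be called once and tested.
    """
    backbone_length = int(re.search(r"(\d+)B", dir_name).group(1))
    side_chain_length = int(re.search(r"(\d+)S", dir_name).group(1))
    side_chain_frequency = int(re.search(r"-(\d+)-", dir_name).group(1))
    rigidity_flexibility = re.search(r"-(\w)$", dir_name).group(1)
    try:
        # Round backbone/frequency up to a whole number, then divide by backbone length
        rounded_ratio = Decimal(backbone_length / side_chain_frequency).to_integral_value(
            rounding=ROUND_UP
        )
        grafting = float(rounded_ratio / backbone_length)
    except ZeroDivisionError:
        grafting = 0
    return backbone_length, side_chain_length, side_chain_frequency, grafting, rigidity_flexibility


def format_results_line(props, r_g2, r_g2_err, acylin, acylin_err):
    """Build the results.txt line from properties that were parsed once."""
    b, s, f, grafting, rigidity = props
    # Same format specifiers as the original write() call
    return (f"{b:3d} {s:3d} {f:3d} {round(grafting, 2):3f} {rigidity:3s} "
            f"{r_g2:3f} +/- {r_g2_err:3f} {acylin:3f} +/- {acylin_err:3f} \n")


# Hypothetical directory name and made-up numbers, for illustration only
props = parse_dir_name("50B-10S-5-F")
line = format_results_line(props, 1.0, 0.1, 2.0, 0.2)
```

In the real script this would be called as `parse_dir_name(os.path.basename(os.getcwd()))`, once, before opening results.txt.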
And the syntax error I get is:
```
File "analysis.py", line 108
file.write(f"{get_dir()[0]:3d} {get_dir()[1]:3d} {get_dir()[2]:3d} {np.round(get_dir()[3],2):3f} {get_dir()[4]:3s} {r_g2:3f} +/- {r_g2_err:3f} {acylin:3f} +/- {acylin_err:3f} \n")
^
SyntaxError: invalid syntax
```
I think it must be something to do with the job array, but if the job fails, I find it strange that it's flagged as a syntax error rather than as a CPU or other hardware issue, which is what I'd usually see when a job doesn't submit properly. Has anyone encountered something similar before?
Note that I added the line `assert sys.version_info >= (3, 6)` to the script to check that the Python version is new enough, but no error was raised and the original syntax error still persists. Therefore I don't think the error comes from using f-strings with an old version of Python.
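One detail that bears on this check: Python compiles an entire file before executing any of its statements, so if the interpreter cannot parse some line, the SyntaxError is reported before an `assert` placed earlier in the same file gets a chance to run. A small stand-alone demonstration using the built-in compile() (the source string and the name demo.py are made up for illustration; `x = = 1` is deliberately invalid):

```python
# Source text standing in for a script: a version check on line 2, then a
# deliberately invalid line 3.
source = (
    "import sys\n"
    "assert sys.version_info >= (99,)  # would fail if it ever ran\n"
    "x = = 1\n"
)

try:
    # compile() parses the whole source before any statement is executed
    code = compile(source, "demo.py", "exec")
    exec(code)
    outcome = ("ran", None)
except SyntaxError as err:
    outcome = ("SyntaxError", err.lineno)
except AssertionError:
    outcome = ("AssertionError", None)

print(outcome)  # ('SyntaxError', 3)
```

The same ordering applies when a script is run as `python analysis.py`: the file is parsed in full first, and only then does its first line execute.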