What I want to do
I am trying to parse the geometry information of a Nastran file using Python. My current attempts use NumPy and regular expressions. It is important that the data is read quickly and that the result is a NumPy array.
Nastran file format
A Nastran file can look like the following:
GRID           1        3268.616-30.0828749.8656
GRID           2        3268.781  -3.-14749.8888
GRID           3        3422.488580.928382.49383
GRID           4        3422.488     10.-2.49383
...
I am only interested in the right-hand part of each line, where the x, y and z coordinates are stored in consecutive fields of 8 characters each. A common representation of the coordinates above would be
3268.616, -30.0828, 749.8656
3268.781, -3.e-14, 749.8888
3422.488, 580.9283, 82.49383
3422.488, 10., -2.49383
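To make the fixed-width layout concrete, here is a minimal sketch that cuts one line into its three coordinate fields. The function name `split_coordinate_fields` and its parameters are hypothetical, and the 24-character prefix is an assumption taken from the regular expression used further down.

```python
def split_coordinate_fields(line, start=24, width=8, count=3):
    """Return the raw text of `count` fixed-width fields starting at column `start`."""
    return [line[start + i * width:start + (i + 1) * width] for i in range(count)]

# Sample line built field by field so that the column widths are explicit.
sample = "GRID".ljust(8) + "2".rjust(8) + " " * 8 + "3268.781" + "  -3.-14" + "749.8888"

fields = split_coordinate_fields(sample)
print(fields)  # ['3268.781', '  -3.-14', '749.8888']
```

The fields come back as raw strings, so the shorthand exponent in the second field still has to be handled before converting to float.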
What I tried so far
So far, I have tried to use regular expressions and NumPy, avoiding Python for loops wherever possible so that the parsing is as fast as possible. After reading the complete file into memory and storing it in the fContent variable, I tried:
vertices = np.array(re.findall("^.{24}(.{8})(.{8})(.{8})", fContent, re.MULTILINE), dtype=float)
However, this fails for expressions such as -3.-14, which cannot be converted to float directly. A workaround would be to loop over the string tuples returned by the regex, substitute every .- with .e-, and then create the NumPy array from the resulting list of string tuples (not shown in the code above). However, I think this approach would be slow, since it involves looping over all tuples found by the regular expression and performing a substitution on each.
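For reference, the loop-based workaround described above might look like the following sketch. The helper `make_grid_line` and the sample data are hypothetical and only serve to make the example self-contained. It assumes that .- is the only shorthand exponent form that occurs in the file.

```python
import re
import numpy as np

def make_grid_line(grid_id, x, y, z):
    """Hypothetical helper: build one sample line from six 8-character fields."""
    return "GRID".ljust(8) + str(grid_id).rjust(8) + " " * 8 + x.rjust(8) + y.rjust(8) + z.rjust(8)

fContent = "\n".join([
    make_grid_line(1, "3268.616", "-30.0828", "749.8656"),
    make_grid_line(2, "3268.781", "-3.-14", "749.8888"),
    make_grid_line(3, "3422.488", "580.9283", "82.49383"),
    make_grid_line(4, "3422.488", "10.", "-2.49383"),
])

FIELDS = re.compile(r"^.{24}(.{8})(.{8})(.{8})", re.MULTILINE)

# Python-level loop over every captured field: replace the shorthand
# exponent ".-" with ".e-" and convert. float() ignores the padding spaces.
rows = [
    [float(field.replace(".-", ".e-")) for field in row]
    for row in FIELDS.findall(fContent)
]
vertices = np.array(rows, dtype=float)
print(vertices.shape)  # (4, 3)
```

Because each field is already isolated when the substitution runs, a . at the end of one field can never be confused with a - at the start of the next.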
What I am looking for
I am looking for any fast way to read in the data. My current hope is a smart regular expression that successfully deals with the "-3.-14" problem. The regex would need to substitute every .- with .e-, but only if the . is not the last character of an 8-character block. So far, I have not been able to create such a regular expression. But as I said, any other fast way of reading in the data is also very welcome.
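One possible direction, given only as an unbenchmarked sketch: once the regex has split each line into 8-character fields, the "end of block" condition is satisfied automatically, so the substitution can be applied to the whole array of strings with `np.char.replace` instead of an explicit Python loop. The helper `make_grid_line` and the sample data are hypothetical, and the sketch assumes .- is the only shorthand exponent form present.

```python
import re
import numpy as np

def make_grid_line(grid_id, x, y, z):
    """Hypothetical helper: build one sample line from six 8-character fields."""
    return "GRID".ljust(8) + str(grid_id).rjust(8) + " " * 8 + x.rjust(8) + y.rjust(8) + z.rjust(8)

fContent = "\n".join([
    make_grid_line(1, "3268.616", "-30.0828", "749.8656"),
    make_grid_line(2, "3268.781", "-3.-14", "749.8888"),
    make_grid_line(3, "3422.488", "580.9283", "82.49383"),
    make_grid_line(4, "3422.488", "10.", "-2.49383"),
])

# Same pattern as in the question: skip 24 characters, capture three fields.
raw = np.array(re.findall(r"^.{24}(.{8})(.{8})(.{8})", fContent, re.MULTILINE))

# Element-wise string operations on the (n, 3) array of fields:
# insert the missing "e", drop the padding, then convert in one step.
fixed = np.char.strip(np.char.replace(raw, ".-", ".e-"))
vertices = fixed.astype(float)
print(vertices.shape)  # (4, 3)
```

Whether this is actually faster than the loop depends on the file size and NumPy version, so it would need to be timed on real data.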