Extract number following a specific string with special chars in a large text file using Python

Question

I have large data files (CSV type) that I read with pandas. Each file has an Information column that holds many names and numbers separated by ;. This is what the column looks like:

0    Acid: 74.1 [°C];LeakRate [Bar/Min]:  103 ;P: ...
1     Acid: 73.9 [°C]; LeakRate [µBar/Min]:  371 ; ...
2     Acid: 73.9 [°C]; LeakRate [µBar/Min]:  107 ; ...
3     Acid: 73.9 [°C]; LeakRate [µBar/Min]:  371 ; ...
4     Acid: 74.0 [°C]; LeakRate [µBar/Min]:  107 ; ...
Name: Information, dtype: object

I use string split to separate the fields with the following line, and then get, for example, LeakRate [µBar/Min] and its corresponding measurement, which is 103 at index 0 above.

    df["Information"].str.split(";", expand=True)[1].str.split(":", expand=True)[1]

Unfortunately, the data files that are produced are not always the same, so the positions are not always the same. Therefore, I would like to locate a specific string with special characters, such as LeakRate [µBar/Min], and then get the corresponding numbers so that I can plot them for further analysis.

Does anyone know an easy way of doing this? I am new to Python, so I appreciate any help.

Thanks,

Eala

The units for LeakRate are different in the first two records. Should they be the same? And should the units be the same for all variables, for all records? — Bill Bell, Jan 09 '20 at 20:54
Perhaps you could post the first few records of the csv on pastebin? — Bill Bell, Jan 09 '20 at 20:57
You can probably use the approach offered in my answer at https://stackoverflow.com/a/49014385/131187. Of course you would need to adjust how you parse the csv records. — Bill Bell, Jan 09 '20 at 21:32

score 0 · Answer 1 · answered Jan 09 '20 at 21:14

This sounds like you want to figure out the column index beforehand.

This could be done as:

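    firstRow = ...  # a single row of the frame, so firstRow["Information"] is a plain string
    leakRateCols = [i for i, val in enumerate(firstRow["Information"].split(";")) if 'LeakRate' in val]
    if len(leakRateCols) != 1:
        # Zero or several fields mention LeakRate, so the position is ambiguous.
        raise ValueError(f"expected exactly one LeakRate field, found {len(leakRateCols)}")
    leakRateCol = leakRateCols[0]

    for df in ...:
        # expand=True yields one column per ";"-separated field; take the LeakRate one,
        # then keep the part after the ":".
        ... = df["Information"].str.split(";", expand=True)[leakRateCol].str.split(":", expand=True)[1]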

You may want to look into using the csv library though. Might be useful to you. https://docs.python.org/3/library/csv.html
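
If the position of the field can also change from row to row, another option is to search for the label itself instead of a position. The following is only a sketch under assumptions: the sample rows are modelled on the question (the trailing "P: 1.2" style fields are made up), `extract_value` is a hypothetical helper name, and the number is assumed to be a plain integer or decimal right after the colon. `re.escape` takes care of the special characters in the label, and `Series.str.extract` pulls out the captured number.

```python
import re

import pandas as pd

# Sample rows modelled on the question; the "P: ..." values are invented for illustration.
df = pd.DataFrame({"Information": [
    "Acid: 74.1 [°C];LeakRate [Bar/Min]:  103 ;P: 1.2",
    " Acid: 73.9 [°C]; LeakRate [µBar/Min]:  371 ;P: 1.3",
    "P: 1.1; Acid: 73.9 [°C]; LeakRate [µBar/Min]:  107 ",
]})


def extract_value(info, label):
    """Return the number following `label:` in each string of `info` (NaN if absent)."""
    # re.escape makes the brackets in the label literal instead of regex syntax.
    pattern = re.escape(label) + r"\s*:\s*([-+]?\d+(?:\.\d+)?)"
    # With a single capture group and expand=False, str.extract returns a Series.
    matched = info.str.extract(pattern, expand=False)
    return pd.to_numeric(matched, errors="coerce")


leak_rate = extract_value(df["Information"], "LeakRate [µBar/Min]")
print(leak_rate)
```

Rows where the exact label does not occur, such as the first sample row with its [Bar/Min] unit, come back as NaN, so mismatched units stay visible instead of being mixed into the plot.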
