I have large data files (CSV) that I read with pandas. Each file has an Information column that holds many names and numbers separated by ;. This is what the column looks like:

0    Acid: 74.1 [°C];LeakRate [Bar/Min]:  103 ;P: ...
1     Acid: 73.9 [°C]; LeakRate [µBar/Min]:  371 ; ...
2     Acid: 73.9 [°C]; LeakRate [µBar/Min]:  107 ; ...
3     Acid: 73.9 [°C]; LeakRate [µBar/Min]:  371 ; ...
4     Acid: 74.0 [°C]; LeakRate [µBar/Min]:  107 ; ...
Name: Information, dtype: object

I split the strings with the following line to get, for example, LeakRate [µBar/Min] and its corresponding measurement, which is 103 at index 0 above.

    df["Information"].str.split(";", expand=True)[1].str.split(":", expand=True)[1]

Unfortunately, the data files are not always produced the same way, so the positions are not always the same. Therefore, I would like to locate a specific string containing special characters, such as LeakRate [µBar/Min], and then get the corresponding numbers so that I can plot them for further analysis.

Does anyone know an easy way to do this? I am new to Python, so I appreciate any help.

Thanks,

Eala

eala
  • The units for LeakRate are different in the first two records. Should they be the same? And should the units be the same for all variables, for all records? – Bill Bell Jan 09 '20 at 20:54
  • Perhaps you could post the first few records of the csv on pastebin? – Bill Bell Jan 09 '20 at 20:57
  • You can probably use the approach offered in my answer at https://stackoverflow.com/a/49014385/131187. Of course you would need to adjust how you parse the csv records. – Bill Bell Jan 09 '20 at 21:32
  • All units are constant. It is a typo error. – eala Jan 10 '20 at 11:59

1 Answer

This sounds like you want to figure out the column index beforehand.

This could be done as:

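    import pandas as pd

    # Assumption: dataframes is a list with one DataFrame per CSV file.
    first_row = dataframes[0]["Information"].iloc[0]
    leak_rate_cols = [i for i, val in enumerate(first_row.split(";")) if "LeakRate" in val]
    if len(leak_rate_cols) != 1:
        # Zero or several fields mention LeakRate, so the position is ambiguous.
        raise ValueError(f"expected one LeakRate field, found {len(leak_rate_cols)}")
    leak_rate_col = leak_rate_cols[0]

    for df in dataframes:
        # Pick that field in every row, then keep the text after the ":".
        field = df["Information"].str.split(";").str[leak_rate_col]
        leak_rate = pd.to_numeric(field.str.split(":").str[1].str.strip())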
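
If the position of the field changes from file to file, a position-independent alternative is to search for the label itself instead of an index. The sketch below is not part of the approach above: it swaps in a regular expression with pandas' str.extract, and it assumes the label is written exactly as LeakRate [µBar/Min] and that the value is a plain decimal number after the colon. re.escape handles the special characters in the label. The helper name extract_measurement and the sample strings are made up for illustration.

```python
import re

import pandas as pd


def extract_measurement(info: pd.Series, label: str) -> pd.Series:
    """Return the number that follows 'label:' in each string (NaN if absent)."""
    # re.escape makes "[", "]" and similar characters match literally.
    pattern = re.escape(label) + r"\s*:\s*([-+]?\d+(?:\.\d+)?)"
    return pd.to_numeric(info.str.extract(pattern, expand=False))


# Made-up sample: the LeakRate field sits at a different position in each row.
sample = pd.Series([
    "Acid: 74.1 [°C];LeakRate [µBar/Min]:  103 ",
    "LeakRate [µBar/Min]:  371 ; Acid: 73.9 [°C]",
    "Acid: 73.9 [°C]",
])
leak_rate = extract_measurement(sample, "LeakRate [µBar/Min]")
```

Rows that do not contain the label come back as NaN, so they can be dropped or inspected before plotting.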

You may also want to look into the csv library, which might be useful here: https://docs.python.org/3/library/csv.html
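
As a rough illustration of how the csv module could be applied to a single Information string, the sketch below splits the string on ";" with csv.reader and turns each "label: value" fragment into a dictionary entry. The function name parse_information and the sample string are hypothetical, and the code assumes each label contains no colon of its own.

```python
import csv


def parse_information(text: str) -> dict:
    """Split one Information string into a {label: value text} dictionary."""
    # csv.reader accepts any iterable of lines; here it is a single line.
    fields = next(csv.reader([text], delimiter=";"))
    result = {}
    for field in fields:
        label, sep, value = field.partition(":")
        if sep:  # skip fragments that do not look like "label: value"
            result[label.strip()] = value.strip()
    return result


# Made-up sample string in the same shape as the Information column.
row = parse_information("Acid: 74.1 [°C]; LeakRate [µBar/Min]:  103 ")
```

Looking values up by label, for example float(row["LeakRate [µBar/Min]"]), then works regardless of where the field appears in the string.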

Trevor Siemens