I have a CSV file (Mspec Data) which looks like this:
#Header
#
"Cycle";"Time";"ms";"mass amu";"SEM c/s"
0000000001;00:00:01;0000001452; 1,00; 620
0000000001;00:00:01;0000001452; 1,20; 4730
0000000001;00:00:01;0000001452; 1,40; 4610
... ;..:..:..;..........;.........;...........
I read it via:
df = pd.read_csv(Filename, header=30,delimiter=';',decimal= ',' )
the result looks like this:
Cycle Time ms mass amu SEM c/s
0 1 00:00:01 1452 1.0 620
1 1 00:00:01 1452 1.2 4730
2 1 00:00:01 1452 1.4 4610
... ... ... ... ... ...
3872 4 00:06:30 390971 1.0 32290
3873 4 00:06:30 390971 1.2 31510
This data contains several Mass spec scans with identical parameters. Cycle number 1 means scan 1 and so forth. I would like to calculate the mean in the last column SEM c/s for each corresponding identical mass. in the end i would like to have a new data frame containing only:
ms "mass amu" "SEM c/s(mean over all cycles)"
obviously the mean of the mass does not need to be calculated. I would like to avoid to read each cycle into a new dataframe as this would mean I have to look up the length of each Mass spectrum . The "mass range" and " resolution" is obviously different for different measurements (Solution). I guess doing the calculation in numpy directly would be best but I am stuck?
Thank you in advance