Pandas: calculate mean for each row by cycle number

Question

I have a CSV file (Mspec Data) which looks like this:

#Header
#
"Cycle";"Time";"ms";"mass amu";"SEM c/s"
0000000001;00:00:01;0000001452;     1,00;       620
0000000001;00:00:01;0000001452;     1,20;      4730
0000000001;00:00:01;0000001452;     1,40;      4610
...       ;..:..:..;..........;.........;...........

I read it via:

import pandas as pd
df = pd.read_csv(Filename, header=30, delimiter=';', decimal=',')

the result looks like this:

      Cycle      Time      ms  mass amu  SEM c/s
0         1  00:00:01    1452       1.0      620
1         1  00:00:01    1452       1.2     4730
2         1  00:00:01    1452       1.4     4610
...     ...       ...     ...       ...      ...
3872      4  00:06:30  390971       1.0    32290
3873      4  00:06:30  390971       1.2    31510

This data contains several mass spec scans with identical parameters. Cycle number 1 means scan 1, and so forth. I would like to calculate the mean of the last column, SEM c/s, over all cycles for each identical mass. In the end I would like to have a new data frame containing only:

ms  "mass amu"  "SEM c/s(mean over all cycles)"

Obviously, the mean of the mass does not need to be calculated. I would like to avoid reading each cycle into a new dataframe, as this would mean I have to look up the length of each mass spectrum. The "mass range" and "resolution" are obviously different for different measurements (Solution). I guess doing the calculation in numpy directly would be best, but I am stuck.

Thank you in advance

use [groupby and aggregate](https://pandas.pydata.org/pandas-docs/stable/groupby.html) with function mean — joaquin, Jun 15 '18 at 09:48

score 0 · Accepted Answer · answered Jun 15 '18 at 09:47


You can use groupby(), something like this:

df.groupby(['ms', 'mass amu'])['SEM c/s'].mean()


John Zwinck


Thanks! But only df.groupby(['mass amu'])['SEM c/s'].mean() worked. Is there any reason for this? How would I actually do this in numpy? – NorrinRadd Jun 15 '18 at 11:38
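
On the first point: in the sample shown in the question, ms differs between cycles (1452 in cycle 1, 390971 in cycle 4), so grouping by both ms and mass amu most likely leaves a single row per group and nothing gets averaged. On the numpy point, one possible sketch uses np.unique with return_inverse together with np.bincount. The arrays below are made-up values for illustration, not the real measurement, and the names mass, sem, and mean_sem are assumptions.

```python
import numpy as np

# Made-up values for illustration: two cycles, the same three masses in each.
mass = np.array([1.0, 1.2, 1.4, 1.0, 1.2, 1.4])
sem = np.array([620.0, 4730.0, 4610.0, 640.0, 4750.0, 4590.0])

# unique_mass holds the sorted distinct masses; inverse maps every row
# to the position of its mass inside unique_mass.
unique_mass, inverse = np.unique(mass, return_inverse=True)

# Per-mass sum of SEM divided by per-mass row count gives the mean.
sums = np.bincount(inverse, weights=sem)
counts = np.bincount(inverse)
mean_sem = sums / counts  # means for masses 1.0, 1.2, 1.4: 630, 4740, 4600

print(unique_mass)
print(mean_sem)
```

This relies on exact float equality between masses, which should hold when the same text (for example "1,20") is parsed in every cycle. With a real frame, the inputs would be df['mass amu'].to_numpy() and df['SEM c/s'].to_numpy().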

score 0 · Answer 2 · answered Jun 15 '18 at 10:01

The same masses repeat in every cycle, and you want the mean of SEM over each group of rows with the same mass.
Here is a step-by-step example.
Select each group in turn, collect its mean in a dictionary, and then convert the dictionary to a DataFrame.

mass_uni = df['mass amu'].unique()  # unique mass values
new_df_dict = {"ma": [], "SEM": []}  # placeholder names, renamed below

for cmass in mass_uni:
    new_df_dict['ma'].append(cmass)
    # mean of SEM c/s over all rows with this mass; advice: consider a safer column name such as SEM-c_s
    new_df_dict['SEM'].append(df.loc[df['mass amu'] == cmass, 'SEM c/s'].mean())

new_df = pd.DataFrame(new_df_dict)  # end of the dirty work
# rename() returns a new DataFrame, so the result has to be assigned
new_df = new_df.rename(columns={'ma': "mass amu", "SEM": "SEM c/s(mean over all cycles)"})
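
For comparison, the same kind of result frame can probably be built without an explicit loop. This is a minimal sketch on made-up rows, assuming the grouping key is 'mass amu' as the question asks. The values in df are invented for illustration.

```python
import pandas as pd

# Made-up rows for illustration: two cycles with the same three masses.
df = pd.DataFrame({
    "Cycle": [1, 1, 1, 2, 2, 2],
    "ms": [1452, 1452, 1452, 98765, 98765, 98765],
    "mass amu": [1.0, 1.2, 1.4, 1.0, 1.2, 1.4],
    "SEM c/s": [620, 4730, 4610, 640, 4750, 4590],
})

new_df = (
    df.groupby("mass amu")["SEM c/s"]
      .mean()            # Series indexed by mass, one mean per mass
      .reset_index()     # turn the index back into a regular column
      .rename(columns={"SEM c/s": "SEM c/s(mean over all cycles)"})
)
print(new_df)
```

The ms column is left out here because it differs from cycle to cycle, so it has no single value per mass.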

Hope it will be helpful
