I have a CSV file whose lines look like:
ID,98.4,100M,55M,65M,75M,100M,75M,65M,100M,98M,100M,100M,92M,0#,0N#,
I can read it in with
#!/usr/bin/env python
import pandas as pd
import sys
filename = sys.argv[1]
df = pd.read_csv(filename)
Given a particular column, I would like to split the rows by ID and then output the mean and standard deviation for each ID.
My first problem is: how can I remove all the non-numeric parts from values such as "100M" and "0N#", which should become 100 and 0 respectively?
I tried looping over the relevant headers and using
df[header].replace(regex=True,inplace=True,to_replace=r'\D',value=r'')
as suggested in "Pandas DataFrame: remove unwanted parts from strings in a column".
However, this changes 98.4 into 984, because \D matches every non-digit character, including the decimal point.
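To make the goal concrete, here is a minimal sketch of the kind of result I am after, not a tested solution for the real file. Instead of deleting every non-digit, it extracts the first run of digits with an optional fractional part, then groups by ID. The column names `ID` and `score` and the sample values are assumptions made up for illustration; my real headers differ.

```python
import pandas as pd

# Hypothetical sample data; column names and values are illustrative only.
df = pd.DataFrame({
    "ID": ["a", "a", "b", "b"],
    "score": ["98.4", "100M", "0N#", "55M"],
})

# Extract the first run of digits with an optional fractional part,
# so "98.4" stays 98.4 while "100M" becomes 100 and "0N#" becomes 0.
numeric = df["score"].astype(str).str.extract(r"(\d+(?:\.\d+)?)", expand=False)

# Convert to floats; anything with no digits at all becomes NaN.
df["score"] = pd.to_numeric(numeric, errors="coerce")

# Mean and (sample) standard deviation of the cleaned column for each ID.
stats = df.groupby("ID")["score"].agg(["mean", "std"])
print(stats)
```

This pattern assumes the number always comes first in the cell and is never negative; values like "-5M" or "M100" would need a different pattern.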