I am trying to clean my data for analysis. My data (.csv) are the outputs of an experiment, so the same format and words are repeated in each column. I add an image of my original data. Each of my four columns looks like this:
import pandas as pd
df = pd.DataFrame({'rev45s': ['Area is 389.62 km^2', 'aspArea is 76.61 km^2', 'asp_Ave_slip is 1.59 m', 'Mw is 5.5'],
                   'rev45': ['Area is 589.32 km^2', 'aspArea is 66.65 km^2', 'asp_Ave_slip is 3.69 m', 'Mw is 6.1'],
                   'SS45': ['Area is 319.62 km^2', 'aspArea is 61.71 km^2', 'asp_Ave_slip is 3.09 m', 'Mw is 6.8'],
                   'SS45s': ['Area is 489.52 km^2', 'aspArea is 54.61 km^2', 'asp_Ave_slip is 1.44 m', 'Mw is 9.5']})
I need to make a new DataFrame with a first column "parameter" = (Area, aspArea, asp_Ave_slip, Mw), then four columns "rev45s_value", "rev45_value", "SS45_value", "SS45s_value", and a last column "unit" = (km^2, km^2, m, -).
I tried some code like:
df['rev45s']=df['rev45s'].apply(lambda x: pd.Series(x.split()))
and
df['rev45s']=df['rev45s'].str.split(' ')
but they didn't work. How can I clean this DataFrame?
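To make the target concrete, here is a rough sketch of the kind of result I am after, using `str.extract` with a regular expression. It is only a sketch under assumptions: the pattern `name is number [unit]` and the helper names `pattern`, `parts`, and `clean` are my own guesses based on the sample above, and it assumes every column lists the parameters in the same order.

```python
import pandas as pd

df = pd.DataFrame({'rev45s': ['Area is 389.62 km^2', 'aspArea is 76.61 km^2', 'asp_Ave_slip is 1.59 m', 'Mw is 5.5'],
                   'rev45': ['Area is 589.32 km^2', 'aspArea is 66.65 km^2', 'asp_Ave_slip is 3.69 m', 'Mw is 6.1'],
                   'SS45': ['Area is 319.62 km^2', 'aspArea is 61.71 km^2', 'asp_Ave_slip is 3.09 m', 'Mw is 6.8'],
                   'SS45s': ['Area is 489.52 km^2', 'aspArea is 54.61 km^2', 'asp_Ave_slip is 1.44 m', 'Mw is 9.5']})

# Assumed cell layout: "<parameter> is <number> [<unit>]" (the unit may be missing, as for Mw).
pattern = r'^(?P<parameter>\S+) is (?P<value>[\d.]+)\s*(?P<unit>\S*)$'

# Split every column into its parameter / value / unit pieces.
parts = {col: df[col].str.extract(pattern) for col in df.columns}

# Assumption: all columns share the same parameter order, so the first column
# is used as the source for the "parameter" and "unit" columns.
first = parts[df.columns[0]]

clean = pd.DataFrame({'parameter': first['parameter']})
for col, pieces in parts.items():
    clean[col + '_value'] = pieces['value'].astype(float)

# Rows without a unit (Mw) get "-" as a placeholder.
clean['unit'] = first['unit'].replace('', '-').fillna('-')

print(clean)
```

If the real CSV contains rows that do not follow this pattern, `str.extract` returns NaN for them, so the `astype(float)` step would still run but leave NaN values that need checking.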