I'm trying to currently clean this dataset but with no luck. This is the initial code I used:
import pandas as pd
Location =r'file.rpt'
df = pd.read_fwf(Location, delim_whitespace=True)
df=df.iloc[12:] #need to optimise this
df2 = df.rename({'********************************************************************************': 'Nodes'}, axis=1)
df2[['Nodes', 'disp1','disp2','disp3']] = df2['Nodes'].str.split(n=3, expand=True)
Output:
Node disp1 disp2 disp3
12 1001 0. 0. 0.
13 1002 0. 0. 0.
14 1003 0. 0. 0.
15 1004 0. 0. -10.0E-3
16 1005 0. -- -9.0E-3
17 1006 0. te 0.
My main issue - I'm trying to find a way to make each column only contain numbers. I tried this solution to remove non numerical characters:
df2=df2[pd.to_numeric(df2['Nodes'], errors='coerce').notnull()]
df2=df2[pd.to_numeric(df2['IP'], errors='coerce').notnull()]
df2=df2[pd.to_numeric(df2['ESF'], errors='coerce').notnull()]
However what it did was remove 'E' from the cells (scientific notation) and kept a bunch of other stuff such as '--' and non numbers. Is there any solution to:
Check a column whether or not its a number (scientific numbers ALLOWED) - if it isn't, make it blank.
I'm beginning to think it's something to do with the fact when I split the data I used 'str.split' which makes it a string and not a float/integer... i don't know. Any help appreciated.