0

I'm trying to currently clean this dataset but with no luck. This is the initial code I used:

import pandas as pd
Location =r'file.rpt'
df = pd.read_fwf(Location, delim_whitespace=True)
df=df.iloc[12:] #need to optimise this
df2 = df.rename({'********************************************************************************': 'Nodes'}, axis=1) 
df2[['Nodes', 'disp1','disp2','disp3']] = df2['Nodes'].str.split(n=3, expand=True)

Output:

    Node disp1 disp2 disp3
12  1001    0.  0.  0.
13  1002    0.  0.  0.
14  1003    0.  0.  0.
15  1004    0.  0.  -10.0E-3
16  1005    0.  --  -9.0E-3
17  1006    0.  te  0.

My main issue - I'm trying to find a way to make each column only contain numbers. I tried this solution to remove non numerical characters:

df2=df2[pd.to_numeric(df2['Nodes'], errors='coerce').notnull()]
df2=df2[pd.to_numeric(df2['IP'], errors='coerce').notnull()]
df2=df2[pd.to_numeric(df2['ESF'], errors='coerce').notnull()]

However what it did was remove 'E' from the cells (scientific notation) and kept a bunch of other stuff such as '--' and non numbers. Is there any solution to:

Check a column whether or not its a number (scientific numbers ALLOWED) - if it isn't, make it blank.

I'm beginning to think it's something to do with the fact when I split the data I used 'str.split' which makes it a string and not a float/integer... i don't know. Any help appreciated.

apang
  • 93
  • 1
  • 12

1 Answers1

0

As described here, the solution is:

df2 = df2.apply(pd.to_numeric, errors='coerce')

Afterwards you have NaN values in your dataframe. That is the blank in pandas.

above_c_level
  • 3,579
  • 3
  • 22
  • 37