2

I have a data frame that looks like this.

0                                             1.144921                     
1                                             1.000000                     
2                                             1.119507                     
3                                                  inf                     
4                                             0.000000                     
5                                                  inf                     
6                                             0.000000                     
7                                             0.000000                     
8                                             1.000000                     
9                                             0.000000                     
10                                            0.000000                     
11                                            0.000000                     
12                                            1.793687                     
13                                                 inf    

I am trying to get rid of the 'inf' string. Basically, I just want to strip out all strings and keep only the numbers in the dataframe.

I tried the following code.

kepler = re.sub("\D", "", kepler)
kepler = re.sub('[^0-9]','0', kepler)

When I run either of these lines of code I get the following error.

TypeError: expected string or bytes-like object

With a plain string, the same call does work:

s = '83jjdmi239450  19dkd'
s = re.sub("\D", "", s)

Unfortunately, the code doesn't work on my dataframe. Any thoughts? Thanks.

efirvida
ASH

3 Answers

2

With the numpy.isinf routine on a sample dataframe:

In [176]: df
Out[176]: 
           a
0   1.000000
1   1.119507
2        inf
3   0.000000
4        inf
5   0.000000
6   0.000000
7   1.000000
8   0.000000
9   0.000000
10  0.000000
11  1.793687
12       inf

In [177]: df = df[~np.isinf(df['a'])]

In [178]: df
Out[178]: 
           a
0   1.000000
1   1.119507
3   0.000000
5   0.000000
6   0.000000
7   1.000000
8   0.000000
9   0.000000
10  0.000000
11  1.793687
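
A related option is numpy.isfinite, which is False for inf, -inf and NaN alike, so a single mask covers all three cases. The following is a minimal sketch under that assumption; the sample values and the name `finite` are made up for illustration and are not from the question's data.

```python
import numpy as np
import pandas as pd

# Hypothetical sample with positive infinity, negative infinity and NaN.
df = pd.DataFrame({"a": [1.0, 1.119507, np.inf, 0.0, -np.inf, np.nan, 1.793687]})

# np.isfinite returns False for inf, -inf and NaN, so only ordinary numbers survive.
finite = df[np.isfinite(df["a"])]
print(finite)
```

Note that this also removes NaN rows, which the np.isinf mask above would keep.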
RomanPerekhrest
1

Try

df = pd.read_clipboard()
df.columns = ['col1','col2']
df

    col1    col2
0   1   1.000000
1   2   1.119507
2   3   inf
3   4   0.000000
4   5   inf
5   6   0.000000
6   7   0.000000
7   8   1.000000
8   9   0.000000
9   10  0.000000
10  11  0.000000
11  12  1.793687
12  13  inf

df.col2[df.col2 < np.inf]
0     1.000000
1     1.119507
3     0.000000
5     0.000000
6     0.000000
7     1.000000
8     0.000000
9     0.000000
10    0.000000
11    1.793687
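
One caveat with comparing against np.inf is that negative infinity still passes the `< np.inf` test. If the column might contain -inf as well, comparing the absolute value is one way to catch both signs. This is a sketch with made-up sample values; the name `kept` is only illustrative.

```python
import numpy as np
import pandas as pd

# Hypothetical sample containing both +inf and -inf.
df = pd.DataFrame({"col1": [1, 2, 3, 4], "col2": [1.0, np.inf, -np.inf, 0.5]})

# abs() maps -inf to +inf, so the comparison rejects infinities of either sign.
# Indexing the frame (not just the column) keeps the whole row.
kept = df[df["col2"].abs() < np.inf]
print(kept)
```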
MichaelD
1

I am trying to get rid of the 'inf' string.

You describe it as a string, but that's just the printed representation of a 64-bit floating point number.

TypeError: expected string or bytes-like object

You can't hand a DataFrame (or the floats inside it) to a regex operation, because re.sub needs a string.
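
A quick way to confirm this is to inspect the column dtype and reproduce the error on a small frame. The sketch below assumes a stand-in frame named `kepler` with a hypothetical column `x`; the real data may be laid out differently.

```python
import re
import numpy as np
import pandas as pd

# Hypothetical stand-in for the question's dataframe.
kepler = pd.DataFrame({"x": [1.144921, np.inf, 0.0]})

# The column holds floats; "inf" is only how the value is printed.
col_dtype = kepler["x"].dtype
value_is_float = isinstance(kepler["x"].iloc[1], float)

# re.sub accepts only str or bytes-like input, so a DataFrame raises TypeError.
try:
    re.sub(r"\D", "", kepler)
    message = ""
except TypeError as exc:
    message = str(exc)

print(col_dtype, value_is_float, message)
```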

Instead, turn the infinite quantities into NaNs, and drop them:

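import math
import numpy as np
import pandas as pd
rows = [dict(x=1.79),
        dict(x=math.inf)]
# Map +/-inf to NaN, then drop every row that contains a NaN.
df = pd.DataFrame(rows).replace([np.inf, -np.inf], np.nan)
df = df.dropna()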
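
If the frame has other columns where NaN is acceptable, a plain dropna() would remove those rows too. One possible refinement, sketched here with hypothetical columns `x` and `y`, is to restrict the drop to the column of interest with the `subset` parameter.

```python
import numpy as np
import pandas as pd

# Hypothetical frame: "y" has a legitimate NaN that should not cause a row drop.
df = pd.DataFrame({"x": [1.79, np.inf, 0.0], "y": [np.nan, 2.0, 3.0]})

# Convert infinities to NaN, then drop only rows where "x" is missing.
cleaned = df.replace([np.inf, -np.inf], np.nan).dropna(subset=["x"])
print(cleaned)
```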
J_H