Can't seem to strip numbers from a string

Question

I have a data frame that looks like this.

0                                             1.144921                     
1                                             1.000000                     
2                                             1.119507                     
3                                                  inf                     
4                                             0.000000                     
5                                                  inf                     
6                                             0.000000                     
7                                             0.000000                     
8                                             1.000000                     
9                                             0.000000                     
10                                            0.000000                     
11                                            0.000000                     
12                                            1.793687                     
13                                                 inf

I am trying to get rid of the 'inf' string. Basically, I just want to strip out all strings and keep only the numbers in the dataframe.

I tried the following code.

kepler = re.sub("\D", "", kepler)
kepler = re.sub('[^0-9]','0', kepler)

When I run either of these lines of code I get the following error.

TypeError: expected string or bytes-like object

With a very simple string it does work. For example, this works:

s = '83jjdmi239450  19dkd'
s = re.sub("\D", "", s)

Unfortunately, the code doesn't work on my dataframe. Any thoughts? Thanks.

Try ```kepler = re.sub("\D", "", kepler) if isinstance(kepler, str) else kepler``` — Michael Bianconi, Jul 05 '19 at 17:23
Yes, I am reading data from a CSV file. kepler = pd.read_csv(file) — ASH, Jul 05 '19 at 17:23
Are you looking for `df[df[0].apply(lambda x: type(x) != str)]`? — Chris, Jul 05 '19 at 17:26
Does this answer your question? [dropping infinite values from dataframes in pandas?](https://stackoverflow.com/questions/17477979/dropping-infinite-values-from-dataframes-in-pandas) — AMC, Feb 08 '20 at 01:31

score 2 · Accepted Answer · answered Jul 05 '19 at 17:28

Using the numpy.isinf routine on a sample dataframe:

In [176]: df
Out[176]: 
           a
0   1.000000
1   1.119507
2        inf
3   0.000000
4        inf
5   0.000000
6   0.000000
7   1.000000
8   0.000000
9   0.000000
10  0.000000
11  1.793687
12       inf

In [177]: df = df[~np.isinf(df['a'])]

In [178]: df
Out[178]: 
           a
0   1.000000
1   1.119507
3   0.000000
5   0.000000
6   0.000000
7   1.000000
8   0.000000
9   0.000000
10  0.000000
11  1.793687
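
As a self-contained variant of the same idea, here is a minimal sketch. The data is made up for illustration, and the column name `a` simply follows the sample above. It swaps in `np.isfinite` rather than `~np.isinf`, on the assumption that negative infinity and NaN should be dropped as well:

```python
import numpy as np
import pandas as pd

# Hypothetical sample data, loosely mirroring the frame above.
df = pd.DataFrame({"a": [1.0, 1.119507, np.inf, 0.0, -np.inf, 1.793687]})

# Keep only the rows whose value is finite (drops inf, -inf and NaN).
finite = df[np.isfinite(df["a"])]

print(finite)
```

The original index labels are preserved, so a `reset_index(drop=True)` may be wanted afterwards if a contiguous index matters.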

score 1 · Answer 2 · answered Jul 05 '19 at 17:29

Try

df = pd.read_clipboard()
df.columns = ['col1','col2']
df

    col1    col2
0   1   1.000000
1   2   1.119507
2   3   inf
3   4   0.000000
4   5   inf
5   6   0.000000
6   7   0.000000
7   8   1.000000
8   9   0.000000
9   10  0.000000
10  11  0.000000
11  12  1.793687
12  13  inf

df.col2[df.col2 < np.inf]
0     1.000000
1     1.119507
3     0.000000
5     0.000000
6     0.000000
7     1.000000
8     0.000000
9     0.000000
10    0.000000
11    1.793687
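
One thing to be aware of: the comparison `< np.inf` filters out positive infinity (and NaN, since comparisons with NaN are false), but it keeps negative infinity. A small sketch on made-up values, assuming both signs should go, compares the two masks:

```python
import numpy as np
import pandas as pd

# Hypothetical values covering inf, -inf and NaN.
col2 = pd.Series([1.0, np.inf, -np.inf, np.nan, 0.5])

# Upper bound only: drops inf and NaN, but -inf survives.
upper_only = col2[col2 < np.inf]

# Comparing the absolute value drops inf, -inf and NaN.
both_sides = col2[col2.abs() < np.inf]

print(upper_only)
print(both_sides)
```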

score 1 · Answer 3 · answered Jul 05 '19 at 17:29

> I am trying to get rid of the 'inf' string.

You describe it as a string, but that's just the printed representation of a 64-bit floating point number.

> TypeError: expected string or bytes-like object

You can't hand a float into a regex operation, as a regex needs a string.

Instead, turn the infinite quantities into NaNs, and drop them:

import math
import numpy as np
import pandas as pd

rows = [dict(x=1.79),
        dict(x=math.inf)]
df = pd.DataFrame(rows).replace([np.inf, -np.inf], np.nan)
df = df.dropna()  # drop the rows that held +/-inf

Ok. Got it working. Thanks everyone. — ASH, Jul 05 '19 at 17:53
