How do I replace all string values with NaN (Dynamically)?

Question

I want to find every string value in my dataframe and replace it with NaN, so that I can then drop the affected rows with df.dropna(). For example, if I have the following data set:

x = np.array([1,2,np.nan,4,5,6,7,8,9,10])
z = np.array([1,2,np.nan,4,5,np.nan,7,8,9,"My Name is Jeff"])
y = np.array(["Hello World",2,3,4,5,6,7,8,9,10])

I should first be able to dynamically replace all strings with np.nan so my output should be:

x = np.array([1,2,np.nan,4,5,6,7,8,9,10])
z = np.array([1,2,np.nan,4,5,np.nan,7,8,9,np.nan])
y = np.array([np.nan,2,3,4,5,6,7,8,9,10])

Running df.dropna() afterwards (assume that x, y, and z are columns of a data frame rather than separate variables) should then give me:

x = np.array([2,4,5,7,8,9])
z = np.array([2,4,5,7,8,9])
y = np.array([2,4,5,7,8,9])

The dtypes in the first set of definitions are `float` and `string`; in the second, all `float`; in the third, `int`. In pandas, columns with strings will be `object`. I think the `nan` columns will still be float, but they may be object. If you are starting with a dataframe, I'd suggest defining/showing that rather than numpy arrays. – hpaulj, Jul 16 '19 at 01:07

score 3 · Accepted Answer · answered Jul 16 '19 at 00:27

Since you tagged pandas, you can use pd.to_numeric with errors='coerce':

pd.to_numeric(x,errors='coerce')
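
As a minimal sketch of what that call does on a single column (the sample values below are hypothetical, not taken from the question): anything that parses as a number is converted, and anything that does not is turned into NaN.

```python
import numpy as np
import pandas as pd

# Hypothetical sample column mixing a number, numeric text, a missing value, and free text.
s = pd.Series([1, "2", np.nan, "My Name is Jeff"])

# errors="coerce" replaces every value that cannot be parsed as a number with NaN
# instead of raising an error.
converted = pd.to_numeric(s, errors="coerce")

print(converted)
```

Note that the numeric text "2" is parsed into a number rather than discarded, which may or may not be what you want.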

BENY

What does this do? Can you describe its function so I can better understand its use case? – Zakariah Siyaji Jul 16 '19 at 00:28
@QariZakariahSiyaji this function converts everything that can be interpreted as a number to a numeric type; everything else is converted to NaN – BENY Jul 16 '19 at 00:36
The problem is that this is a dataframe while pd.to_numeric works for a Pandas Series – Zakariah Siyaji Jul 16 '19 at 00:39
[That's a common question](https://stackoverflow.com/questions/36814100/pandas-to-numeric-for-multiple-columns) – rafaelc Jul 16 '19 at 00:41
@QariZakariahSiyaji `df=df.apply(pd.to_numeric, errors='coerce').dropna()` – BENY Jul 16 '19 at 01:32
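
The one-liner from the last comment can be sketched end to end on the question's data. This assumes x, z, and y are columns of one data frame, and it builds them from plain Python lists (an assumption made here so that each value keeps its own type instead of being cast to a string by np.array).

```python
import numpy as np
import pandas as pd

# The question's three arrays as columns of a single data frame.
df = pd.DataFrame({
    "x": [1, 2, np.nan, 4, 5, 6, 7, 8, 9, 10],
    "z": [1, 2, np.nan, 4, 5, np.nan, 7, 8, 9, "My Name is Jeff"],
    "y": ["Hello World", 2, 3, 4, 5, 6, 7, 8, 9, 10],
})

# apply() calls pd.to_numeric once per column and forwards errors="coerce" to it;
# dropna() then removes every row that contains at least one NaN.
cleaned = df.apply(pd.to_numeric, errors="coerce").dropna()

print(cleaned)
```

The surviving rows hold the values 2, 4, 5, 7, 8, 9 in every column, which matches the output the question asks for.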

score 1 · Answer 2 · answered Jul 16 '19 at 09:41

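The following converts each value to a float where possible and replaces everything else with NaN:

import numpy as np
import pandas as pd

# x, y, z are the arrays from the question; each one becomes a row of the frame
df = pd.DataFrame([x, y, z])

def Replace(i):
    # return the value as a float, or NaN if it cannot be parsed as a number
    try:
        return float(i)
    except (TypeError, ValueError):
        return np.nan

# applymap applies Replace to every element (deprecated in pandas 2.1 in favor of DataFrame.map)
df = df.applymap(func=Replace)
# the arrays are rows here, so drop every column that contains a NaN
df = df.dropna(axis=1)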

Shiva

score 0 · Answer 3 · answered Jul 16 '19 at 00:29

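I think this works:

import numpy as np
import pandas as pd

df = pd.DataFrame(data={'A':[1,2,'str'],'B':['name',2,2]})
for column in df.columns:
    # replace every str value with NaN and leave all other values unchanged
    df[column]=df[column].apply(lambda x:np.nan if isinstance(x, str) else x)
print(df)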
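
One behavioral difference from the pd.to_numeric approach may be worth keeping in mind: a type check discards every str, including numeric text such as "3", whereas parsing keeps it as a number. A small sketch with a hypothetical column:

```python
import numpy as np
import pandas as pd

# Hypothetical column: a number, numeric text, and free text.
s = pd.Series([1, "3", "name"])

# Type check: every str becomes NaN, even the numeric text "3".
by_type = s.apply(lambda v: np.nan if isinstance(v, str) else v)

# Parsing: "3" is converted to 3.0 and only "name" becomes NaN.
by_parse = pd.to_numeric(s, errors="coerce")

print(by_type.isna().tolist())
print(by_parse.isna().tolist())
```

Which behavior is right depends on whether numeric text in the data should count as a string or as a number.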

Parijat Bhatt

That would work, but it would be extremely slow. `pd.to_numeric` is preferred! Also, you could just use `df.applymap` with the same lambda; there is no need to iterate and assign manually – rafaelc Jul 16 '19 at 00:37
Could you please show me how to apply this in code? The problem I am running into is that pd.to_numeric works on a Pandas Series, while I am working with a data frame. – Zakariah Siyaji Jul 16 '19 at 00:41

score 0 · Answer 4 · answered Jul 16 '19 at 06:46

I think the following is the simplest rendition. The function "cleanData" takes a data frame (the parameter is named "file") and an optional list of columns to ignore. It replaces every non-numeric value in the remaining columns with NaN and then drops the rows that contain NaN.

import pandas as pd

def cleanData(file, ignore=()):
    for column in file.columns:
        # columns named in ignore are left untouched
        if column not in ignore:
            # convert the whole column at once; non-numeric values become NaN
            file[column] = pd.to_numeric(file[column], errors='coerce')
    # drop every row that still contains a NaN
    file = file.dropna()
    return file
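
A short usage sketch shows why the ignore argument matters. The function below is a condensed, hypothetical variant named `clean_data` (it works on a copy, which is an assumption added here), and the data frame is made up for illustration.

```python
import pandas as pd

def clean_data(frame, ignore=()):
    """Condensed variant of cleanData: coerce non-ignored columns, then drop NaN rows."""
    out = frame.copy()  # assumption: leave the caller's frame unmodified
    for column in out.columns:
        if column not in ignore:
            out[column] = pd.to_numeric(out[column], errors="coerce")
    return out.dropna()

# Hypothetical data: "name" is a legitimate text column that should survive cleaning.
df = pd.DataFrame({
    "name": ["a", "b", "c"],
    "value": [1, "oops", 3],
})

# With "name" ignored, only the row whose "value" is not numeric is dropped.
kept = clean_data(df, ignore=["name"])

# Without it, every "name" entry is coerced to NaN and all rows are dropped.
dropped_all = clean_data(df)

print(kept)
print(len(dropped_all))
```

Without an ignore list, any column that is text by design wipes out the whole frame, so it is worth listing such columns explicitly.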
