I can't use python's "replace" to make my 0 a missing value (0->np.nan)

Question

I used pandas to read my CSV file from the cloud. I then called replace() to turn every 0 into a missing value, but it doesn't seem to work.

I am using Google Colab.

I tried two methods:

user_data = user_data.replace(0,np.nan) # first 
user_data.replace(0,np.nan,inplace = True) # second

user_data.head() # I use this to view the data.

But the data is the same as when I first read it; the 0s are unchanged.

Here is the function I use to read the file. I read it in chunks:

# Read function
def get_df2(file):
    mydata2 = []
    for chunk in pd.read_csv(file,chunksize=500000,header = None,sep='\t'):
        mydata2.append(chunk)
    user_data = pd.concat(mydata2,axis=0)
    names2=['user_id','age','gender','area','status']
    user_data.columns = names2
    return user_data

# read
user_data_path = 'a_url'
user_data = get_df2(user_data_path)
user_data.head()

Note: my code doesn't raise an error. It produces output, just not the output I want.

Please check whether your 0 is a number or a string. I once had the same problem: the column was a string column, not a numeric one, so replace didn't work. — Martin, Apr 29 '19 at 11:48
Try replacing `0` with `"0"`, because I think your `0` might be a string — funie200, Apr 29 '19 at 11:49
Oh, thank you very much, it solved my problem. This problem has been bothering me for a long time. I didn't pay attention to the limitation of strings before. — 罗文浩, Apr 29 '19 at 12:07

score 1 · Answer 1 · answered Apr 29 '19 at 11:58

Your 0s are probably just strings, try using:

user_data = user_data.replace('0', np.nan)
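To confirm the diagnosis before changing anything, it can help to look at the column dtypes. The following is only a minimal sketch: the small DataFrame is made-up sample data standing in for the real file, and it assumes the values were read as strings.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data (not the asker's file); every value is a string.
user_data = pd.DataFrame({
    "user_id": ["u1", "u2", "u3"],
    "age": ["0", "25", "31"],
    "gender": ["1", "0", "2"],
})

# String columns show up as object (or a string dtype in newer pandas).
print(user_data.dtypes)

# Replacing the integer 0 leaves the string "0" untouched.
unchanged = user_data.replace(0, np.nan)

# Replacing the string "0" produces the missing values.
fixed = user_data.replace("0", np.nan)

print(unchanged)
print(fixed)
```

If the dtypes turn out to be numeric instead, the original replace(0, np.nan) call should already work and the cause lies elsewhere.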

ruohola

Nauman Naeem · Answer 2 · 2019-04-30T07:52:31.587

Python can get irritating in such scenarios.

As pointed out earlier, this is probably because your 0 is a string and not an integer. You can handle that with

user_data.replace("0",np.nan,inplace = True)

But I also wanted to point out that when you know what kind of data a column in a pandas DataFrame should hold, you should explicitly set the column to that type. That way, if a value cannot be converted to the expected type, an error is raised and you know exactly where the problem is.

In your case, columns are:

names2=['user_id','age','gender','area','status']

Let's assume

user_id is string
age is integer
gender is string
area is string
status is string

You can tell pandas which column is supposed to be which datatype by

user_data = user_data.astype({"user_id": str, "age": int, "gender": str, "area": str, "status": str})

There are many other ways to do that, as mentioned in the following answer. Choose whichever suits you or your needs.
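One such alternative, sketched here under the assumption that age is meant to be numeric, is to convert the column with pd.to_numeric and then replace the numeric 0. The sample data is made up for illustration; errors="coerce" turns values that cannot be parsed into NaN instead of raising.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; "x" stands in for a value that is not a number.
user_data = pd.DataFrame({
    "user_id": ["u1", "u2", "u3"],
    "age": ["0", "25", "x"],
})

# Convert the string column to numbers; unparseable values become NaN.
user_data["age"] = pd.to_numeric(user_data["age"], errors="coerce")

# Now that the column is numeric, replacing the number 0 works as intended.
user_data["age"] = user_data["age"].replace(0, np.nan)

print(user_data)
```

Leaving out errors="coerce" makes pd.to_numeric raise on bad values instead, which is closer to the fail-early behaviour described above.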

Thank you for your suggestion. I learned something from it and will pay attention to this in the future. — 罗文浩, Apr 29 '19 at 15:25
