
I have an 880184 × 1 DataFrame whose only column holds either integer objects or string objects. I want to change every string to the number 0. It looks like this:

index               column
.....               ......
23155     WILLS ST / MIDDLE POINT RD
23156                          20323
23157    400 Block of BELLA VISTA WY
23158                          19090
23159     100 Block of SAN BENITO WY
23160                          20474

The problem is that both the numbers and the strings have the 'object' dtype, so I don't know how to replace only the strings with 0, like this:

index                          column
.....                          ......
23155                            0
23156                          20323
23157                            0
23158                          19090
23159                            0
23160                          20474
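To see why both kinds of value show up as 'object', here is a minimal sketch. It assumes the column holds plain Python str and int values as described, and it rebuilds the six sample rows above into a hypothetical `df`. The dtype describes the column as a whole, while each element keeps its own Python type, and that per-element type is what separates the strings from the numbers.

```python
import pandas as pd

# Hypothetical reproduction of the six sample rows: an object column
# mixing Python str and int values.
df = pd.DataFrame(
    {"column": ["WILLS ST / MIDDLE POINT RD", 20323,
                "400 Block of BELLA VISTA WY", 19090,
                "100 Block of SAN BENITO WY", 20474]},
    index=range(23155, 23161),
)

# One dtype for the whole column: object.
print(df["column"].dtype)

# Each element still has its own Python type (str or int).
type_counts = df["column"].map(type).value_counts()
print(type_counts)
```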

Another problem is that the dataset is very large, so fixing it row by row with a for loop takes too long. I want to use something like:

df.loc[df.column == ...] = 0
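One way to fill in the `df.loc[...] = 0` idea is to build a boolean mask that is True wherever the element is a Python str. This is a sketch that assumes the numbers are stored as real ints, as the question says; a digit-only string such as "20323" would be treated as text and set to 0. The `map` call still runs a small Python function per element, but there is no hand-written loop, and the assignment itself is a single `loc` operation.

```python
import pandas as pd

# Hypothetical sample data taken from the question.
df = pd.DataFrame(
    {"column": ["WILLS ST / MIDDLE POINT RD", 20323,
                "400 Block of BELLA VISTA WY", 19090,
                "100 Block of SAN BENITO WY", 20474]},
    index=range(23155, 23161),
)

# Boolean mask: True where the element is a Python str.
is_text = df["column"].map(lambda value: isinstance(value, str))

# Same shape as df.loc[...] = 0, but restricted to the one column.
df.loc[is_text, "column"] = 0

# Only ints remain, so the column can now get a real integer dtype.
df["column"] = df["column"].astype("int64")
print(df)
```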
wjandrea
Tom Dawn
  • Welcome to StackOverflow! This is not a coding site where you can ask others to do work for you. It is a Q/A site which strives to gather lots of questions more than one person could have and helpful answers to them. You have to show your efforts, your non-working code and formulate a clear question on what you want us to answer. – Alfe Apr 21 '16 at 15:24
  • Hi sorry, I don't mean to let others work for me. I am just stuck at this point and don't know how to solve it. – Tom Dawn Apr 21 '16 at 15:39

2 Answers


You can convert the column to a numeric type with pd.to_numeric and pass errors='coerce', so that values which cannot be converted to numbers become NaN. Then replace the NaNs with zero:

import pandas as pd

df["column"] = pd.to_numeric(df["column"], errors="coerce").fillna(0)
print(df["column"])
0        0.0
1    20323.0
2        0.0
3    19090.0
4        0.0
5    20474.0
Name: column, dtype: float64

If you want integer values, add .astype('int64') at the end (this works because fillna has already replaced every NaN):

df["column"] = pd.to_numeric(df["column"], errors="coerce").fillna(0).astype("int64")
print(df["column"])
0        0
1    20323
2        0
3    19090
4        0
5    20474
Name: column, dtype: int64
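For a self-contained check, here is the same chain run on the rows from the question (the DataFrame below is a hypothetical reconstruction of that sample). It keeps the original index, so rows 23155 to 23160 line up with the expected output in the question.

```python
import pandas as pd

# Hypothetical reconstruction of the question's sample rows.
df = pd.DataFrame(
    {"column": ["WILLS ST / MIDDLE POINT RD", 20323,
                "400 Block of BELLA VISTA WY", 19090,
                "100 Block of SAN BENITO WY", 20474]},
    index=range(23155, 23161),
)

# errors="coerce" turns anything that cannot be parsed as a number into NaN,
# fillna(0) replaces those NaNs, and astype restores an integer dtype.
df["column"] = (
    pd.to_numeric(df["column"], errors="coerce")
    .fillna(0)
    .astype("int64")
)
print(df)
```

Unlike an isinstance check, pd.to_numeric also parses digit-only strings such as "20323" into numbers, which may or may not be what you want.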
ayhan

Try converting each value to an integer with the int() function. Non-numeric strings cannot be converted, so int() raises a ValueError. Wrap the call in a try/except block, return 0 when it fails, and you are set.

Like this:

def converter(currentRowObj):
    # Return the value as an int, or 0 if int() cannot convert it.
    try:
        obj = int(currentRowObj)
    except (TypeError, ValueError):
        obj = 0
    return obj
Ma0
  • [A bare `except` is bad practice](/q/54948548/4518341). Instead, use the specific exception you're expecting like `except ValueError`, or at least `except Exception`. – wjandrea May 15 '22 at 16:27
  • How would you apply this to the df? `df['column'].apply(converter)`? In that case, `currentRowObj` isn't a row but a value, so I'd call it something else. – wjandrea May 15 '22 at 16:36
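Applied to the frame with Series.apply, as the last comment suggests, the converter receives one value at a time, so this sketch renames its parameter to `value` and catches only the exceptions int() is expected to raise. The sample data is a hypothetical reconstruction of the question's rows. Because apply calls a Python function for every element, it is likely to be slower than pd.to_numeric on 880k rows.

```python
import pandas as pd


def converter(value):
    """Return value as an int, or 0 if int() cannot convert it."""
    try:
        return int(value)
    except (TypeError, ValueError):
        # ValueError: non-numeric strings such as "WILLS ST / MIDDLE POINT RD"
        # TypeError: objects int() does not accept at all, such as None
        return 0


# Hypothetical reconstruction of the question's sample rows.
df = pd.DataFrame(
    {"column": ["WILLS ST / MIDDLE POINT RD", 20323,
                "400 Block of BELLA VISTA WY", 19090,
                "100 Block of SAN BENITO WY", 20474]},
    index=range(23155, 23161),
)

# Series.apply calls converter once per element (a value, not a row).
df["column"] = df["column"].apply(converter)
print(df)
```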