A column in my data frame holds lat and long as integers, but some entries are "not applicable". Because of this, the column has dtype object. I want pandas to read "not applicable" as null so I can treat the column like an integer series.

I haven't been able to find anything online so far, though I recall having read something about this.

  • Please add an example of the input data, so we could reproduce the problem. – Maria K Jun 27 '23 at 14:03
  • If you already have a dataframe, just `replace()` those values with `nan` (pandas doesn't use `NULL` per se) and then optionally change the dtype? But integers don't support `nan` either, so maybe you want `float`? Or... https://pandas.pydata.org/docs/user_guide/integer_na.html – MatBailie Jun 27 '23 at 14:05
  • Does this answer your question: https://stackoverflow.com/questions/11548005/numpy-or-pandas-keeping-array-type-as-integer-while-having-a-nan-value – MatBailie Jun 27 '23 at 14:15
  • Example data would be helpful. Lat / long in integer degrees sounds odd because that would be extremely imprecise. Why do you need integers anyway? – Joooeey Jun 27 '23 at 15:02

1 Answer

If you're reading the data from a file, use the `na_values` argument of the loading function. For example, with `pd.read_csv()`:

df = pd.read_csv('examplefile.csv', na_values='not applicable')

This tells pandas to treat every cell whose value is exactly 'not applicable' as NaN.
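A minimal sketch of the `na_values` approach, using an in-memory CSV in place of `examplefile.csv`. The column names and sample values are made up for illustration, since the question does not show the real data.

```python
import io

import pandas as pd

# Hypothetical sample data standing in for examplefile.csv.
csv_text = "lat,long\n51,0\nnot applicable,not applicable\n40,-74\n"

# Cells that are exactly 'not applicable' are parsed as NaN.
df = pd.read_csv(io.StringIO(csv_text), na_values="not applicable")

print(df.dtypes)        # both columns are numeric instead of object
print(df.isna().sum())  # count of missing values per column
```

Because NaN is a floating-point value, the columns are read as `float64` here rather than as a plain integer dtype.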

If the dataframe was generated inside your program, you can use `DataFrame.replace()`, which replaces every occurrence of the first value you pass with the second:

df.replace('not applicable', np.nan, inplace=True)

The `inplace` argument updates the dataframe itself instead of returning a new one. If you don't want to use it, you can do `df = df.replace('not applicable', np.nan)`.

Here, I've used numpy's NaN as the replacement, but you can tell it to replace with any value you want.
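As a rough illustration, here is the replacement applied to a small frame built in code. The frame and its values are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical frame built in the program rather than read from a file.
df = pd.DataFrame({
    "lat": [51, "not applicable", 40],
    "long": [0, "not applicable", -74],
})

# replace() returns a new frame, so assign it back instead of using inplace=True.
df = df.replace("not applicable", np.nan)

print(df.isna().sum())  # count of missing values per column
```

Depending on the pandas version, the columns may still have dtype object after this step, so it is safer to convert the dtype explicitly afterwards than to rely on it changing automatically.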

To change the data type, you can use `df.astype()` (the `DataFrame.astype()` method).
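A plain `int` dtype cannot hold NaN, so the conversion has to target either `float` or pandas' nullable `"Int64"` dtype. The sketch below shows both on a hypothetical column. It uses `pd.to_numeric(..., errors="coerce")`, which is not part of the answer above, as one way to turn the placeholder string into NaN and get a numeric dtype in a single step.

```python
import pandas as pd

# Hypothetical column mixing integers with the placeholder string.
s = pd.Series([51, "not applicable", 40], name="lat")

# errors="coerce" turns anything non-numeric into NaN, giving float64.
as_float = pd.to_numeric(s, errors="coerce")

# The nullable "Int64" dtype (capital I) keeps integers and stores <NA> for the gaps.
as_int = as_float.astype("Int64")

print(as_float.dtype)  # float64
print(as_int.dtype)    # Int64
```

If the coordinates might carry decimals, staying with `float64` is the simpler choice, since `"Int64"` only accepts whole numbers.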

  • The op wants the dtype to be an integer and not an Object (or, implicitly, a float). This doesn't fix that. – MatBailie Jun 27 '23 at 14:11
  • you can do [pd.astype()](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.astype.html) if you need, as far as I've seen, it's not needed – David Siret Marqués Jun 27 '23 at 14:15
  • Your answer will retain the dtype of Object, the op *expressly* states they want to change the dtype. And `int` can't contain `nan`. – MatBailie Jun 27 '23 at 14:17
  • Again, I've used it a bunch of times and it changes the dtype accordingly. Also, you can use `astype` as I said... – David Siret Marqués Jun 27 '23 at 14:33
  • @MatBailie "or, implicitly, a float" -- Where did you get that from? OP said "treat the column like an integer series", and floats are "like" ints. And there won't be any massive numbers that could cause loss of precision because the range of lat and lon are small. – wjandrea Jun 27 '23 at 14:57
  • `int` dtype ***can not*** contain a NaN – MatBailie Jun 27 '23 at 15:31
  • Why so obsessed with that? lat and long usually are given with a ton of decimals, meaning they would be floats. – David Siret Marqués Jun 27 '23 at 15:39