
I ran:

df_new['product_count']=df_new.product_count.astype(int)

And got this error message:

ValueError                                Traceback (most recent call last)
<ipython-input-210-d9dc69e3d064> in <module>()
----> 1 df_new['product_count']=df_new.product_count.astype(str).astype(int)

/usr/local/lib/python3.7/dist-packages/pandas/_libs/lib.pyx in pandas._libs.lib.astype_intsafe()

ValueError: invalid literal for int() with base 10: '100,000'
mozway
prabhas

1 Answer


You can't directly convert strings that contain thousands separators to integers; you first need to remove the separators.

Use:

df_new['product_count'] = pd.to_numeric(df_new['product_count'].str.replace(',', ''), errors='coerce')

You can first run it without the errors='coerce' option to see whether any values fail to convert, and which ones.
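A minimal, self-contained sketch of the whole workflow above. The sample values (including the unparseable "n/a") are made up to stand in for the real df_new; the mask used to list failing rows is one way to see "which ones" fail without letting the conversion raise. Note that once any value is coerced to NaN, to_numeric returns float64, so the sketch converts to pandas' nullable "Int64" dtype if integers are wanted.

```python
import pandas as pd

# Hypothetical sample data standing in for df_new; values are strings, as in the question.
df_new = pd.DataFrame({"product_count": ["100,000", "2,500", "42", "n/a"]})

# Remove the thousands separators; regex=False treats ',' as a literal character.
cleaned = df_new["product_count"].str.replace(",", "", regex=False)

# errors='coerce' turns anything that still isn't a number into NaN instead of raising.
converted = pd.to_numeric(cleaned, errors="coerce")

# Values that were present but failed to convert: what errors='raise' would have complained about.
bad = df_new.loc[converted.isna() & df_new["product_count"].notna(), "product_count"]
print(bad.tolist())

# NaN forces float64; the nullable Int64 dtype keeps whole numbers as integers alongside <NA>.
as_int = converted.astype("Int64")
df_new["product_count"] = as_int
print(df_new["product_count"].tolist())
```

If the list of bad values is empty, the plain `.astype(int)` on the cleaned column works as well.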

mozway
  • Replacing the comma may sometimes have side effects. The locale package has a method to do the same: https://stackoverflow.com/questions/1779288/how-to-convert-a-string-to-a-number-if-it-has-commas-in-it-as-thousands-separato – Floh May 01 '22 at 07:49
  • @Floh the thing is that we have a vector here. Using locale would mean applying it to each element, which is inefficient. Here I suggested `to_numeric`, which with `errors='coerce'` turns all non-numeric strings into NaN anyway (as we want homogeneous types in pandas), so I don't really see a possible side effect (but I asked OP to first run the code without `errors='coerce'` to see what happens). – mozway May 01 '22 at 07:55
  • Oh right. The side effect I see would be if a French locale had been used: there the comma is the decimal separator, not the thousands separator. (So if OP is sure these are integers, there's no issue; otherwise…) – Floh May 01 '22 at 07:57
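A sketch of how the separator ambiguity raised in these comments can be handled while staying vectorized, as mozway prefers. `parse_numbers` is a hypothetical helper, not a pandas function; it assumes the column holds strings and that you know which convention the data uses. The example shows that the same string "100,000" parses to 100000 or to 100.0 depending on that choice. French formatting usually groups thousands with a (possibly non-breaking) space, so the exact `thousands` character is also an assumption to check against the real data.

```python
import pandas as pd

def parse_numbers(s: pd.Series, thousands: str = ",", decimal: str = ".") -> pd.Series:
    """Hypothetical helper: vectorized parse of numeric strings with known separators."""
    # Drop the grouping character first (literal match, not a regex).
    out = s.str.replace(thousands, "", regex=False)
    # Normalize the decimal mark to '.', which to_numeric expects.
    if decimal != ".":
        out = out.str.replace(decimal, ".", regex=False)
    # Anything still unparseable becomes NaN.
    return pd.to_numeric(out, errors="coerce")

s = pd.Series(["100,000", "2,5"])
print(parse_numbers(s).tolist())                               # [100000, 25]
print(parse_numbers(s, thousands=" ", decimal=",").tolist())   # [100.0, 2.5]
print(parse_numbers(pd.Series(["1 234,5"]), thousands=" ", decimal=",").tolist())  # [1234.5]
```

If the data comes from a CSV file, `pd.read_csv` also accepts `thousands=` and `decimal=` arguments, which avoids post-processing the strings at all.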