-1

I'm trying to covert "Quantity" column to int.

The quantity column has a string(,) divider or separator for the numerical values

using code

data['Quantity'] = data['Quantity'].astype('int')
data['Quantity'] = data['Quantity'].astype('float')

I am getting this error:

ValueError: could not convert string to float: '16,000'

ValueError: invalid literal for int() with base 10: '16,000'

Data

Date         Quantity

2019-06-25       200
2019-03-30       100
2019-11-02       250
2018-10-23       100
2018-07-17       150
2018-05-31       150
2018-07-05       100
2018-10-04       100
2018-02-23       100
2019-09-16       204
2019-09-16       315
2019-11-09       113
2019-08-29         5
2019-08-23         4
2019-06-18        78
2019-12-06         4
2019-12-06         2
2019-10-03    16,000
2019-07-03     8,000
2018-12-12        32

Name: Quantity, dtype: object

It's a pandas dataframe with 124964 rows. I added the head and tail of the data

What can I do to fix this problem?

  • Is this a pandas dataframe? Also, shouldn't there be `16,000` somewhere in your data? – Aran-Fey Sep 17 '19 at 08:01
  • 1
    Sorry for closing it immediately, I have too much reputation. I wonder if https://stackoverflow.com/questions/1779288/how-to-convert-a-string-to-a-number-if-it-has-commas-in-it-as-thousands-separato would solve your problem, but it doesn't involve a Pandas dataframe. But the use of `locale` is probably key. Because in the Netherlands 16,000 would mean 16, but in the US it means 16000. – RemcoGerlich Sep 17 '19 at 08:04
  • It's a pandas dataframe with 124964 rows. I added the head and tail of the data –  Sep 17 '19 at 08:07
  • What are you creating the dataframe from? csv? – Trenton McKinney Sep 24 '19 at 06:28

4 Answers4

1

Solution

# Replace string "," with ""

data["Quantity"] = data["Quantity"].apply(lambda x: str(x.replace(',','')))

data['Quantity'] = data['Quantity'].astype('float')
0
number = '16,000'
act_num = ''
for char in number:
    try:
        character = int(char)
        act_num+=(char)
    except:
        if char == '-' or char == '.':
            act_num+= (char)
print(float(act_num))
Sid
  • 2,174
  • 1
  • 13
  • 29
  • But that treats 16,000, -16000, 16.000 all as the same – RemcoGerlich Sep 17 '19 at 08:02
  • @RemcoGerlich I have edited the answer. Tried with all your examples and it works – Sid Sep 17 '19 at 08:10
  • @Krishna does it work? I haven't included the part where you loop through the dataframe but I trust you can do that – Sid Sep 17 '19 at 08:13
  • @Sid I'm unable to loop through the entire dataframe. Maybe I don't know how to do it. Please go through the updated question and let me know how to fix it –  Sep 17 '19 at 10:30
  • @Krishna sorry but I don't know how to work in pandas – Sid Sep 17 '19 at 10:39
  • But maybe this will help you - https://stackoverflow.com/questions/34878573/how-to-loop-through-a-pandas-dataframe – Sid Sep 17 '19 at 10:41
0

'16,000' is neither a valid representation of an int or float, and actually the format is ambiguous - depending on locale standard, it could mean either 16.0 (float) or 16000 (int).

You first need to specify how this data should be interpreted, then fix the string so that it's a valid representation of either a float or int, then apply asType() with the correct type.

To make '16,000' a valid float representation, you just have to replace the comma with a dot:

val = '16,000'
val = val.replace(",", ".")

To make it an int (with value 16000) you just remove the comma:

val = '16,000'
val = val.replace(",", "")

I don't use panda so I can't tell how to best do this with a dataframe, but this is surely documented.

As a general rule: when working on data coming from the wild outside world (anything outside your own code), never trust the data, always make sure you validate and sanitize it before use.

bruno desthuilliers
  • 75,974
  • 6
  • 88
  • 118
-2

data.Quantity = data.Quantity.astype(str).astype(int)

  • 1
    There are decimal points as well, you will have to use ```float``` not ```int``` – Sid Sep 17 '19 at 08:03
  • So use float instead int – Shashwat Hirapure Sep 17 '19 at 08:04
  • @ShashwatHirapure welcome to SO and thanks for trying to help. Unfortunately, your answer is wrong (the data is already a string and this string cannot be turned into an integer), and doesn't match SO standards for a good answer (a good answer should not only provide a working solution, but also explain the solution). – bruno desthuilliers Sep 17 '19 at 09:40