
I am trying to get a percentage by dividing the numbers in one column by those in another column, but I keep getting the same error:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-34-60166e8a919c> in <module>()
      6 dataLake = dataLake[['day','Agent','Resolved','Meta','Week','Year']]
      7 #Creating new data (atingimento)
----> 8 dataLake["atingimento"] = ((dataLake.Resolved.astype(int) / dataLake.Meta.astype(int)) * 100)
      9 dataLake['Resolved'] = dataLake.Resolved.astype(int)
     10 dataLake['Meta'] = dataLake.Meta.astype(str)

4 frames
/usr/local/lib/python3.7/dist-packages/pandas/core/dtypes/cast.py in astype_nansafe(arr, dtype, copy, skipna)
    972         # work around NumPy brokenness, #1987
    973         if np.issubdtype(dtype.type, np.integer):
--> 974             return lib.astype_intsafe(arr.ravel(), dtype).reshape(arr.shape)
    975 
    976         # if we have a datetime/timedelta array of objects

pandas/_libs/lib.pyx in pandas._libs.lib.astype_intsafe()

ValueError: invalid literal for int() with base 10: ''

I tried converting both columns to int with .astype(int), but it does not work. As you can see from the data below, Google Colab somehow reads the column 'Meta' as a string, even though it is in the same format as the column 'Resolved'.

    |        day | Agent              | Resolved |       Meta | Week | Year
----+------------+--------------------+----------+------------+------+-----
103 | 2021-01-26 | Ana Carolina B.    |      107 | 2525252525 |    4 | 2021
104 | 2021-01-25 | Bárbara D.         |      275 | 3831252128 |    4 | 2021
105 | 2021-01-25 | Danielly           |      192 | 3831252128 |    4 | 2021
106 | 2021-01-26 | Felipe Pereira     |      102 | 3125212822 |    4 | 2021
107 | 2021-01-26 | Fernanda Favalessa |      207 | 3125212822 |    4 | 2021
108 | 2021-01-25 | Guto R.            |      215 | 3831252114 |    4 | 2021
109 | 2021-01-25 | Helaine S.         |      253 | 3831252114 |    4 | 2021
110 | 2021-01-25 | João M.            |      145 |   38252128 |    4 | 2021
111 | 2021-01-25 | João P.            |      173 | 3535353535 |    4 | 2021
112 | 2021-01-26 | Livia Azeredo      |       89 | 3125212822 |    4 | 2021
113 | 2021-01-26 | Lucas Alves        |       70 | 1815101320 |    4 | 2021
114 | 2021-01-25 | Paula P.           |      137 | 3831252114 |    4 | 2021
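The behaviour described above can be reproduced with a tiny sample frame. This is a minimal sketch, assuming (as the error message suggests) that one Meta cell holds an empty string; the rows are hypothetical, with values copied from the table plus that assumed empty cell.

```python
import pandas as pd

# Hypothetical sample: three rows from the table above, with an empty
# string placed in Meta to mimic the suspected bad value.
dataLake = pd.DataFrame({
    "Agent": ["Ana Carolina B.", "Danielly", "Guto R."],
    "Resolved": ["107", "192", "215"],
    "Meta": ["2525252525", "", "3831252114"],
})

# Text columns are stored as object (or a string dtype in newer pandas),
# not as int64, which is why Colab "reads Meta as a string".
print(dataLake.dtypes)

error_message = None
try:
    dataLake.Meta.astype(int)
except ValueError as exc:
    # int('') cannot be parsed, so the whole conversion fails.
    error_message = str(exc)
print(error_message)
```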
mozway
Isaac Portes
  • `invalid literal for int() with base 10: ''` What might `''` refer to? Hint: there is an empty string in one of the columns, and an empty string cannot be converted to an integer. – DeepSpace Feb 01 '22 at 12:10
  • [How to make good reproducible pandas examples](https://stackoverflow.com/q/20109391/4046632) – buran Feb 01 '22 at 12:15
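Following the hint above, the rows that break astype(int) can be located before converting anything. This is a sketch on a hypothetical sample (the empty and whitespace-only Meta values are assumptions), using pandas.to_numeric with errors="coerce" so that every unparsable cell becomes NaN.

```python
import pandas as pd

# Hypothetical sample mirroring the question's columns: row 1 has an
# empty Meta, row 2 has a whitespace-only Meta.
dataLake = pd.DataFrame({
    "Agent": ["Ana Carolina B.", "Danielly", "Guto R."],
    "Resolved": ["107", "192", "215"],
    "Meta": ["2525252525", "", " "],
})

# errors="coerce" turns anything that cannot be parsed as a number into NaN ...
meta_numeric = pd.to_numeric(dataLake["Meta"], errors="coerce")

# ... so the NaN positions are the rows that make astype(int) fail.
bad_rows = dataLake[meta_numeric.isna()]
print(bad_rows)
```

Running the same check on the real data should show which agents have a blank Meta.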

1 Answer

You might want to use pandas.to_numeric, which can convert invalid data to NaN (you can then use fillna to substitute a default value if needed):

in place of:

dataLake.Resolved.astype(int)

Use:

pd.to_numeric(dataLake['Resolved'], errors='coerce')
# or
pd.to_numeric(dataLake['Resolved'], errors='coerce').fillna(-1)  # -1 marks invalid values

and likewise for every other occurrence (e.g. dataLake.Meta.astype(int)).
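Applied to the calculation from the question, the replacement might look like the following. This is a minimal sketch on a hypothetical three-row sample (values taken from the table, with an assumed empty Meta for one agent); only the column names and the atingimento formula come from the question.

```python
import pandas as pd

# Hypothetical sample with the question's columns; Danielly's Meta is an
# empty string, as suspected in the comments.
dataLake = pd.DataFrame({
    "Agent": ["Ana Carolina B.", "Danielly", "Lucas Alves"],
    "Resolved": ["107", "192", "70"],
    "Meta": ["2525252525", "", "1815101320"],
})

# Convert both columns; invalid entries (here the empty string) become NaN.
dataLake["Resolved"] = pd.to_numeric(dataLake["Resolved"], errors="coerce")
dataLake["Meta"] = pd.to_numeric(dataLake["Meta"], errors="coerce")

# NaN propagates through the division, so a row with a missing Meta
# gets a NaN percentage rather than an error.
dataLake["atingimento"] = dataLake["Resolved"] / dataLake["Meta"] * 100
print(dataLake)
```

If Meta were filled with -1 before dividing, those rows would get a negative percentage instead, so leaving them as NaN (or dropping them with dropna) may make missing targets easier to spot.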

Example:

pd.to_numeric(pd.Series(['1', '   12  ', '']), errors='coerce')

output:

0     1.0
1    12.0
2     NaN
dtype: float64
mozway