I'm struggling to convert the object-dtype columns of my DataFrame to float. I'm not sure whether this is a problem with my dataset or a small mistake I'm not seeing. Any help would be greatly appreciated! Please let me know if I can add more information about the problem.

import pandas as pd

col_names = ['age', 'gender', 'coffee_bags_bought', 'spent_last_week', 'spent_last_month', 'income', 'online', 'new_product']

# load dataset
coffeeStore = pd.read_excel("/content/CoffeeStore.xlsx", header=None, names=col_names)
coffeeStore.head(5)

coffeeStore = coffeeStore.astype(float, errors = 'raise')

Error:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-158-bcd1a762be43> in <module>()
      1 # converting object data types to integer
----> 2 coffeeStore = coffeeStore.astype(float, errors = 'raise')

7 frames
/usr/local/lib/python3.7/dist-packages/pandas/core/dtypes/cast.py in astype_nansafe(arr, dtype, copy, skipna)
   1199     if copy or is_object_dtype(arr.dtype) or is_object_dtype(dtype):
   1200         # Explicit copy, or required since NumPy can't view from / to object.
-> 1201         return arr.astype(dtype, copy=True)
   1202 
   1203     return arr.astype(dtype, copy=copy)

ValueError: could not convert string to float: 'age'
    It looks like the XLSX contains a header row which you're explicitly treating as data with `header=None`, so the error is happening on the column labels. When you look at `coffeeStore.head(5)`, is that what you see? If that's not the problem, you need to make a minimal reproducible example including example input. See [How to make good reproducible pandas examples](/q/20109391/4518341) for specifics. You can edit your post. In any case, I'm voting to close the question for now. BTW, welcome to Stack Overflow! Check out the tour, and How to Ask if you want tips. – wjandrea May 05 '22 at 00:29

1 Answer

The solution was to remove the names=col_names and header=None arguments.

I updated the code to coffeeStore = pd.read_excel("/content/CoffeeStore.xlsx"), so the first row of the sheet is used as the header, and the columns changed from object to int64.

I was then able to run coffeeStore = coffeeStore.astype(float, errors = 'raise') to convert the columns to float.
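
For anyone who wants to keep custom column names, here is a minimal sketch of the same problem and one possible fix. The data is made up and only two of the columns are used. It goes through pd.read_csv on an in-memory string so that it runs without the spreadsheet, on the assumption that header and names behave the same way in pd.read_excel.

```python
import io

import pandas as pd

# Made-up sample standing in for the spreadsheet (assumption: the real file
# also starts with a header row, as the error message suggests).
raw = "age,income\n25,40000\n31,52000\n"
col_names = ["age", "income"]

# header=None says "this file has no header row", so the 'age,income' line
# is read as the first data row and the columns end up holding strings.
bad = pd.read_csv(io.StringIO(raw), header=None, names=col_names)
try:
    bad.astype(float)
    conversion_failed = False
except ValueError:
    # Same failure as in the question: the label 'age' is not a number.
    conversion_failed = True

# header=0 marks the first row as the header, and names= replaces its labels,
# so only the numeric rows are left as data.
good = pd.read_csv(io.StringIO(raw), header=0, names=col_names)
as_float = good.astype(float)
```

With header=0 the header row is consumed as labels instead of data, so good has two rows while bad has three, and the conversion to float goes through.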