1

I created a random DataFrame that simulates the tips dataset from seaborn:

import numpy as np
import pandas as pd

time = ['day','night']
sex = ['female','male']
smoker = ['yes','no']
for t in range(0,len(time)):
    for s in range(0,len(sex)):
        for sm in range(0,len(smoker)):
            randomarray = np.random.rand(10)*10
            if t == 0 and s == 0 and sm == 0:
                df = pd.DataFrame(index=np.arange(0,len(randomarray)),columns=["total_bill","time","sex","smoker"])
                L = 0
                for i in range(0,len(randomarray)):
                    df.loc[i] = [randomarray[i], time[t], sex[s], smoker[sm]]
                    L = L + 1
            else:
                for i in range(0,len(randomarray)):
                    df.loc[i+L] = [randomarray[i], time[t], sex[s], smoker[sm]]
                    L = L + 1

Each column of my DataFrame df holds the same element type as the corresponding column of the tips DataFrame from seaborn's datasets:

import seaborn as sns

tips = sns.load_dataset("tips")
type(tips["total_bill"][0])
type(tips["time"][0])

numpy.float64

str

And so on for the other columns. Same as my dataFrame:

type(df["total_bill"][0])
type(df["time"][0])

numpy.float64

str

However, when I try to use seaborn's violinplot or factorplot following the documentation:

g = sns.factorplot(x="sex", y="total_bill", hue="smoker", col="time",  data=df, kind="violin", split=True, size=4, aspect=.7);

I have no problems if I use the dataFrame tips, but when I use my dataFrame I get:

AttributeError: 'float' object has no attribute 'shape'

I imagine this is an issue with the way I pass the array into the DataFrame, but I couldn't find the problem: every report I found online with the same AttributeError blames a mismatch in types, and as shown above my DataFrame holds the same element types as the one in seaborn's documentation.

Any suggestions?

lanadaquenada

4 Answers

8

I had the same problem and could not find the answer I was looking for, so I am providing one here in case it helps people like me.

The problem here is that the type of df.total_bill is object instead of float.

So the solution is to convert it to float before passing the dataframe to seaborn:

df.total_bill = df.total_bill.astype(float)
digdug
  • Yes, I had the same problem when trying to do a violinplot on a different dataframe, and the problem went away as soon as I explicitly defined the dtype of the column I was trying to use as my y column as a float. So I think this answer is actually the answer to this question. – Emily Beth Nov 23 '18 at 20:00
  • You can also convert the column with `pd.to_numeric(...)` https://stackoverflow.com/a/28648923/4521646 – Jirka Mar 26 '19 at 21:08
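
To illustrate the answer and the comments above, here is a minimal sketch of how an object column can arise and two ways to convert it. The small frame and its values are made up for the example. It assumes, as in the question, that the frame is created empty and then filled row by row with df.loc.

```python
import numpy as np
import pandas as pd

# A frame created with only an index and column names starts out
# with every column stored as dtype object.
df = pd.DataFrame(index=np.arange(3), columns=["total_bill", "time"])

# Fill it row by row, as in the question; the columns stay object.
values = np.random.rand(3) * 10
for i in range(len(values)):
    df.loc[i] = [values[i], "day"]

dtype_before = df["total_bill"].dtype
print(dtype_before)

# Two ways to get a numeric column before plotting.
as_float = df["total_bill"].astype(float)
as_numeric = pd.to_numeric(df["total_bill"])
print(as_float.dtype, as_numeric.dtype)
```

Checking df.dtypes (or df.info()) is a more reliable test than checking the type of a single element, because an object column can still hold individual float values.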
1

This is a rather unusual way of creating a dataframe. The resulting dataframe also has some very strange properties, e.g. it has a length of 50 but the last index is 88. I'm not going into debugging these nested loops. Instead, I would propose to create the dataframe from some numpy array, e.g. like

import numpy as np
import pandas as pd
import seaborn as sns

time = ['day','night']
sex = ['female','male']
smoker = ['yes','no']

data = np.repeat(np.stack(np.meshgrid(time, sex, smoker), -1).reshape(-1,3), 10, axis=0)
df = pd.DataFrame(data, columns=["time","sex","smoker"])
df["total_bill"] = np.random.rand(len(df))*10

Then also plotting works fine:

g = sns.factorplot(x="sex", y="total_bill", hue="smoker", col="time",  data=df, 
                   kind="violin", size=4, aspect=.7)


ImportanceOfBeingErnest
  • I know it's a very unusual way of creating a dataFrame. Maybe I should have clarified that I'm creating mine this way because I need to have 4 loops and inside each loop I do some calculations according to the parameters of each loop. So I need to append the data (which I also don't know the shape it's going to have) to the final dataFrame. There are probably better ways to do it, but with my knowledge this is the best I could do. The question is still the same. Why do I get this error when the data I'm passing is the same type of data as the one in the example? – lanadaquenada Apr 27 '18 at 14:51
  • Because you are overwriting some of the data instead of appending it. It's an easy check: You would expect to see 2*2*2*10 = 80 rows in the dataframe, yet it only has 50. If you have a problem creating a dataframe please ask about that and not about plotting it. In any case, even if you want to use nested loops, I would still suggest you first create a list to which you *append* the rows instead of indexing an existing dataframe. Once that list is created, create a DataFrame from it. – ImportanceOfBeingErnest Apr 27 '18 at 23:26
  • Thank you, I asked about the plot because I didn't realize my problem was with the DataFrame. I did what you explained here and it worked just fine! – lanadaquenada May 02 '18 at 14:16
  • I believe the answer is the formatting of the DataFrame: the column must not have dtype object. See the details in `pd.DataFrame().info()` and convert with `pd.to_numeric(...)` https://stackoverflow.com/a/28648923/4521646 – Jirka Mar 26 '19 at 21:06
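
The suggestion in the comments above (append rows to a list inside the loops, then build the DataFrame once) could look roughly like this. It is only a sketch: the random values stand in for whatever is computed inside the real loops, and the name rows is made up for the example.

```python
import numpy as np
import pandas as pd

time = ["day", "night"]
sex = ["female", "male"]
smoker = ["yes", "no"]

# Collect one dict per row; nothing is indexed, so nothing is overwritten.
rows = []
for t in time:
    for s in sex:
        for sm in smoker:
            # Placeholder for the per-combination calculation.
            for bill in np.random.rand(10) * 10:
                rows.append({"total_bill": bill, "time": t, "sex": s, "smoker": sm})

# Build the frame once, after the loops have finished.
df = pd.DataFrame(rows)
print(len(df), df["total_bill"].dtype)
```

Because the numeric values are passed in when the frame is constructed, pandas infers a float column, so no later conversion is needed.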
0

Convert the data type of your variable from object to a numeric type such as float or int.

AnksG
-1

I had a different issue in my code that produced a similar error:

'str' object has no attribute 'get'

In my seaborn call I had written ...data='df'..., but df is an object and should not be in quotes. Once I removed the quotes, my program worked perfectly. I made the mistake, as someone else might, because the x= and y= parameters are in quotes (they name columns in the dataframe).

Hein Wessels