
I am new to pandas and Bokeh, and I am trying to create a scatter plot from a pandas DataFrame. However, I keep getting the following error:

new_data[colname] = df[colname].tolist()
AttributeError: 'DataFrame' object has no attribute 'tolist' 

With Bokeh's sample data (`from bokeh.sampledata.iris import flowers as data`), the scatter plot works fine. My own DataFrame looks like this:

   type   tsneX      tsneY      ... (50,000+ more columns)
0  A      53.828863  20.740931  
1  B      57.816909  18.478468  
2  A      55.913429  22.948167  
3  C      56.603005  15.738954 


from bokeh.charts import Scatter  # legacy Bokeh charts API, removed from later Bokeh releases

scatter = Scatter(df, x='tsneX', y='tsneY',
                  color='type', marker='type',
                  title='t-sne',
                  legend=True)

Edit: I'm not calling tolist() myself; Bokeh's Scatter() calls it internally and raises the error above.
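The traceback line `new_data[colname] = df[colname].tolist()` suggests the chart code loops over the DataFrame's columns and converts each one to a list. A minimal sketch of one way that can fail, assuming (as a later comment suggests; the question never confirms it) that the wide DataFrame contains duplicate column names: selecting a duplicated name returns a DataFrame rather than a Series, and a DataFrame has no `tolist` method. The column name `feature_1` is hypothetical.

```python
import pandas as pd

# Hypothetical frame: 'feature_1' appears twice among the columns.
df = pd.DataFrame(
    [["A", 53.828863, 20.740931, 1.0, 2.0],
     ["B", 57.816909, 18.478468, 3.0, 4.0]],
    columns=["type", "tsneX", "tsneY", "feature_1", "feature_1"],
)

# A unique label selects a Series, which has .tolist().
print(type(df["tsneX"]).__name__)  # Series

# A duplicated label selects every matching column, as a DataFrame.
print(type(df["feature_1"]).__name__)  # DataFrame

try:
    df["feature_1"].tolist()
except AttributeError as exc:
    print(exc)  # 'DataFrame' object has no attribute 'tolist'
```

Adding `.values` would not help in this case, because the selection is already two columns wide before any conversion happens.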

Jab
  • Could you post a sample of your data, for example `print(new_data.head(5))`, and explain what your `X` and `Y` columns are? – MaxU - stand with Ukraine Feb 18 '17 at 14:03
  • It has over 50,000 columns, so that's not very easy to do. However, I got it to work with `df = df.loc[:, ('type', 'tsneX', 'tsneY')]`. I don't know why this solves the problem, but it works. – Jab Feb 18 '17 at 14:16
  • It's not clear: what is the problem with Bokeh? – MaxU - stand with Ukraine Feb 18 '17 at 14:27
  • `AttributeError: 'DataFrame' object has no attribute 'tolist'` is raised inside Scatter(), so I think it's a Bokeh problem. However, removing the redundant columns makes it work. – Jab Feb 19 '17 at 09:37

2 Answers


You are using `tolist` incorrectly. You want `.values` followed by `.tolist()`.

  type   tsneX      tsneY  
0  A      53.828863  20.740931  
1  B      57.816909  18.478468  
2  A      55.913429  22.948167  
3  C      56.603005  15.738954 

For the DataFrame above, you can get the X and Y values as lists with:

tsneY_data = df['tsneY'].values.tolist()
# [20.740931, 18.478468, 22.948167, 15.738954]

tsneX_data = df['tsneX'].values.tolist()
# [53.828863, 57.816909, 55.913429, 56.603005]

Since you are trying to assign this to a column of a new DataFrame, you can do:

import pandas as pd

new_data = pd.DataFrame()
new_data['tsneY'] = df['tsneY'].values.tolist()

print(new_data)
#        tsneY
# 0  20.740931
# 1  18.478468
# 2  22.948167
# 3  15.738954
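For a single column (a `Series`), pandas also provides `Series.tolist()` directly, so both spellings below should produce the same plain Python list. A minimal sketch, assuming a DataFrame rebuilt from the four sample rows in the question:

```python
import pandas as pd

# The four sample rows shown in the question.
df = pd.DataFrame({
    "type": ["A", "B", "A", "C"],
    "tsneX": [53.828863, 57.816909, 55.913429, 56.603005],
    "tsneY": [20.740931, 18.478468, 22.948167, 15.738954],
})

via_numpy = df["tsneY"].values.tolist()  # Series -> NumPy array -> list
via_series = df["tsneY"].tolist()        # Series -> list, no .values step

print(via_numpy == via_series)  # True
print(via_series)  # [20.740931, 18.478468, 22.948167, 15.738954]
```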
Chuck
  • Thanks @Chuck, can you explain why? I read everywhere else that `df['A'].tolist()` would work and we do not need `.values`, so I'm confused. – Nicholas Humphrey Jan 08 '19 at 06:20
  • @NicholasHumphrey `tolist()` acts only on a numpy array. `df['A']` is a pandas series object, so you must first convert this to a numpy object by using `.values`. – Chuck Jan 08 '19 at 13:44
  • Thanks a lot! Never thought about that. – Nicholas Humphrey Jan 08 '19 at 13:49
  • `df['A'].tolist()` does work on a Series, not just on NumPy arrays, in current pandas. Given the solution @Jab documents below, his problem might have been two columns with the same name in the original df. That also produces the same `.tolist()` error and is not solved by adding `.values`. – AvadData May 10 '21 at 15:07
  • @AvadData please put your comment as an answer, it is applicable in most of the cases, thanks – Akhil Saraswat Sep 27 '21 at 10:25
  • If I do this in combination with `iloc`, I get each value as a one-element list inside a list. I don't understand why. My code is `df.iloc[:,1:2].values.tolist()`. Is it really the `iloc`, or what did I miss? – Veritas_in_Numeris Feb 10 '22 at 15:23
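On the `iloc` question in the last comment: a slice such as `1:2` in the column position keeps a DataFrame (here with a single column), so `.values` is two-dimensional and `tolist()` returns one inner list per row. A scalar column position returns a Series instead, which flattens to a plain list. A minimal sketch with a small hypothetical frame:

```python
import pandas as pd

# Hypothetical two-column frame for illustration.
df = pd.DataFrame({"type": ["A", "B", "A"], "tsneX": [1.5, 2.5, 3.5]})

# Slice 1:2 -> DataFrame of shape (3, 1) -> list of one-element lists.
nested = df.iloc[:, 1:2].values.tolist()
print(nested)  # [[1.5], [2.5], [3.5]]

# Scalar position 1 -> Series of shape (3,) -> flat list.
flat = df.iloc[:, 1].tolist()
print(flat)  # [1.5, 2.5, 3.5]
```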

I solved the problem by first extracting the relevant columns from the dataframe.

df = df.loc[:, ['type', 'tsneX', 'tsneY']]  # keep only the columns the plot needs

scatter = Scatter(df, x='tsneX', y='tsneY',
                  color='type', marker='type',
                  title='t-sne',
                  legend=True)
Jab
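If duplicate column names are the cause (an assumption; the thread suggests it but never confirms it), selecting only the needed columns works because each selected name is then unique. A more general sketch detects repeated names with `Index.duplicated` and keeps the first column for each name; the column name `feature_1` is hypothetical.

```python
import pandas as pd

# Hypothetical frame with a repeated column name.
df = pd.DataFrame(
    [["A", 53.828863, 20.740931, 1.0, 2.0],
     ["B", 57.816909, 18.478468, 3.0, 4.0]],
    columns=["type", "tsneX", "tsneY", "feature_1", "feature_1"],
)

# List any column names that occur more than once.
dupes = df.columns[df.columns.duplicated()].unique().tolist()
print(dupes)  # ['feature_1']

# Keep only the first column for each name.
deduped = df.loc[:, ~df.columns.duplicated()]

# Every label now selects a Series, so .tolist() works for each column.
as_lists = {colname: deduped[colname].tolist() for colname in deduped.columns}
print(as_lists["feature_1"])  # [1.0, 3.0]
```

Dropping the later duplicates discards their data; if those columns hold different values, renaming them may be the better fix.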