AttributeError: 'DataFrame' object has no attribute 'tolist'

Question

When I run this code in Jupyter Notebook:

columns = ['nkill', 'nkillus', 'nkillter','nwound', 'nwoundus', 'nwoundte', 'propvalue', 'nperps', 'nperpcap', 'iyear', 'imonth', 'iday']

for col in columns:
    # needed for any missing values set to '-99'
    df[col] = [np.nan if (x < 0) else x for x in df[col].tolist()]
    # calculate the mean of the column
    column_temp = [0 if math.isnan(x) else x for x in df[col].tolist()]
    mean = round(np.mean(column_temp))
    # then apply the mean to all NaNs
    df[col].fillna(mean, inplace=True)

I receive the following error:

AttributeError                            Traceback (most recent call last)
<ipython-input-56-f8a0a0f314e6> in <module>()
  3 for col in columns:
  4     # needed for any missing values set to '-99'
----> 5     df[col] = [np.nan if (x < 0) else x for x in df[col].tolist()]
  6     # calculate the mean of the column
  7     column_temp = [0 if math.isnan(x) else x for x in df[col].tolist()]

/anaconda3/lib/python3.7/site-packages/pandas/core/generic.py in __getattr__(self, name)
   4374             if self._info_axis._can_hold_identifiers_and_holds_name(name):
   4375                 return self[name]
-> 4376             return object.__getattribute__(self, name)
   4377 
   4378     def __setattr__(self, name, value):

AttributeError: 'DataFrame' object has no attribute 'tolist'

The code works fine when I run it in Pycharm, and all of my research has led me to conclude that it should be fine. Am I missing something?

I've created a Minimal, Complete, and Verifiable example below:

import numpy as np
import pandas as pd
import os
import math

# get the path to the current working directory
cwd = os.getcwd()

# then add the name of the Excel file, including its extension to get its relative path
# Note: make sure the Excel file is stored inside the cwd
file_path = cwd + "/data.xlsx"

# Copy the database to file
df = pd.read_excel(file_path)

columns = ['nkill', 'nkillus', 'nkillter', 'nwound', 'nwoundus', 'nwoundte', 'propvalue', 'nperps', 'nperpcap', 'iyear', 'imonth', 'iday']

for col in columns:
    # needed for any missing values set to '-99'
    df[col] = [np.nan if (x < 0) else x for x in df[col].tolist()]
    # calculate the mean of the column
    column_temp = [0 if math.isnan(x) else x for x in df[col].tolist()]
    mean = round(np.mean(column_temp))
    # then apply the mean to all NaNs
    df[col].fillna(mean, inplace=True)

Does this work? `df[col].tolist()` => `df[col].values.tolist()` — JacobIRR, Dec 03 '18 at 18:27
No sorry, it throws up a different error: TypeError: '<' not supported between instances of 'list' and 'int' — Uncle_Timothy, Dec 03 '18 at 18:29
Undoing the dupe-close because `df[col]` should be a Series, not a DataFrame. The correct fix should involve figuring out why it's not a Series, not using the code for DataFrames. — user2357112, Dec 03 '18 at 18:30
That said, we're going to need to see more than what you've shown us to reproduce this error. Perhaps `df` isn't what you think it is. — user2357112, Dec 03 '18 at 18:33
Often attribute errors like this are the result of the object being the wrong type. The first comment was based on the idea that numpy arrays have a `.tolist` method, but `DataFrame` does not. `Series` has this method (which uses the numpy `tolist` on its values). — hpaulj, Dec 03 '18 at 18:36
@user2357112 What can I show you? I parsed in an Excel file and stored it in df: `file_path = cwd + "/globalterrorismdb_0718dist.xlsx"`, then `df = pd.read_excel(file_path)` — Uncle_Timothy, Dec 03 '18 at 18:38
@Uncle_Timothy, See [How to make good reproducible pandas examples](https://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples). Mock up a **[mcve]**, i.e. a **minimal** and **reproducible** example of the problem you observe. — jpp, Dec 03 '18 at 18:39
I would start with a `print(type(df))` and `print(type(df['nkill']))`, to verify that the objects are (are not) dataframe and series. — hpaulj, Dec 03 '18 at 18:41
@jpp I've included a Minimal, Complete, and Verifiable example in my post — Uncle_Timothy, Dec 03 '18 at 18:49
Without the `xlsx` file we can't copy-n-paste and run your code. It isn't Verifiable. — hpaulj, Dec 03 '18 at 19:41
If you're trying to replace values in a dataframe, there may be better approaches than a list comprehension... what happens if you get rid of the `.tolist()`? — Evan, Dec 03 '18 at 20:07
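
The type check suggested in the comments can be sketched as follows. This is only an illustration: the asker's actual `df` is not available, and the duplicated column label used here is an assumed cause, chosen because it is one known way for `df[col]` to come back as a DataFrame instead of a Series.

```python
import pandas as pd

# Hypothetical frame with a duplicated column label (an assumption, not the asker's data)
df = pd.DataFrame([[1, -99], [3, 4]], columns=["nkill", "nkill"])

print(type(df))            # the DataFrame class
print(type(df["nkill"]))   # also the DataFrame class, because two columns share the label

selected = df["nkill"]
has_tolist = hasattr(selected, "tolist")   # False: DataFrame has no .tolist()

# .values.tolist() on a DataFrame gives a list of row lists, so comparing each
# element with 0 raises the TypeError ('<' between 'list' and 'int') seen above
nested = selected.values.tolist()
print(nested)
```

If `type(df[col])` reports a Series, the loop in the question should run as written. If it reports a DataFrame, inspecting `df.columns` for repeated or multi-level labels would be a reasonable next step.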

jpp · Answer 1 · 2018-12-03T21:24:51.180

You have an XY Problem. You've described what you are trying to achieve in your comments, but your approach is not appropriate for Pandas.

Avoid `for` loops and `list`

With Pandas, you should avoid explicit `for` loops and conversion to a Python `list` wherever possible. Pandas builds on NumPy arrays, which support vectorised column-wise operations.
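
As a rough illustration of what "vectorised" means here, the sketch below compares a list comprehension with the equivalent `Series.mask` call. The Series `s` and its `-99` missing-value code are made up for the example.

```python
import numpy as np
import pandas as pd

# Hypothetical column that uses -99 as a missing-value code
s = pd.Series([1.0, -99.0, 3.0])

# List-based version: visits the values one at a time in Python
looped = pd.Series([np.nan if x < 0 else x for x in s.tolist()])

# Vectorised version: the comparison and the replacement act on the whole column
vectorised = s.mask(s < 0)

print(looped.equals(vectorised))
```

Both give the same result; the vectorised form is shorter and leaves the element-wise work to Pandas and NumPy.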

So let's look at how you can rewrite:

for col in columns:
    # values less than 0 set to NaN
    # calculate the mean of the column with 0 for NaN
    # then apply the mean to all NaNs

You can now use Pandas methods to achieve the above.

`apply` + `pd.to_numeric` + `mask` + `fillna`

You can define a function `mean_update` and use `pd.DataFrame.apply` to apply it to each series:

import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [1, -2, 3, np.nan],
                   'B': ['hello', 4, 5, np.nan],
                   'C': [-1.5, 3, np.nan, np.nan]})

def mean_update(s):
    s_num = pd.to_numeric(s, errors='coerce')  # convert to numeric
    s_num = s_num.mask(s_num < 0)              # replace values less than 0 with NaN
    s_mean = s_num.fillna(0).mean()            # calculate mean
    return s_num.fillna(s_mean)                # replace NaN with mean

df = df.apply(mean_update)                     # apply to each series

print(df)

     A     B     C
0  1.0  2.25  0.75
1  1.0  4.00  3.00
2  3.0  5.00  0.75
3  1.0  2.25  0.75
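
To stay closer to the question, the same function could be applied only to the listed columns, with the mean rounded as in the original loop. This is a sketch under assumptions: the data below is invented, `city` is a hypothetical extra column, and only two of the question's column names are used.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in data; the real spreadsheet is not available
df = pd.DataFrame({
    "nkill": [3, -99, 1, 0],
    "nwound": [-99, 10, -99, 2],
    "city": ["a", "b", "c", "d"],   # non-numeric column, left untouched
})
columns = ["nkill", "nwound"]

def mean_update(s):
    s_num = pd.to_numeric(s, errors="coerce")   # convert to numeric
    s_num = s_num.mask(s_num < 0)               # negative codes become NaN
    s_mean = round(s_num.fillna(0).mean())      # rounded mean, counting NaN as 0
    return s_num.fillna(s_mean)                 # replace NaN with that mean

# Restrict the update to the selected columns only
df[columns] = df[columns].apply(mean_update)

print(df)
```

Selecting `df[columns]` before calling `apply` keeps unrelated columns such as `city` from being coerced to numeric.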
