I am trying to create a DataFrame object for my spam classifier. It's supposed to contain two columns: 'message' and 'class'. However, when I use the DataFrame.append function to add the emails under 'message' and the folder name under 'class', I get this error:

AttributeError: 'DataFrame' object has no attribute 'append'

For this, I initially created a DataFrame as follows: data = DataFrame({'message': [], 'class': []})

I tried to use the DataFrame.append() function for adding the spam and ham emails to the DataFrame. Here's the code I am using:

data = DataFrame({'message': [], 'class': []})

data = data.append(dataFrameFromDirectory(r'D:\email_classifier\spam', 'spam'))
data = data.append(dataFrameFromDirectory(r'D:\email_classifier\ham', 'ham'))

In theory, this should add the emails and the folder name to data. Is there a way to get around this without having to use an older version of pandas?

cs95
Zorro
  • use pd.concat instead – cs95 Apr 15 '23 at 06:12
  • Actually, I see this question would be worth reopening and adding some information on exactly why the error occurs since presumably people will paste this attribute error into google and hit search and land up here. – cs95 Apr 15 '23 at 06:25
  • See also: [Create a Pandas Dataframe by appending one row at a time](https://stackoverflow.com/questions/10715965) – Karl Knechtel Apr 15 '23 at 07:39

1 Answer

pandas >= 2.0: append has been removed, use pd.concat

DataFrame.append was deprecated in version 1.4 and removed from the pandas API entirely in version 2.0.

See the docs on Deprecations as well as the GitHub issue that originally proposed its deprecation.
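
For a one-off append like the one in the question, the direct substitute is to pass all the frames to pd.concat in a single list. Here is a minimal sketch; the two small frames are made-up stand-ins for whatever dataFrameFromDirectory returns, since that function is not shown in the question:

```python
import pandas as pd

# Made-up stand-ins for the frames returned by dataFrameFromDirectory
spam = pd.DataFrame({'message': ['win a prize now'], 'class': ['spam']})
ham = pd.DataFrame({'message': ['lunch at noon?'], 'class': ['ham']})

# Before (pandas < 2.0): data = spam.append(ham)
# After: hand every frame to pd.concat in one list
data = pd.concat([spam, ham], ignore_index=True)
print(data)
```

There is no need to seed the result with an empty DataFrame first. ignore_index=True renumbers the rows from 0; leave it out if the frames carry an index you want to keep.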

The rationale for its removal was to discourage iteratively growing DataFrames in a loop (which is what people typically use append for). Each call to append returns a new copy of the whole DataFrame, so building one up row by row does a quadratic amount of copying overall.

In the absence of append, if your data is growing row-wise, the right approach is to accumulate it in a list of records (or a list of DataFrames) and convert it to one big DataFrame at the end.

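import pandas as pd

# Collect the pieces in a plain Python list; list.append is cheap
accumulator = []
for args in arg_list:
    accumulator.append(dataFrameFromDirectory(*args))

# Concatenate once, at the end
big_df = pd.concat(accumulator)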
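
The "list of records" variant works the same way when you are collecting individual rows rather than whole frames. A minimal sketch, assuming the rows arrive as (text, label) pairs; the emails list below is hypothetical data, not something from the question:

```python
import pandas as pd

# Hypothetical (text, label) pairs standing in for emails read from disk
emails = [
    ('win a prize now', 'spam'),
    ('lunch at noon?', 'ham'),
    ('free offer inside', 'spam'),
]

# Accumulate one dict per row in a plain list
records = []
for text, label in emails:
    records.append({'message': text, 'class': label})

# Build the DataFrame once, after the loop
big_df = pd.DataFrame(records)
print(big_df)
```

The DataFrame is only constructed once, so nothing is copied repeatedly while the loop runs.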

cs95