58

I have a problem appending to a DataFrame. I am trying to run this code:

df_all = pd.read_csv('data.csv', error_bad_lines=False, chunksize=1000000)
urls = pd.read_excel('url_june.xlsx')
substr = urls.url.values.tolist()
df_res = pd.DataFrame()
for df in df_all:
    for i in substr:
        res = df[df['url'].str.contains(i)]
        df_res.append(res)

When I try to save df_res, I get an empty DataFrame. df_all looks like

ID,"url","used_at","active_seconds"
b20f9412f914ad83b6611d69dbe3b2b4,"mobiguru.ru/phones/apple/comp/32gb/apple_iphone_5s.html",2015-10-01 00:00:25,1
b20f9412f914ad83b6611d69dbe3b2b4,"mobiguru.ru/phones/apple/comp/32gb/apple_iphone_5s.html",2015-10-01 00:00:31,30
f85ce4b2f8787d48edc8612b2ccaca83,"4pda.ru/forum/index.php?showtopic=634566&view=getnewpost",2015-10-01 00:01:49,2
d3b0ef7d85dbb4dbb75e8a5950bad225,"shop.mts.ru/smartfony/mts/smartfon-smart-sprint-4g-sim-lock-white.html?utm_source=admitad&utm_medium=cpa&utm_content=300&utm_campaign=gde_cpa&uid=3",2015-10-01 00:03:19,34
078d388438ebf1d4142808f58fb66c87,"market.yandex.ru/product/12675734/spec?hid=91491&track=char",2015-10-01 00:03:48,2
d3b0ef7d85dbb4dbb75e8a5950bad225,"avito.ru/yoshkar-ola/telefony/mts",2015-10-01 00:04:21,4
d3b0ef7d85dbb4dbb75e8a5950bad225,"shoppingcart.aliexpress.com/order/confirm_order",2015-10-01 00:04:25,1
d3b0ef7d85dbb4dbb75e8a5950bad225,"shoppingcart.aliexpress.com/order/confirm_order",2015-10-01 00:04:26,9

and urls looks like

url
shoppingcart.aliexpress.com/order/confirm_order
ozon.ru/?context=order_done&number=
lk.wildberries.ru/basket/orderconfirmed
lamoda.ru/checkout/onepage/success/quick
mvideo.ru/confirmation?_requestid=
eldorado.ru/personal/order.php?step=confirm

When I print res inside the loop, it isn't empty. But when I print df_res inside the loop after the append, it is an empty DataFrame. I can't find my error. How can I fix it?

cs95
Petr Petrov
  • 1
    To new users coming to this post after getting "AttributeError: 'DataFrame' object has no attribute 'append'": `append` was removed from the API in pandas 2.0 in order to discourage iteratively appending DataFrames inside a loop. The idiomatic way in 2023 to append DataFrames is to first collect your data into a Python list and then call pd.concat. [more info](https://stackoverflow.com/a/76020741/4909087) – cs95 Apr 20 '23 at 07:58

3 Answers

86

The documentation for pd.DataFrame.append says:

Append rows of other to the end of this frame, returning a new object. Columns not in this frame are added as new columns.

Note "returning a new object": append does not modify df_res in place, so its result has to be assigned back.

Try

df_res = df_res.append(res)
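On pandas 2.0 and later, where DataFrame.append no longer exists, the same rebinding pattern can be written with pd.concat. This is only a minimal sketch with made-up one-row frames (the values are placeholders, not the asker's data); the point is that pd.concat, like the old append, returns a new object and leaves df_res untouched unless the result is assigned.

```python
import pandas as pd

# Placeholder frames for illustration only.
df_res = pd.DataFrame({"url": ["a.ru/x"]})
res = pd.DataFrame({"url": ["b.ru/y"]})

# The result is discarded here, so df_res is unchanged.
pd.concat([df_res, res])
n_before = len(df_res)

# Assigning the returned object back is what actually grows df_res.
df_res = pd.concat([df_res, res], ignore_index=True)
print(n_before, len(df_res))
```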

Incidentally, note that pandas isn't that efficient for creating a DataFrame by successive concatenations. You might try this, instead:

all_res = []
for df in df_all:
    for i in substr:
        res = df[df['url'].str.contains(i)]
        all_res.append(res)

df_res = pd.concat(all_res)

This first creates a list of all the parts, then creates a DataFrame from all of them once at the end.
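Here is a self-contained sketch of that list-then-concat pattern. It makes a few assumptions: a small in-memory DataFrame stands in for one chunk from the chunked pd.read_csv reader, the IDs are shortened placeholders, and regex=False is my addition (it is not in the code above). By default str.contains treats the pattern as a regular expression, so characters such as "?" and "." in these URLs would not be matched literally.

```python
import pandas as pd

# Stand-in for one chunk yielded by pd.read_csv(..., chunksize=...).
chunk = pd.DataFrame({
    "ID": ["b20f", "d3b0", "d3b0"],  # shortened placeholder IDs
    "url": [
        "mobiguru.ru/phones/apple/comp/32gb/apple_iphone_5s.html",
        "avito.ru/yoshkar-ola/telefony/mts",
        "shoppingcart.aliexpress.com/order/confirm_order",
    ],
})
df_all = [chunk]  # any iterable of DataFrames, like the chunked reader
substr = [
    "shoppingcart.aliexpress.com/order/confirm_order",
    "ozon.ru/?context=order_done&number=",
]

all_res = []
for df in df_all:
    for s in substr:
        # regex=False matches s as a literal substring.
        all_res.append(df[df["url"].str.contains(s, regex=False)])

# One concatenation at the end instead of one per iteration.
df_res = pd.concat(all_res, ignore_index=True)
print(df_res)
```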

Venkatesh Dharavath
Ami Tavory
  • 2
    Thank you for the explanation. Sometimes `df_res.append(res)` works, but sometimes only `df_res = df_res.append(res)` works, and I don't know why. – Petr Petrov Oct 02 '16 at 09:59
  • @PetrPetrov Are you working in an interactive environment? – Ami Tavory Oct 02 '16 at 10:00
  • 2
    +1 for pointing out the inefficiency of using this to concatenate several dataframes in a loop. I keep finding that in code and it drives me crazy. – josemz Oct 17 '20 at 02:15
  • append has been deprecated since version 1.4.0 https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.append.html – AugustusCaesar Oct 08 '22 at 23:59
8

Why am I getting "AttributeError: 'DataFrame' object has no attribute 'append'"?

pandas >= 2.0: append has been removed, use pd.concat instead (see note 1)

Starting from pandas 2.0, append has been removed from the API. It was previously deprecated in version 1.4. See the docs on Deprecations as well as this github issue that originally proposed its deprecation.

The rationale for its removal was to discourage iteratively growing DataFrames in a loop (which is what people typically use append for). This is because append makes a new copy at each stage, so the total amount of data copied grows quadratically with the number of appends.

1. This assumes you're appending one DataFrame to another. If you're appending a row to a DataFrame, the solution is slightly different - see below.


The idiomatic way to append DataFrames is to collect all your smaller DataFrames into a list, and then make one single call to pd.concat. Here's a(n oversimplified) example

df_list = []
for df in some_function_that_yields_dfs():
    df_list.append(df)

final_df = pd.concat(df_list)

Note that if you are trying to append one row at a time rather than one DataFrame at a time, the solution is even simpler.

data = []
for a, b, c in some_function_that_yields_data():
    data.append([a, b, c])

df = pd.DataFrame(data, columns=['a', 'b', 'c'])
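For something you can run as-is, here is the row-wise pattern with a hypothetical generator, yield_rows, standing in for some_function_that_yields_data (the name and the values it produces are invented for illustration). Rows are accumulated in a plain Python list, and the DataFrame is built once at the end.

```python
import pandas as pd

def yield_rows():
    """Hypothetical data source: yields one (a, b, c) tuple per row."""
    for i in range(3):
        yield i, i * 2, str(i)

data = []
for a, b, c in yield_rows():
    # Appending to a list is cheap; no DataFrame is copied here.
    data.append([a, b, c])

# Build the DataFrame a single time from the collected rows.
df = pd.DataFrame(data, columns=["a", "b", "c"])
print(df)
```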

More information in "Creating an empty Pandas DataFrame, and then filling it?"

cs95
5

If we want to append based on index:

df_res = pd.DataFrame(data = None, columns= df.columns)

all_res = []

d1 = df.iloc[index-10:index]     # the 10 rows before position index (.ix was removed in pandas 1.0)

all_res.append(d1)

df_res = pd.concat(all_res)
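A self-contained sketch of the same idea, under some assumptions: the frame and the list of positions (indices) are made up, and since the .ix indexer was removed in pandas 1.0, positional slicing with .iloc is used instead. Note that .iloc counts row positions, not index labels; the two coincide here only because the frame has a default RangeIndex.

```python
import pandas as pd

# Made-up data with a default RangeIndex (labels equal positions).
df = pd.DataFrame({"value": range(100)})
indices = [20, 50]  # hypothetical positions of interest

all_res = []
for index in indices:
    # Positional slice: the 10 rows immediately before position `index`.
    all_res.append(df.iloc[index - 10:index])

# Original index labels are kept because ignore_index is not set.
df_res = pd.concat(all_res)
print(df_res)
```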
adiga
Siddharth Raj