
I have a pandas dataframe with one column that contains an empty list in each cell.

I need to duplicate the dataframe, and append it at the bottom of the original dataframe, but with additional information in the list.

Here is a minimal code example:

import pandas as pd

df_main = pd.DataFrame([['a', []], ['b', []]], columns=['letter', 'mylist'])
> df_main 
letter  mylist
0   a   []
1   b   []

df_copy = df_main.copy()
for index, row in df_copy.iterrows():
    row.mylist = row.mylist.append(1)

pd.concat([df_copy, df_main], ignore_index=True)

> result:
letter  mylist
0   a   None
1   b   None
2   a   [1]
3   b   [1]

As you can see, the problem is that the empty list [] was replaced by None.

Just to make sure, this is what I would like to have:

letter  mylist
0   a   []
1   b   []
2   a   [1]
3   b   [1]

How can I achieve that?

justadev

3 Answers

The list append method modifies the list in place and returns None, which is why None appears in the final dataframe. You can use the + operator instead, which returns a new list that can be assigned back:

import pandas as pd
df_main = pd.DataFrame([['a', []], ['b', []]], columns=['letter', 'mylist'])

df_copy = df_main.copy()
for index, row in df_copy.iterrows():
    row.mylist = row.mylist + [1]

pd.concat([df_main, df_copy], ignore_index=True).head()

Output of this block of code:

letter  mylist
0   a   []
1   b   []
2   a   [1]
3   b   [1]
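
As a hedged alternative that does not depend on whether iterrows() hands back a view or a copy of each row, the same idea (build a new list with + rather than assigning the return value of append) can be written with Series.apply. This is a minimal sketch, not part of the original answer; the names probe, append_result and result are only for illustration.

```python
import pandas as pd

df_main = pd.DataFrame([['a', []], ['b', []]], columns=['letter', 'mylist'])

# list.append mutates the list in place and returns None
probe = []
append_result = probe.append(1)  # append_result is None, probe is now [1]

df_copy = df_main.copy()
# lst + [1] builds a new list for every row, so df_main's lists are left alone
df_copy['mylist'] = df_copy['mylist'].apply(lambda lst: lst + [1])

result = pd.concat([df_main, df_copy], ignore_index=True)
print(result)
```

Because every row of df_copy gets a newly built list, the original empty lists in df_main stay empty.
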
Anup Tiwari

A workaround is to create a temporary column mylist2 of empty lists with np.empty((len(df), 0)).tolist(), use np.where() to replace the None values in mylist with those empty lists, and then drop the temporary column.

import pandas as pd, numpy as np
df_main = pd.DataFrame([['a', []], ['b', []]], columns=['letter', 'mylist'])
df_copy = df_main.copy()
for index, row in df_copy.iterrows():
    row.mylist = row.mylist.append(1)

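# Build df first: len(df) cannot be used before df exists
df = pd.concat([df_copy, df_main], ignore_index=True)
# Temporary column holding one empty list per row
df['mylist2'] = np.empty((len(df), 0)).tolist()
# Where mylist is None, take the empty list from mylist2 instead
df['mylist'] = np.where(df['mylist'].isnull(), df['mylist2'], df['mylist'])
df = df.drop('mylist2', axis=1)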
df
Out[1]: 
  letter mylist
0      a     []
1      b     []
2      a    [1]
3      b    [1]
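
If the None values are already in the frame, an element-wise check may be a shorter way to swap them for empty lists. This is a minimal sketch, assuming the column holds only lists and None; the Series below is made-up data standing in for the concatenated mylist column, and fill_none is a hypothetical helper name.

```python
import pandas as pd

# Stand-in for the concatenated column: None where append() overwrote the list
mylist = pd.Series([None, None, [1], [1]], name='mylist')


def fill_none(value):
    """Return a fresh empty list for None, otherwise the value unchanged."""
    return [] if value is None else value


filled = mylist.apply(fill_none)
print(filled.tolist())
```

Each None gets its own new list object, so later in-place edits to one row do not leak into another.
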
David Erickson

Not only does list.append return None, as the first answer points out, but df_main and df_copy also hold references to the same list objects: DataFrame.copy() copies the frame, not the Python objects stored in its cells. So after:

for index, row in df_copy.iterrows():
    row.mylist.append(1)

both dataframes have updated lists with one element. For your code to work as expected you can create a new list after you copy the dataframe:

df_copy = df_main.copy()
for index, row in df_copy.iterrows():
    row.mylist = []
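
The sharing described above can be checked directly with an identity test. This is a minimal sketch, assuming the same df_main as in the question; shared_before and shared_after are illustrative names. It rebuilds the column with Series.apply rather than assigning inside iterrows(), since whether a row from iterrows() writes back to the frame is not something to rely on.

```python
import pandas as pd

df_main = pd.DataFrame([['a', []], ['b', []]], columns=['letter', 'mylist'])
df_copy = df_main.copy()

# copy() duplicates the column, but each cell still refers to the same list object
shared_before = df_copy['mylist'].iloc[0] is df_main['mylist'].iloc[0]

# Rebuild the column so that every row of the copy owns its own list
df_copy['mylist'] = df_copy['mylist'].apply(lambda lst: list(lst))
shared_after = df_copy['mylist'].iloc[0] is df_main['mylist'].iloc[0]

# In-place append is now safe; its None return value is deliberately not assigned
for lst in df_copy['mylist']:
    lst.append(1)

print(shared_before, shared_after)
print(pd.concat([df_main, df_copy], ignore_index=True))
```
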

This question is another good example of why we should not store mutable objects such as lists in a dataframe.

Robert Altena