
This is partially covered already by the question "How to add an empty column to a dataframe?".

In the accepted answer there, df["D"] = np.nan creates a column with dtype numpy.float64.
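To make the dtype difference concrete, here is a minimal sketch, assuming a recent pandas and NumPy (the extra column C exists only for this comparison): a scalar np.nan becomes a float64 column, while a column holding Python lists is stored with object dtype.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"A": [1, 2, 3], "B": [2, 3, 4]})

# A scalar NaN is broadcast to every row; NaN is a float, so the column is float64
df["C"] = np.nan

# A column of Python lists (here the attempt from the question) is stored as dtype object
df["D"] = [[]] * len(df)

print(df.dtypes)
```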

Is there a way to initialize each cell with an empty list?

I tried df["D"] = [[]] * len(df), but every cell points to the same list object, so modifying one modifies them all.

import pandas as pd

df = pd.DataFrame({"A": [1,2,3], "B": [2,3,4]})
df

   A  B
0  1  2
1  2  3
2  3  4


df["D"] = [[]] * len(df)
df
   A  B   D
0  1  2  []
1  2  3  []
2  3  4  []


df['D'][1].append(['a','b','c','d'])
df
   A  B               D
0  1  2  [[a, b, c, d]]
1  2  3  [[a, b, c, d]]
2  3  4  [[a, b, c, d]]

Wanted:

   A  B               D
0  1  2              []
1  2  3  [[a, b, c, d]]
2  3  4              []
Bill Armstrong
Joylove

2 Answers


Use

df["D"] = [[] for _ in range(len(df))]

instead of

df["D"] = [[]] * len(df) 

This creates a separate [] object for each row.
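A minimal sketch of the fix applied to the example from the question, assuming a recent pandas. It reads the cell with df.at rather than the chained df['D'][1] (a choice made for this sketch, not part of the answer); append then mutates only the list stored in row 1.

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 2, 3], "B": [2, 3, 4]})

# [] is evaluated once per iteration, so each row gets its own list object
df["D"] = [[] for _ in range(len(df))]

# df.at returns the list stored in row 1; append mutates that list in place
df.at[1, "D"].append(["a", "b", "c", "d"])

print(df)
```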


[[] for _ in range(len(df))] is a list comprehension: it evaluates [] anew for each value in range(len(df)), so every element is a different list.

This code has the same functionality as

l = []
for _ in range(len(df)):
    l.append([])

The comprehension is shorter to write, easier to read, and typically somewhat faster.
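The equivalence can be checked directly in plain Python. This is a small sketch; n is a hypothetical stand-in for len(df), and the variable names are illustrative only.

```python
n = 3  # hypothetical stand-in for len(df)

# List comprehension: [] is evaluated anew on every iteration
by_comprehension = [[] for _ in range(n)]

# The equivalent explicit loop
by_loop = []
for _ in range(n):
    by_loop.append([])

print(by_comprehension == by_loop)  # True: both hold n empty lists
# Every element is a distinct object, so the set of ids has n entries
print(len({id(x) for x in by_comprehension}))  # 3
```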

To learn more about list comprehensions, I'd recommend the answers to this question.

To learn more about why [[]] * len(df) behaves this way, I'd recommend the answers to this question.
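The short version, sketched in plain Python (shared and separate are illustrative names): multiplying a list copies references to its elements, not the elements themselves, so [[]] * n holds the same inner list n times.

```python
n = 3

# Multiplication repeats the reference to the single inner list
shared = [[]] * n
print(shared[0] is shared[1])  # True

shared[1].append("x")
print(shared)  # [['x'], ['x'], ['x']]

# The comprehension creates a new list on each iteration
separate = [[] for _ in range(n)]
separate[1].append("x")
print(separate)  # [[], ['x'], []]
```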

rafaelc
    Thanks this did work, could you explain what you did for my understanding please? – Joylove Jul 06 '18 at 03:47
    @Joylove Sure :) I've edited – rafaelc Jul 06 '18 at 03:52
    The underscore is just a name of a variable. Could have been `i`, `j` or any other name. It is just a convention to name the variable `_` if you are not going to use it – rafaelc Jul 06 '18 at 04:06
    Which line raises this warning? `df["D"] = [[] for _ in range(len(df))]` will not raise this warning. – rafaelc Jul 06 '18 at 04:17

Could you not just pass in a list of lists when creating the column? Then, to fill one row, store the new list in a variable and assign it to that single cell with .at:

import pandas as pd

df = pd.DataFrame()
df['col A'] = [1,12,312,352]
df['col B'] = [[],[],[],[]]  # four separate empty lists, one per row

ser = [1,4,5,6]
# .at sets exactly one cell; .loc may treat the list as several values
# and raise a ValueError about mismatched lengths
df.at[2,'col B'] = ser
df

After the assignment, row 2 of col B holds [1, 4, 5, 6], while the other three rows keep their own empty lists.

Does this help? Is this what you are looking for?

Kavi Sek