184

Given a DataFrame:

import numpy as np
import pandas as pd

np.random.seed(0)
df = pd.DataFrame(np.random.randn(3, 3), columns=list('ABC'), index=[1, 2, 3])
df

          A         B         C
1  1.764052  0.400157  0.978738
2  2.240893  1.867558 -0.977278
3  0.950088 -0.151357 -0.103219

What is the simplest way to add a new column containing a constant value, e.g. 0?

          A         B         C  new
1  1.764052  0.400157  0.978738    0
2  2.240893  1.867558 -0.977278    0
3  0.950088 -0.151357 -0.103219    0

This is my solution, but I don't know why it puts NaN into the 'new' column:

df['new'] = pd.Series([0 for x in range(len(df.index))])

          A         B         C  new
1  1.764052  0.400157  0.978738  0.0
2  2.240893  1.867558 -0.977278  0.0
3  0.950088 -0.151357 -0.103219  NaN
cs95
yemu
  • 12
    If you use an index it's okay: `df['new'] = pd.Series([0 for x in range(len(df.index))], index=df.index)`. – zach Jun 04 '14 at 13:52
  • 8
    also, a list comprehension is entirely unnecessary here. just do `[0] * len(df.index)` – acushner Jun 04 '14 at 14:01
  • @joris, I meant that df['new']=0 shows the proper way of assigning zeros to the whole column, but it doesn't explain why my first attempt inserts NaN. This was answered by Phillip Cloud in the answer I accepted. – yemu Jun 04 '14 at 18:44
  • 13
    Simply do `df['new'] = 0` – flow2k May 20 '19 at 06:29
  • @flow2k it gives a warning A value is trying to be set on a copy of a slice from a DataFrame. – KansaiRobot Dec 12 '22 at 14:02

4 Answers

191

Super simple in-place assignment: df['new'] = 0

For in-place modification, perform direct assignment. pandas broadcasts the assigned scalar to every row.

df = pd.DataFrame('x', index=range(4), columns=list('ABC'))
df

   A  B  C
0  x  x  x
1  x  x  x
2  x  x  x
3  x  x  x

df['new'] = 'y'
# Same as,
# df.loc[:, 'new'] = 'y'
df

   A  B  C new
0  x  x  x   y
1  x  x  x   y
2  x  x  x   y
3  x  x  x   y

Note for object columns

If you want to add a column of empty lists, here is my advice:

  • Consider not doing this. object columns are bad news in terms of performance. Rethink how your data is structured.
  • Consider storing your data in a sparse data structure. More information: sparse data structures
  • If you must store a column of lists, ensure not to copy the same reference multiple times.

    # Wrong
    df['new'] = [[]] * len(df)
    # Right
    df['new'] = [[] for _ in range(len(df))]
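
The difference between the two forms is easiest to see by mutating one cell. The following is a minimal sketch; the column names `shared` and `separate` are made up for illustration:

```python
import pandas as pd

df = pd.DataFrame('x', index=range(3), columns=list('AB'))

# Wrong: list multiplication repeats one list object, so every row shares it.
df['shared'] = [[]] * len(df)
# Right: the comprehension builds a fresh list for each row.
df['separate'] = [[] for _ in range(len(df))]

# Append to the list stored in the first row of each column.
df['shared'].iloc[0].append(1)
df['separate'].iloc[0].append(1)

shared_lengths = [len(v) for v in df['shared']]
separate_lengths = [len(v) for v in df['separate']]
print(shared_lengths)    # every row sees the appended value
print(separate_lengths)  # only the first row changed
```

With the shared reference, a change made through one row shows up in all of them, which is rarely what is wanted.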
    

Generating a copy: df.assign(new=0)

If you need a copy instead, use DataFrame.assign:

df.assign(new='y')

   A  B  C new
0  x  x  x   y
1  x  x  x   y
2  x  x  x   y
3  x  x  x   y
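
As a quick check that `assign` leaves the original alone, here is a small sketch (the name `with_new` is just for illustration):

```python
import pandas as pd

df = pd.DataFrame('x', index=range(4), columns=list('ABC'))

# assign returns a new DataFrame; df itself is not modified.
with_new = df.assign(new='y')

print(list(df.columns))        # original columns only
print(list(with_new.columns))  # original columns plus 'new'
```

To keep the result, bind it to a name (or rebind `df`), since nothing changes in place.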

And, if you need to assign multiple such columns with the same value, this is as simple as,

c = ['new1', 'new2', ...]
df.assign(**dict.fromkeys(c, 'y'))

   A  B  C new1 new2
0  x  x  x    y    y
1  x  x  x    y    y
2  x  x  x    y    y
3  x  x  x    y    y
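
To make the `dict.fromkeys` step less opaque, here is a sketch with two hypothetical column names, showing the intermediate dictionary that gets unpacked into keyword arguments:

```python
import pandas as pd

df = pd.DataFrame('x', index=range(4), columns=list('ABC'))

names = ['new1', 'new2']             # hypothetical column names
kwargs = dict.fromkeys(names, 'y')   # maps every name to the same value
print(kwargs)

# Equivalent to df.assign(new1='y', new2='y')
result = df.assign(**kwargs)
print(list(result.columns))
```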

Multiple column assignment

Finally, if you need to assign multiple columns with different values, you can use assign with a dictionary.

c = {'new1': 'w', 'new2': 'y', 'new3': 'z'}
df.assign(**c)

   A  B  C new1 new2 new3
0  x  x  x    w    y    z
1  x  x  x    w    y    z
2  x  x  x    w    y    z
3  x  x  x    w    y    z
cs95
56

With modern pandas you can just do:

df['new'] = 0
cs95
Roko Mijic
  • 3
    Can you point out which specific answers are out of date? Let's leave a comment under them so the authors have a chance to improve. – cs95 Apr 19 '20 at 03:15
  • I think the answer by Phillip Cloud is out of date. The answer by cs95 seems correct to me but it overcomplicates things a bit IMO. This is a simpler one-liner; at least for the question that was asked. – Roko Mijic Apr 20 '20 at 15:10
  • 1
    Fyi the only difference between this answer and cs95 (AKA, me) answer is the column name and value. All the pieces are there. – cs95 Apr 22 '20 at 21:11
  • 1
    It is not so much that they are out of date, but this answer is less verbose than the others and is easier to read. – Joey Oct 28 '20 at 04:47
  • 1
    @Joey Can't argue with that logic, I suppose this answer is more suited to people who are just looking to copy paste anything that will work, rather than looking to understand and learn more about the library. Touche. – cs95 Nov 02 '20 at 10:11
  • 1
    @cs95 yes your answer lets people learn more. Also the df['new'] = 0 highlighted in the title is good for readability. I have upvoted that too. Less verbose than df.apply(lambda x: 0, axis=1) – Joey Nov 03 '20 at 00:43
  • if you do that multiple times, you get this warning: DataFrame is highly fragmented. ... – bonobo Mar 09 '23 at 06:06
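
Regarding the fragmentation warning mentioned in the last comment: assuming it is the PerformanceWarning that pandas emits after many single-column insertions, one possible way around it is to build all the constant columns first and attach them in a single `pd.concat`. A rough sketch, with made-up column names and count:

```python
import pandas as pd

df = pd.DataFrame('x', index=range(4), columns=list('ABC'))

# Hypothetical: 200 constant columns to add.
names = [f'new{i}' for i in range(200)]

# Build all new columns in one frame that shares df's index ...
constants = pd.DataFrame(dict.fromkeys(names, 0), index=df.index)
# ... and attach them in one step instead of 200 separate insertions.
wide = pd.concat([df, constants], axis=1)

print(wide.shape)
```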
25

The reason this puts NaN into the column is that df.index and the Index of your right-hand-side object are different. @zach shows the proper way to assign a new column of zeros. In general, pandas tries to align indices as much as possible. One downside is that wherever the indices are not aligned you get NaN. Play around with the reindex and align methods to gain some intuition for how alignment works with objects that have partially, totally, and not-at-all aligned indices. For example, here's how DataFrame.align() works with partially aligned indices:

In [7]: from pandas import DataFrame

In [8]: from numpy.random import randint

In [9]: df = DataFrame({'a': randint(3, size=10)})

In [10]: df
Out[10]:
   a
0  0
1  2
2  0
3  1
4  0
5  0
6  0
7  0
8  0
9  0

In [11]: s = df.a[:5]

In [12]: dfa, sa = df.align(s, axis=0)

In [13]: dfa
Out[13]:
   a
0  0
1  2
2  0
3  1
4  0
5  0
6  0
7  0
8  0
9  0

In [14]: sa
Out[14]:
0     0
1     2
2     0
3     1
4     0
5   NaN
6   NaN
7   NaN
8   NaN
9   NaN
Name: a, dtype: float64
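
Applied to the situation in the question, the same alignment rule can be sketched as follows. The frame here is a stand-in with the question's index `[1, 2, 3]`, and the column names `misaligned`, `aligned`, and `scalar` are invented for the example:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.zeros((3, 3)), columns=list('ABC'), index=[1, 2, 3])

# A Series built without an index gets the default labels 0, 1, 2.
rhs = pd.Series([0] * len(df))

# Assignment aligns on labels: 1 and 2 match, 3 has no partner and becomes NaN.
df['misaligned'] = rhs
# Passing df.index makes the labels line up, as in zach's comment.
df['aligned'] = pd.Series([0] * len(df), index=df.index)
# A scalar has no index to align, so it is broadcast to every row.
df['scalar'] = 0

print(df[['misaligned', 'aligned', 'scalar']])
```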
georg
Phillip Cloud
  • 11
    I didn't downvote, but your code lacks comments, which makes it hard to follow what you're trying to achieve in the snippet – redress Jun 12 '17 at 20:07
  • 9
    This does not really answer the question. OP is asking about how to add a new column containing a constant value. – cs95 Feb 19 '19 at 01:35
  • 1
    I don't agree that there's just _one_ question here. There's "How do I assign a constant value to a column?" as well as "My attempt to do this doesn't work in X way, why is it behaving unexpectedly?" I believe I've addressed both points, the first by referring to another answer. Please read _all_ of the text in my answer. – Phillip Cloud Jun 17 '19 at 01:21
  • 2
    I think the problem is with the question rather than with your answer. There are two distinct questions contained in this post and as a result two distinct answers are required to answer the question. I believe this should have been flagged as being too broad and the poster should have asked two separate questions. – Kevin Oct 08 '19 at 10:01
11

Here is another one liner using lambdas (create column with constant value = 10)

df['newCol'] = df.apply(lambda x: 10, axis=1)

before

df
          A         B         C
1  1.764052  0.400157  0.978738
2  2.240893  1.867558 -0.977278
3  0.950088 -0.151357 -0.103219

after

df
          A         B         C  newCol
1  1.764052  0.400157  0.978738      10
2  2.240893  1.867558 -0.977278      10
3  0.950088 -0.151357 -0.103219      10
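
For comparison, here is a small sketch suggesting that the `apply` form and plain scalar assignment produce the same column. The frame and the column names `via_apply` and `direct` are made up for the example:

```python
import pandas as pd

df = pd.DataFrame({'A': [1.0, 2.0, 3.0]}, index=[1, 2, 3])

# Calls the lambda once per row and collects the results.
df['via_apply'] = df.apply(lambda row: 10, axis=1)
# Broadcasts the scalar without calling any Python function per row.
df['direct'] = 10

same = bool((df['via_apply'] == df['direct']).all())
print(same)
```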
Grant Shannon
  • 5
    `df['newCol'] = 10` is also a one liner (and is faster). What is the advantage of using apply here? – cs95 Aug 15 '19 at 17:49
  • 3
    not trying to compete with you here - just showing an alternative approach. – Grant Shannon Aug 15 '19 at 20:39
  • 1
    @cs95 This is helpful. I wanted to create a new column where each value was a separate empty list. Only this method works. – Yatharth Agarwal Aug 31 '19 at 08:46
  • @YatharthAgarwal I'll give you that, but it also makes sense given pandas is not designed to work well with columns of lists. – cs95 Aug 31 '19 at 08:51
  • 1
    @YatharthAgarwal If you need to assign empty lists this is still a subpar solution because it uses apply. Try `df['new'] = [[] for _ in range(len(df))]` – cs95 Apr 24 '20 at 01:52
  • 2
    I like this solution more for beginners like me. The `df.apply` function can be used for a variety of problems, and this use-case makes *sense*. On the other hand, `df['newCol'] = 10` is easy to use and works "magically", **but** it doesn't make much logical sense, and is something one just needs to learn off-by-heart. – abrac Feb 22 '21 at 14:19