Add column in dataframe from list

Question

I have a dataframe with some columns like this:

The values in A range only from 0 to 7.

Also, I have a list of 8 elements like this:

List = [2, 5, 6, 8, 12, 16, 26, 32]  # there are only 8 elements in this list

If the element in column A is n, I need to insert the nth element of List (counting from 0) in a new column, say 'D'.

How can I do this in one go without looping over the whole dataframe?

The resulting dataframe would look like this:

A   B   C   D
0           2
4           12
5           16
6           26
7           32
7           32
6           26
5           16

Note: The dataframe is huge, and iteration is the last option. I can also arrange the elements of 'List' in any other data structure, such as a dict, if necessary.

I think you need a (smaller) toy example, with the desired result. It sounds a little vague atm. – Andy Hayden, Oct 31 '14 at 03:12

sparrow · Answer 1 · 2019-04-11T19:53:55.547

424

Just assign the list directly:

df['new_col'] = mylist

Alternative
Convert the list to a series or array and then assign:

se = pd.Series(mylist)
df['new_col'] = se.values

or

df['new_col'] = np.array(mylist)
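The reason for going through .values or np.array is index alignment: a list or array is assigned by position, while a Series is first matched against the dataframe's index labels. A minimal sketch, assuming a made-up three-row frame with a non-default index (the frame and the column names are illustrative only):

```python
import numpy as np
import pandas as pd

# Hypothetical frame with a non-default index, for illustration only.
df = pd.DataFrame({"A": [0, 4, 5]}, index=[10, 11, 12])
mylist = [2, 12, 16]

# A plain list or ndarray is assigned by position.
df["from_list"] = mylist
df["from_array"] = np.array(mylist)

# A Series is aligned on index labels first. Its labels 0, 1, 2 do not
# occur in df.index, so every row of this column ends up as NaN.
df["from_series"] = pd.Series(mylist)

# Dropping the Series index with .to_numpy() (or .values) makes the
# assignment positional again.
df["from_values"] = pd.Series(mylist).to_numpy()

print(df)
```

With the default 0..n-1 index all four assignments give the same column, so the difference only shows up after filtering, sorting, or re-indexing.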

edited Apr 11 '19 at 19:53

answered Jul 20 '16 at 20:58

sparrow

`pykernel_launcher.py:1: SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame. Try using .loc[row_indexer,col_indexer] = value instead See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy """Entry point for launching an IPython kernel.` – franchb Feb 01 '18 at 15:54
@sparrow will using `pd.Series` affect the dtype? I mean, will it leave floats as floats and strings as strings? Or will the elements within the list default to strings? – 3kstc Feb 27 '18 at 03:23
@IlyaRusin, it's a false positive which can be ignored in this case. For more info: https://stackoverflow.com/questions/20625582/how-to-deal-with-settingwithcopywarning-in-pandas – sparrow Aug 14 '18 at 21:47
This can be simplified to: df['new_col'] = pd.Series(mylist).values – smartse Nov 05 '18 at 19:00
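The SettingWithCopyWarning quoted in the comments above typically appears when df was itself produced by slicing or filtering another DataFrame. A minimal sketch, assuming that is the situation (the frame, the filter, and the name subset are made up for illustration); taking an explicit copy before adding the column is one common way to make the intent unambiguous:

```python
import pandas as pd

# Hypothetical parent frame and list, for illustration only.
df = pd.DataFrame({"A": [0, 4, 5, 6], "B": [1.0, 2.0, 3.0, 4.0]})
mylist = [12, 16, 26]

# .copy() makes `subset` an independent DataFrame, so adding a column
# to it cannot be mistaken for an attempt to modify `df`.
subset = df[df["A"] > 0].copy()
subset["new_col"] = mylist

print(subset)
```

The parent frame df is left unchanged; only subset gains the new column.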

score 61 · Accepted Answer · edited Sep 26 '18 at 13:36

IIUC, if you make your (unfortunately named) List into an ndarray, you can simply index into it naturally.

>>> import numpy as np
>>> m = np.arange(16)*10
>>> m[df.A]
array([  0,  40,  50,  60, 150, 150, 140, 130])
>>> df["D"] = m[df.A]
>>> df
    A   B   C    D
0   0 NaN NaN    0
1   4 NaN NaN   40
2   5 NaN NaN   50
3   6 NaN NaN   60
4  15 NaN NaN  150
5  15 NaN NaN  150
6  14 NaN NaN  140
7  13 NaN NaN  130

Here I built a new m, but if you use m = np.asarray(List), the same thing should work: the values in df.A will pick out the appropriate elements of m.

Note that if you're using an old version of numpy, you might have to use m[df.A.values] instead. In the past, numpy didn't play well with others, and some refactoring in pandas caused some headaches. Things have improved now.
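For reference, here is the same idea applied to the list and column values given in the question, as a small self-contained sketch. The frame is rebuilt from the question's expected output, and .to_numpy() is used in place of .values, which assumes a reasonably recent pandas:

```python
import numpy as np
import pandas as pd

# Column A and List as given in the question.
df = pd.DataFrame({"A": [0, 4, 5, 6, 7, 7, 6, 5]})
List = [2, 5, 6, 8, 12, 16, 26, 32]

m = np.asarray(List)

# Integer-array indexing: each value in A selects the element at that
# position in m, for the whole column in one vectorized step.
df["D"] = m[df["A"].to_numpy()]

print(df)
```

An out-of-range value in A would raise an IndexError here, which is consistent with the question's guarantee that A only holds values from 0 to 7.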

edited Sep 26 '18 at 13:36

edge-case

answered Oct 31 '14 at 03:18

DSM

Hi @DSM. I get what you are saying but I am getting this error: `Traceback (most recent call last):` `File "./b.py", line 24, in ` `d["D"] = m[d.A]` `IndexError: unsupported iterator index` – mane Oct 31 '14 at 03:44
@mane: urf, that's an old `numpy` bug. Does `d["D"] = m[d.A.values]` work for you? – DSM Oct 31 '14 at 03:51

score 21 · Answer 3 · edited Jul 10 '18 at 14:21

A solution improving on the great one from @sparrow.

Let df be your dataset, and mylist the list of values you want to add to the dataframe.

Suppose you want to call the new column new_column.

First make the list into a Series:

column_values = pd.Series(mylist)

Then use the insert function to add the column. This function has the advantage of letting you choose the position at which the column is placed. In the following example the new column goes in the first position from the left (by setting loc=0):

df.insert(loc=0, column='new_column', value=column_values)

edited Jul 10 '18 at 14:21

erip

answered Dec 07 '17 at 11:39

Salvatore Cosentino

This will not work if you have changed the index of df to something other than the default 0, 1, 2, ...; in that case you have to add, between the two lines: column_values.index = df.index – Guy s Mar 16 '19 at 17:47
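Putting the answer and the comment together, a minimal sketch on a made-up frame whose index is not the default one (the labels "x", "y", "z" and the column name also_new are assumptions for illustration):

```python
import pandas as pd

# Hypothetical frame with string labels as its index.
df = pd.DataFrame({"A": [0, 4, 5]}, index=["x", "y", "z"])
mylist = [2, 12, 16]

column_values = pd.Series(mylist)
# Re-label the Series so that it lines up with the rows of df;
# without this step the inserted column would be all NaN.
column_values.index = df.index

df.insert(loc=0, column="new_column", value=column_values)

# insert() also accepts a plain list, which is placed by position
# and therefore needs no re-labelling.
df.insert(loc=1, column="also_new", value=mylist)

print(df)
```

Note that insert modifies df in place and returns None, so its result should not be assigned back to df.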

score 9 · Answer 4 · answered Oct 17 '19 at 11:52

Old question, but I always try to use the fastest code!

I had a huge list of 69 million uint64 values. np.array() was fastest for me.

df['hashes'] = hashes
Time spent: 17.034842014312744

df['hashes'] = pd.Series(hashes).values
Time spent: 17.141014337539673

df['key'] = np.array(hashes)
Time spent: 10.724546194076538
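A comparison like this can be reproduced with the standard timeit module. The sketch below is not the original benchmark: it uses a much smaller synthetic list as a stand-in for the 69 million values, and the helper function names are made up, so the absolute numbers and possibly the ranking will differ by machine and library version.

```python
import timeit

import numpy as np
import pandas as pd

# Small synthetic stand-in for the 69-million-element list.
hashes = list(range(100_000))
df = pd.DataFrame(index=range(len(hashes)))


def assign_list():
    df["hashes"] = hashes


def assign_series_values():
    df["hashes"] = pd.Series(hashes).values


def assign_array():
    df["hashes"] = np.array(hashes)


# Time each variant over a few repetitions and report the total.
for fn in (assign_list, assign_series_values, assign_array):
    seconds = timeit.timeit(fn, number=5)
    print(f"{fn.__name__}: {seconds:.4f} s for 5 runs")
```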

score 8 · Answer 5 · edited Oct 31 '14 at 04:04

First let's create the dataframe you had, I'll ignore columns B and C as they are not relevant.

df = pd.DataFrame({'A': [0, 4, 5, 6, 7, 7, 6, 5]})

And the mapping that you desire:

mapping = dict(enumerate([2,5,6,8,12,16,26,32]))

df['D'] = df['A'].map(mapping)

Done!

print(df)

Output:

   A   D
0  0   2
1  4  12
2  5  16
3  6  26
4  7  32
5  7  32
6  6  26
7  5  16

edited Oct 31 '14 at 04:04

Toby Seo

answered Oct 31 '14 at 03:36

Phil Cooper

I think the OP knows how to do this already. By my reading the issue is constructing `D` from the elements of `A` and `List` ("If the element in column A is n, I need to insert the n th element from the List in a new column, say 'D'.") – DSM Oct 31 '14 at 03:39
SO has turned into some kind of F(*& nanny state. Thanks to @DSM for the comment, but I couldn't correct the post until it was peer reviewed, and then it was rejected because it was too fast, and then I was able to peer review my own edit, and then it was too late because a worse (IMHO) answer was "accepted". SO has really got some meta-nannies who are less than helpful!!!! – Phil Cooper Oct 31 '14 at 04:01
Well, I can't speak for the nannies, but you'll find that your approach is about an order of magnitude slower on long arrays. In other respects, of course, choosing between `np.array(List)[df.A]` and `df["A"].map(dict(enumerate(List)))` is mostly a matter of preference. – DSM Oct 31 '14 at 04:11
Hi Phil, I only saw your solution and DSM's comment and then never got back to it since DSM's solution worked fine for me. But now looking at your solution, it works too. I have run DSM's solution on my dataset of about 200k entries and it runs in a couple of seconds with all the other calculations that I have. I am totally new to python-pandas and personally was not looking for anything elegant or great; whatever worked was fine. But honestly, thanks for the solution. – mane Oct 31 '14 at 05:31
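The two approaches compared in the comments above can be checked against each other on the question's data. This is a small sketch rather than a benchmark; the variable names and the out-of-range key 99 are made up for illustration.

```python
import numpy as np
import pandas as pd

List = [2, 5, 6, 8, 12, 16, 26, 32]
df = pd.DataFrame({"A": [0, 4, 5, 6, 7, 7, 6, 5]})

# Lookup by integer-array indexing into an ndarray.
via_index = np.array(List)[df["A"].to_numpy()]

# Lookup through a {position: value} dict with Series.map.
via_map = df["A"].map(dict(enumerate(List)))

# For keys inside 0..7 both lookups return the same values.
assert (via_index == via_map.to_numpy()).all()

# They differ for keys outside that range: map() yields NaN for a
# missing key, whereas indexing the array would raise an IndexError.
out_of_range = pd.Series([0, 99]).map(dict(enumerate(List)))
print(out_of_range)
```

Which behaviour is preferable depends on whether an unexpected value in A should fail loudly or be carried along as a missing value.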

score 6 · Answer 6 · answered Jan 20 '21 at 06:42

You can also use df.assign:

In [1559]: df
Out[1559]: 
   A   B   C
0  0 NaN NaN
1  4 NaN NaN
2  5 NaN NaN
3  6 NaN NaN
4  7 NaN NaN
5  7 NaN NaN
6  6 NaN NaN
7  5 NaN NaN

In [1560]: mylist = [2,5,6,8,12,16,26,32]

In [1567]: df = df.assign(D=mylist)

In [1568]: df
Out[1568]: 
   A   B   C   D
0  0 NaN NaN   2
1  4 NaN NaN   5
2  5 NaN NaN   6
3  6 NaN NaN   8
4  7 NaN NaN  12
5  7 NaN NaN  16
6  6 NaN NaN  26
7  5 NaN NaN  32
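Note that passing the list to assign places its values by position, so row i receives mylist[i], as in the output above. To get the lookup the question describes (row i receives mylist[A[i]]), assign can be given a callable instead. A minimal sketch, assuming the question's data; the names positional and lookup are made up:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"A": [0, 4, 5, 6, 7, 7, 6, 5]})
mylist = [2, 5, 6, 8, 12, 16, 26, 32]

# Positional: row i receives mylist[i].
positional = df.assign(D=mylist)

# Lookup: row i receives mylist[A[i]]. The callable is passed the
# dataframe and returns the values for the new column.
lookup = df.assign(D=lambda d: np.asarray(mylist)[d["A"].to_numpy()])

print(lookup)
```

In both cases assign returns a new DataFrame and leaves df itself untouched.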
