How to assign columns while ignoring index alignment

Question

Say I have two dataframes x and y in Pandas, I would like to fill in a column in x with the result of sorting a column in y. I tried this:

x['foo']  = y['bar'].order(ascending=False)

but it didn't work, I suspect because Pandas aligns indices between x and y (which have the same set of indices) during the assignment

How can I have Pandas fill in the x['foo'] with another column from another dataframe ignoring the alignment of indices?

score 11 · Answer 1 · answered Apr 12 '13 at 19:44

11

The simplest way I can think of to get pandas to ignore the indices is to give it something without indices to ignore. Starting from

>>> x = pd.DataFrame({"foo": [10,20,30]},index=[1,2,0])
>>> y = pd.DataFrame({"bar": [33,11,22]},index=[0,1,2])
>>> x
   foo
1   10
2   20
0   30
>>> y
   bar
0   33
1   11
2   22

We have the usual aligned approach:

>>> x["foo"] = y["bar"].order(ascending=False)
>>> x
   foo
1   11
2   22
0   33

Or an unaligned one, by setting x["foo"] to a list:

>>> x["foo"] = y["bar"].order(ascending=False).tolist()
>>> x
   foo
1   33
2   22
0   11

answered Apr 12 '13 at 19:44

DSM

342,061
65
592
494

2

I see, thanks. Would `my_series.tolist()` be better/worse than `np.array(my_series)` or `my_series.values()` ? – Amelio Vazquez-Reina Apr 12 '13 at 20:17
5

Except in some boundary cases I don't know if it would make much difference. Come to think of it, `my_series.values` might be better at that, as it shouldn't have to go via a Python list, and so could be faster. [Checks.. yeah, at least sometimes it's faster.] – DSM Apr 12 '13 at 20:24
What's the difference between `to_list` and `tolist`? I suspect they're the same, but how can I make sure? – rishai Sep 29 '19 at 08:09
@AmelioVazquez-Reina Note that it's just `.values` without parentheses – BallpointBen Mar 28 '20 at 15:14

Iqigai · Answer 2 · 2022-02-09T09:02:59.747

I tried the code but it seems that the order() method has been deprecated, which is no surprise since the initial question is quite old. So now we are left with sort_values() to achieve the same result. On top of that there's a refinement which consist in using to_numpy() since it is slightly faster and could be useful in case of big DataFrames (.values is even faster but it is recommended to use to_numpy() whenever working for production as explained here: https://stackoverflow.com/a/54324513/4909087)

>>> x["foo"] = y["bar"].sort_values(ascending=False)
>>> x
   foo
1   33
2   22
0   11

>>> %timeit x["foo"] = y["bar"].sort_values(ascending=False).to_list()
165 µs ± 965 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
>>> %timeit x["foo"] = y["bar"].sort_values(ascending=False).to_numpy()
136 µs ± 421 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
>>> %timeit x["foo"] = y["bar"].sort_values(ascending=False).values
129 µs ± 826 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)

PJ_ · Answer 3 · 2022-08-05T13:48:49.130

1

Simplest:

x['foo']  = y['bar'].sort_values(ascending=False).values

Adding .values at the end allows you to ignore indices

edited Aug 05 '22 at 13:48

answered May 19 '22 at 14:58

PJ_

473
5
9

How to assign columns while ignoring index alignment

3 Answers3

Linked