Why does pandas assert lists and tuples equal?

Asked Oct 25 '18 at 20:43

Active Oct 26 '18 at 10:04

Viewed 429 times

Why does `pd.testing.assert_series_equal` pass when comparing a Series of lists with a Series of tuples?

For example, this test passes:

import pandas as pd

l = pd.Series([[1], [2], [3]])
t = pd.Series([(1,), (2,), (3,)])
pd.testing.assert_series_equal(l, t)

I find this especially worrisome because lists and tuples do not behave the same elsewhere in pandas: you can't aggregate a column of lists in a DataFrame grouped on multiple keys if the aggregator returns a list for the first group. The same aggregation does work for tuples.

Example:

>>> df = pd.DataFrame([[0, 0, 0], [1, 1, 2], [[1], [2], [3]], [(1,), (2,), (3,)]]).T
>>> df
   0  1    2     3
0  0  1  [1]  (1,)
1  0  1  [2]  (2,)
2  0  2  [3]  (3,)

>>> df.groupby([0, 1])[2].agg(sum)
ValueError: Function does not reduce

>>> df.groupby([0, 1])[3].agg(sum)
0  1
0  1    (1, 2)
   2      (3,)

See this answer for more detail

asked Oct 25 '18 at 20:43 by Jurgy (edited Oct 26 '18 at 10:04)

You should avoid using `list` or `tuple` objects inside data-frames altogether. – juanpa.arrivillaga Oct 25 '18 at 20:48
This is very strange though. – juanpa.arrivillaga Oct 25 '18 at 20:49
It's a weakness of the test itself from what I can tell. I would use `pd.testing.assert_series_equal(l, t, check_exact=True)` – user3483203 Oct 25 '18 at 20:50
@miradulo dang, that seems like something that should just fail rather than pass silently... – juanpa.arrivillaga Oct 25 '18 at 20:54
@juanpa.arrivillaga Yeah it's kinda weird. If I'm not getting too lost in the source code, it looks like `Series.equals` ends up delegating to `np.array_equal` in the end, while `pd.testing.assert_series_equal` ends up using `pd._libs.testing.assert_almost_equal`, which returns `True`. – miradulo Oct 25 '18 at 21:02
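
To illustrate the difference miradulo describes, here is a simplified pure-Python model of the two comparison styles. It is only a sketch and not pandas' actual implementation: `lenient_equal` and `strict_equal` are hypothetical helper names, and the assumption is that an "almost equal" style check walks into nested iterables element by element and ignores the container type, while a strict check compares the objects directly.

```python
from collections.abc import Iterable


def lenient_equal(a, b):
    """Hypothetical model of an element-wise check that ignores container type."""
    # Strings are iterable too; compare them directly to avoid endless recursion.
    if isinstance(a, (str, bytes)) or isinstance(b, (str, bytes)):
        return a == b
    if isinstance(a, Iterable) and isinstance(b, Iterable):
        a, b = list(a), list(b)
        return len(a) == len(b) and all(
            lenient_equal(x, y) for x, y in zip(a, b)
        )
    return a == b


def strict_equal(a, b):
    """Hypothetical strict check: the types must match before values are compared."""
    return type(a) is type(b) and a == b


lists = [[1], [2], [3]]
tuples = [(1,), (2,), (3,)]

# The lenient model only sees the elements 1, 2, 3 on both sides.
lenient_result = lenient_equal(lists, tuples)

# The strict model notices that [1] is a list and (1,) is a tuple.
strict_result = all(strict_equal(x, y) for x, y in zip(lists, tuples))
```

Under this model the lenient comparison reports the two sequences as equal and the strict one does not, which matches the behaviour in the question. If the container type matters in a test, an explicit type check along the lines of `strict_equal` is one way to guard against it.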

0 Answers