filter on pandas array

Question

I'm using code like this to check whether a value belongs to the array in column a of a DataFrame:

Solution 1

import pandas as pd

df = pd.DataFrame([{'a':[1,2,3], 'b':4},{'a':[5,6], 'b':7},])
df = df.explode('a')  # one row per list element; index labels are repeated
df[df['a'] == 1]

will give the output:

    a   b
0   1   4

Problem

This gets worse if there are repetitions, because the same original row is returned more than once:

df = pd.DataFrame([{'a':[1,2,1,3], 'b':4},{'a':[5,6], 'b':7},])
df = df.explode('a')
df[df['a'] == 1]

will give the output:

    a   b
0   1   4
0   1   4

Solution 2

Another solution could look like this:

df = pd.DataFrame([{'a':[1,2,1,3], 'b':4},{'a':[5,6], 'b':7},])
df = df[df['a'].map(lambda row: 1 in row)]

Problem

That lambda runs once per row in Python, so it will be slow if the DataFrame is big.

Question

As a first goal, I want all the rows where the value 1 belongs to a:

without using Python-level loops, since they are slow
with high performance
avoiding memory issues
...

So I'm trying to understand what I can do with the arrays inside pandas. Is there some documentation on how to use this type efficiently?

What are you expecting? – piRSquared Jan 13 '20 at 20:13

score 0 · Answer 1 · answered Jan 13 '20 at 20:15

IIUC, you are trying to do:

df[df['a'].eq(1).groupby(level=0).transform('any')]

Output:
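The result can be reproduced with the self-contained sketch below. It assumes df is the exploded frame from the question's second example; the names orig, exploded, per_row and filtered are illustrative and not part of the answer. The last step, which maps the flags back onto the unexploded frame, is an added suggestion.

```python
import pandas as pd

orig = pd.DataFrame([{'a': [1, 2, 1, 3], 'b': 4}, {'a': [5, 6], 'b': 7}])
exploded = orig.explode('a')  # index labels 0,0,0,0,1,1

# True for every exploded row whose original row (same index label) contains 1
mask = exploded['a'].eq(1).groupby(level=0).transform('any')
result = exploded[mask]
print(result)  # all four exploded rows that came from original row 0

# One flag per original row, usable to filter the unexploded frame
per_row = exploded['a'].eq(1).groupby(level=0).any()
filtered = orig[per_row]
print(filtered)  # the single original row whose list contains 1
```

Using transform('any') keeps the exploded shape, while a plain any() per group gives one boolean per original row, which is probably closer to what the question asks for.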

Quang Hoang

score 0 · Answer 2 · answered Jan 13 '20 at 20:16

Nothing is wrong. This is the normal behavior of DataFrame.explode(): every list element becomes its own row, so repeated elements produce repeated rows.

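To check whether a value appears anywhere among the values in a, you can use this:

if x in df.a.explode().values:

where x is the value you test for. The .values is needed because the in operator applied directly to a Series checks the index labels, not the values.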
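One thing to watch for with this kind of test: the in operator on a Series looks at index labels rather than values. A small sketch on the question's data, with illustrative variable names, shows the difference for the value 6, which is in the data but is not an index label:

```python
import pandas as pd

df = pd.DataFrame([{'a': [1, 2, 1, 3], 'b': 4}, {'a': [5, 6], 'b': 7}])
flat = df['a'].explode()  # values 1,2,1,3,5,6 with index labels 0,0,0,0,1,1

in_index = 6 in flat                  # tests the index labels (0 and 1)
in_values = 6 in flat.values          # tests the underlying values
any_match = bool(flat.eq(6).any())    # vectorized equivalent on the values

print(in_index, in_values, any_match)
```

Either of the last two forms answers whether the value occurs somewhere in the column; neither tells you which rows contain it.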

Poe Dator

score 0 · Answer 3 · answered Jan 14 '20 at 07:05

I think you can convert the lists to a DataFrame of scalars with the DataFrame constructor, and then test the values with DataFrame.eq and DataFrame.any:

# assign to a new name so that df still holds all rows for the details below
out = df[pd.DataFrame(df['a'].tolist()).eq(1).any(axis=1)]
print (out)
              a  b
0  [1, 2, 1, 3]  4

Details:

print (pd.DataFrame(df['a'].tolist()))
   0  1    2    3
0  1  2  1.0  3.0
1  5  6  NaN  NaN

print (pd.DataFrame(df['a'].tolist()).eq(1))
       0      1      2      3
0   True  False   True  False
1  False  False  False  False
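Note that the intermediate DataFrame gets a fresh 0..n-1 index, so the boolean mask only lines up with df when df also has the default index. A sketch of a variant that passes index=df.index to the constructor follows; the non-default labels 'x' and 'y' are hypothetical and only there to show the alignment, and this variant is an addition, not part of the answer above.

```python
import pandas as pd

df = pd.DataFrame(
    [{'a': [1, 2, 1, 3], 'b': 4}, {'a': [5, 6], 'b': 7}],
    index=['x', 'y'],  # hypothetical non-default labels
)

# Reuse df's index so the mask aligns with df row by row
wide = pd.DataFrame(df['a'].tolist(), index=df.index)
mask = wide.eq(1).any(axis=1)  # NaN padding compares as False
out = df[mask]
print(out)
```

The padded frame has as many columns as the longest list, so memory use grows with the longest list rather than with the total number of elements.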

So I'm trying to understand what I can do with the arrays inside pandas. Is there some documentation on how to use this type efficiently?

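I think working with lists in pandas is not a good idea. A column of lists has object dtype, so most operations on it cannot be vectorized and fall back to Python-level loops.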
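To make that concrete, here is a rough sketch, assuming the lists only contain integers: it checks the dtype of the list column, then keeps the data in long format with a numeric dtype so that comparisons are vectorized. The names long, hit_labels and out are illustrative.

```python
import pandas as pd

df = pd.DataFrame([{'a': [1, 2, 1, 3], 'b': 4}, {'a': [5, 6], 'b': 7}])
print(df['a'].dtype)  # object: each cell holds a Python list

# Long format: one element per row, cast to a real integer dtype
long = df.explode('a').astype({'a': 'int64'})

# Index labels of original rows that contain the value 1, without duplicates
hit_labels = long.index[long['a'].eq(1).to_numpy()].unique()
out = df.loc[hit_labels]
print(out)
```

Whether this is faster than the map-based version depends on the data, so it is worth timing both on a realistic sample.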
