I'm using code like this to find the rows of a DataFrame whose list column a contains a given value:

Solution 1

import pandas as pd

df = pd.DataFrame([{'a': [1, 2, 3], 'b': 4}, {'a': [5, 6], 'b': 7}])
df = df.explode('a')
df[df['a'] == 1]

will give the output:

    a   b
0   1   4

Problem

This gets worse if there are repetitions, because the same original row is returned once per match:

df = pd.DataFrame([{'a':[1,2,1,3], 'b':4},{'a':[5,6], 'b':7},])
df = df.explode('a')
df[df['a'] == 1]

will give the output:

    a   b
0   1   4
0   1   4

Solution 2

Another solution could go like:

df = pd.DataFrame([{'a':[1,2,1,3], 'b':4},{'a':[5,6], 'b':7},])
df = df[df['a'].map(lambda row: 1 in row)]

Problem

That lambda runs in Python once per row, so it will be slow if the DataFrame is big.

Question

As a first goal, I want all the rows where the value 1 belongs to a:

  • without Python-level loops, since they are slow
  • with high performance
  • avoiding memory issues
  • ...

So I'm trying to understand what I can do with the arrays inside pandas. Is there some documentation on how to use this type efficiently?

oleber

3 Answers

IIUC, you want every row of the exploded frame whose original row contains a 1. On the exploded df from Solution 1:

df[df['a'].eq(1).groupby(level=0).transform('any')]

Output:

   a  b
0  1  4
0  2  4
0  3  4
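
If the goal is to keep the original rows with their lists intact, the same idea can be applied as a mask on the unexploded frame. This is a minimal sketch, assuming the frame's index is unique; the names exploded, mask and result are only illustrative:

```python
import pandas as pd

df = pd.DataFrame([{'a': [1, 2, 1, 3], 'b': 4}, {'a': [5, 6], 'b': 7}])

# Explode only the column, so df itself keeps its lists.
exploded = df['a'].explode()

# One boolean per original row label: True if any list element equals 1.
mask = exploded.eq(1).groupby(level=0).any()

# mask is indexed by the original row labels, so it aligns with df.
result = df[mask]
print(result)
```

Because the mask has one entry per original row, repeated values inside a list do not produce repeated rows.
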
Quang Hoang

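Nothing is wrong. This is the normal behavior of DataFrame.explode().

To check whether a value appears anywhere in a, you can use:

if x in df['a'].explode().values:

where x is the value you are testing for. The .values is needed because the in operator applied directly to a Series checks the index labels, not the data.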
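
One caveat worth checking when using a membership test here: on a pandas Series, the in operator looks at the index labels rather than the stored values. The sketch below illustrates the difference on the question's first frame; the variable names are only illustrative:

```python
import pandas as pd

df = pd.DataFrame([{'a': [1, 2, 3], 'b': 4}, {'a': [5, 6], 'b': 7}])
exploded = df['a'].explode()  # index labels are 0, 0, 0, 1, 1

x = 5

# Tests the index labels (0 and 1), so 5 is not found.
in_index = x in exploded

# Tests the underlying values, so 5 is found.
in_values = x in exploded.values

# Equivalent vectorized check on the values.
any_equal = bool(exploded.eq(x).any())

print(in_index, in_values, any_equal)
```

Note that this only answers whether the value occurs somewhere in the column; it does not say which rows contain it.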

Poe Dator

I think you can expand the lists into separate columns with the DataFrame constructor, then test the values with DataFrame.eq and DataFrame.any:

df = df[pd.DataFrame(df['a'].tolist()).eq(1).any(axis=1)]
print (df)
              a  b
0  [1, 2, 1, 3]  4

Details:

print (pd.DataFrame(df['a'].tolist()))
   0  1    2    3
0  1  2  1.0  3.0
1  5  6  NaN  NaN

print (pd.DataFrame(df['a'].tolist()).eq(1))
       0      1      2      3
0   True  False   True  False
1  False  False  False  False
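
The last step, any(axis=1), reduces each row of that boolean frame to a single True/False. The sketch below shows it on a hypothetical frame with a non-default index ('x', 'y'), where passing index=df.index to the constructor keeps the mask aligned with df; with the default RangeIndex this extra argument makes no difference:

```python
import pandas as pd

# Hypothetical frame with string labels, to show why index= can matter.
df = pd.DataFrame(
    {'a': [[1, 2, 1, 3], [5, 6]], 'b': [4, 7]},
    index=['x', 'y'],
)

# Pad the lists into a rectangular frame (shorter lists are filled with NaN)
# and reuse the original index so the mask lines up with df.
wide = pd.DataFrame(df['a'].tolist(), index=df.index)

# True for each row where at least one column equals 1.
mask = wide.eq(1).any(axis=1)

result = df[mask]
print(result)
```

The padded frame has as many columns as the longest list, so memory use grows with the longest list rather than the average one.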

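"So I'm trying to understand what I can do with the arrays inside pandas. Is there some documentation on how to use this type efficiently?"

I think working with lists in pandas is not a good idea. A column of lists is stored with object dtype, so most vectorized operations do not apply to it.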
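
As an illustration of that point, the sketch below checks the dtype of the list column and then stores the same data in a long layout, one row per list element, where the comparison runs on a plain integer column. The names long_df, row_id and value are hypothetical, and the cast to int64 assumes no list is empty:

```python
import pandas as pd

df = pd.DataFrame([{'a': [1, 2, 1, 3], 'b': 4}, {'a': [5, 6], 'b': 7}])

# A column of Python lists is held as generic Python objects.
print(df['a'].dtype)

# Long layout: one row per (row_id, value) pair, with an integer value column.
long_df = (
    df['a'].explode()
    .astype('int64')
    .rename_axis('row_id')
    .reset_index(name='value')
)

# Labels of the original rows whose list contains 1, without duplicates.
row_ids = long_df.loc[long_df['value'].eq(1), 'row_id'].unique()

result = df.loc[row_ids]
print(result)
```

Whether this layout fits depends on how the data is used elsewhere; it trades the compact list column for a longer table.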

jezrael