I'm doing this kind of code to find if a value belongs to the array a
inside a dataframe:
Solution 1
df = pd.DataFrame([{'a':[1,2,3], 'b':4},{'a':[5,6], 'b':7},])
df = df.explode('a')
df[df['a'] == 1]
will give the output:
a b
0 1 4
Problem
This can go worst if there are repetitions:
df = pd.DataFrame([{'a':[1,2,1,3], 'b':4},{'a':[5,6], 'b':7},])
df = df.explode('a')
df[df['a'] == 1]
will give the output:
a b
0 1 4
0 1 4
Solution 2
Another solution could go like:
df = pd.DataFrame([{'a':[1,2,1,3], 'b':4},{'a':[5,6], 'b':7},])
df = df[df['a'].map(lambda row: 1 in row)]
Problem
That Lambda can't go fast if the Dataframe is Big.
Question
As a first goal, I want all the lines where the value 1
belongs to a
:
- without using Python, since it is slow
- with high performance
- avoiding memory issues
- ...
So I'm trying to understand what may I do with the arrays inside Pandas. Is there some documentation on how to use this type efficiently?