In my case I had a pandas Series where the values are tuples of characters:
Out[67]
0 (H, H, H, H)
1 (H, H, H, T)
2 (H, H, T, H)
3 (H, H, T, T)
4 (H, T, H, H)
Therefore I could use indexing to filter the series, but to create the index I needed apply
. My condition is "find all tuples which have exactly one 'H'".
series_of_tuples[series_of_tuples.apply(lambda x: x.count('H')==1)]
I admit it is not "chainable", (i.e. notice I repeat series_of_tuples
twice; you must store any temporary series into a variable so you can call apply(...) on it).
There may also be other methods (besides .apply(...)
) which can operate elementwise to produce a Boolean index.
Many other answers (including accepted answer) using the chainable functions like:
.compress()
.where()
.loc[]
[]
These accept callables (lambdas) which are applied to the Series, not to the individual values in those series!
Therefore my Series of tuples behaved strangely when I tried to use my above condition / callable / lambda, with any of the chainable functions, like .loc[]
:
series_of_tuples.loc[lambda x: x.count('H')==1]
Produces the error:
KeyError: 'Level H must be same as name (None)'
I was very confused, but it seems to be using the Series.count series_of_tuples.count(...)
function , which is not what I wanted.
I admit that an alternative data structure may be better:
- A Category datatype?
- A Dataframe (each element of the tuple becomes a column)
- A Series of strings (just concatenate the tuples together):
This creates a series of strings (i.e. by concatenating the tuple; joining the characters in the tuple on a single string)
series_of_tuples.apply(''.join)
So I can then use the chainable Series.str.count
series_of_tuples.apply(''.join).str.count('H')==1