
Is there a clean way to filter a pandas Series using a custom function that takes both the index and the value as input?

Here is a piece of code that achieves what I want to do:

import pandas as pd

series = pd.Series({"id5":88, "id3":40})
def custom(k,v):
    if k=="id5":
        return v>20
    else:
        return v>50

filtered_indexes = []
filtered_values = []
for k,v in series.items():  # Series.iteritems() was removed in pandas 2.0
    if custom(k,v):
        filtered_indexes.append(k)
        filtered_values.append(v)
filtered_series = pd.Series(data=filtered_values, index=filtered_indexes)

My question is: can the same be achieved more cleanly and/or more efficiently with syntax like

series.filter(lambda x: custom(x.index, x.value))
jpp
Atte Juvonen

2 Answers


The problem is that Series.apply has no access to the index, and there is no value-based filter for a Series (Series.filter only selects by index label).

It is possible, but you need to create a DataFrame first:

s = series[series.to_frame().apply(lambda x: custom(x.name, x), axis=1).squeeze()]
print (s)
id5    88
dtype: int64

Or use groupby with filter (this relies on each index label being unique, so that every group holds a single value):

s = series.groupby(level=0).filter(lambda x: custom(x.name, x).iloc[0])
print (s)
id5    88
dtype: int64
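
If the DataFrame round trip feels heavy, a Boolean mask can also be built directly from the index/value pairs. This is only a minimal sketch, assuming custom is the scalar function from the question; the names mask and filtered are introduced here for illustration.

```python
import pandas as pd

series = pd.Series({"id5": 88, "id3": 40})

def custom(k, v):
    # Same rule as in the question: "id5" gets the lower threshold.
    if k == "id5":
        return v > 20
    return v > 50

# Call custom once per (index, value) pair and keep the original index,
# so the mask lines up with the Series it filters.
mask = pd.Series([custom(k, v) for k, v in series.items()], index=series.index)
filtered = series[mask]
print(filtered)
```

This is still a Python-level loop, so it is about readability rather than speed.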
jezrael

You can vectorise your logic as below. This avoids inefficient lambda loops and may also make your code cleaner.

res = series[((series.index == 'id5') & (series > 20)) |
             ((series.index != 'id5') & (series > 50))]

Result:

id5    88
dtype: int64

For readability, you may wish to separate the Boolean criteria:

c1 = ((series.index == 'id5') & (series > 20))
c2 = ((series.index != 'id5') & (series > 50))

res = series[c1 | c2]
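
When the rule has more than a couple of clauses, the same idea can be packaged as a function that receives the index and the values and returns a Boolean array. A minimal sketch, assuming the thresholds from the question; custom_vectorised is a hypothetical name, and numpy.where stands in for whatever array logic the real function would need.

```python
import numpy as np
import pandas as pd

series = pd.Series({"id5": 88, "id3": 40})

def custom_vectorised(index, values):
    # Choose a per-element threshold from the index labels,
    # then compare all values against it in one array operation.
    thresholds = np.where(index == "id5", 20, 50)
    return values.to_numpy() > thresholds

res = series[custom_vectorised(series.index, series)]
print(res)
```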
jpp
  • Thanks, but this is missing the custom function and the sample I provided is just a "toy example". The actual function is more complicated and can't be expressed this way. – Atte Juvonen Mar 23 '18 at 15:25
  • So link `c1 = f(series.index, series)` where `f` is your custom function? Just make sure `f` returns an array. The general point is the same, and this solution is adaptable to any function. – jpp Mar 23 '18 at 15:27
  • I'm not sure what you mean, especially with respect to "make sure `f` returns an array". That sounds like `f` would simply contain all the ugly code I was trying to get rid of in the first place. Am I misunderstanding something? – Atte Juvonen Mar 23 '18 at 15:33
  • Yes. It's good practice to have each function do something different. You haven't explained what `f(index, series)` involves, so I can't comment further. But if it's a numeric computation it's most likely you can optimize via `numpy` / `numba` / other tools. Of course, you can always take @jezrael's approach and convert your series to a dataframe. – jpp Mar 23 '18 at 15:35
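
One way to read "make sure `f` returns an array" without rewriting the scalar function is to wrap it with numpy.vectorize. This is a hedged sketch rather than what either commenter proposed: it assumes custom is the function from the question, and f and mask are illustrative names. Note that numpy.vectorize is a convenience wrapper around a Python loop, so it tidies the call site but is not a performance optimisation.

```python
import numpy as np
import pandas as pd

series = pd.Series({"id5": 88, "id3": 40})

def custom(k, v):
    # Scalar rule from the question, unchanged.
    if k == "id5":
        return v > 20
    return v > 50

# Wrap the scalar function so it accepts arrays and returns a Boolean array.
f = np.vectorize(custom, otypes=[bool])
mask = f(series.index.to_numpy(), series.to_numpy())

res = series[mask]
print(res)
```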