I'm trying to use pandas to chain together map and filter operations. I've come across several options, partly outlined here: Pandas How to filter a Series
To summarize,
from pandas import Series
s = Series(range(10))
s.where(s > 4).dropna()
s.where(lambda x: x > 4).dropna()
s.loc[s > 4]
s.loc[lambda x: x > 4]
s.to_frame(name='x').query("x > 4")
This is fine for numerical comparisons and equality checks, but it doesn't work for predicates involving other operations. For a simple example, consider matching against the first character of a string.
s = Series(['aa', 'ab', 'ba'])
s.loc[lambda x: x.startswith('a')] # fails
This fails with a message like "Series has no attribute 'startswith'" since the argument x passed to the lambda expression in the second line is the series itself, rather than the individual elements it contains.
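To make that concrete, here is a minimal sketch that records what .loc hands to the callable. The helper name record_and_select_all and the list seen_types are made up for illustration, and the helper returns an all-True mask just so the lookup succeeds.

```python
import pandas as pd

s = pd.Series(['aa', 'ab', 'ba'])
seen_types = []

def record_and_select_all(x):
    # Hypothetical helper: note the type of the argument .loc passes in,
    # then return a boolean mask that keeps every row.
    seen_types.append(type(x).__name__)
    return pd.Series(True, index=x.index)

result = s.loc[record_and_select_all]

print(set(seen_types))            # the callable receives the whole Series
print(hasattr(s, 'startswith'))   # False: startswith lives on str, not Series
print(hasattr(s[0], 'startswith'))  # True: the individual elements are str
```

So the predicate passed to .loc has to return a mask for the whole series, rather than a single True/False for one element.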
Interestingly, map does allow element-wise access:
Series(list('abcd')).map(lambda x: x.upper())
# results in ['A', 'B', 'C', 'D'] even though Series has no upper method
While there are probably some clever ways to handle the startswith example, I'm hoping to find a more general solution where a series can be filtered using a function that accepts individual values from the collection. Ideally it would also allow chaining operations together, as in:
s = (Series(...)
.map(...)
.where(...)
.map(...))
Is that supported in pandas?
UPDATE:
Scott provided the answer for cases where the value is a string, which can be handled with Series.str as described in his answer.
But what about cases with a Series containing objects? Is there any way to access their attributes or apply functions to them?
I guess a standard way of managing that case would be to destructure the relevant fields of the object into a data frame, where each attribute is a column. Still, there might be cases where someone would want to transform a collection of objects with map and filter (loc/where) without having to disassemble the complex type into a dataframe and then immediately convert back.
I'm partly trying to find an alternative to the standard map()/filter() functions in Python, where the operations have to be nested in reverse order.
I.e.:
map(function3, filter(function2, map(function1, collection)))
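For reference, a runnable version of that nesting might look like the sketch below. The bodies of function1, function2 and function3 are placeholders invented purely for illustration; the point is only that the call reads inside-out, the reverse of the order in which the steps run.

```python
# Hypothetical stand-ins for the three functions in the expression above.
def function1(x):
    return x * 2      # first map step: double each value

def function2(x):
    return x > 4      # filter step: keep values greater than 4

def function3(x):
    return x + 1      # second map step: add one

collection = range(5)

# Runs function1, then function2, then function3, but is written in reverse.
result = list(map(function3, filter(function2, map(function1, collection))))
print(result)  # [7, 9]
```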