Question
What is the correct or best way to query a pandas DataFrame?
Does it depend on the use case, or can you say "always use .query()" or "never use .query()"?
My primary concern is robustness (how error-proof the code is), but of course performance is also relevant.
In this post, the query method is described as robust and preferable to the other methods. Do you agree? Should I always use .query()?
DataFrame.query() function in pandas is one of the robust methods to filter the rows of a pandas DataFrame object.
And it is preferable to use the DataFrame.query() function to select or filter the rows of the pandas DataFrame object instead of the traditional and the commonly used indexing method.
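For reference, here is a minimal sketch of the two styles being compared. It uses a small toy DataFrame with hypothetical columns `a` and `b`, and contrasts `.query()` with a string expression against boolean indexing through `.loc`. On this data both select the same rows.

```python
import pandas as pd

# Hypothetical toy data, purely for illustration
df = pd.DataFrame({"a": [1, 2, 3, 4], "b": ["x", "y", "x", "x"]})

# String-expression style
via_query = df.query("a > 1 and b == 'x'")

# Boolean-mask style with .loc (parentheses are needed because & binds tighter than >)
via_loc = df.loc[(df["a"] > 1) & (df["b"] == "x")]

# Both return the same rows, with the original index labels preserved
pd.testing.assert_frame_equal(via_query, via_loc)
print(via_query)
```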
Background
I recently came across the .query() method and started using it quite frequently, both for convenience and because I thought it was the proper way to do it.
Then I read these two posts (the content is not essential for this question, I just want to show what made me think about it):
apply, the Convenience Function you Never Needed
and
How to deal with SettingWithCopyWarning in Pandas?
In the post about SettingWithCopyWarning, different methods like .loc and .at are mentioned, but not .query(). This made me wonder whether .query() is really used in practice. (I thought I'd start a new question rather than post this in the comments.) It might not have been relevant to that specific problem, but it made me wonder nonetheless.
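One point relevant to the SettingWithCopyWarning discussion is that `.query()` only selects rows and returns a new DataFrame, so it cannot serve as the target of an assignment back into the original. The following minimal sketch uses hypothetical column names. With `.loc` and a boolean mask, the assignment writes into `df` itself. If you modify a queried subset (after an explicit `.copy()`, which is meant to avoid the warning), `df` stays unchanged.

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3], "c": [10, 20, 30]})

# Assignment through .loc with a boolean mask modifies df in place
df.loc[df["a"] > 1, "c"] = 0
print(df["c"].tolist())  # [10, 0, 0]

# .query() returns a new object; the explicit .copy() states that a separate
# subset is intended and avoids SettingWithCopyWarning when it is modified
subset = df.query("a > 1").copy()
subset["c"] = -1
print(df["c"].tolist())  # still [10, 0, 0]
```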
The post about "apply - the convenience function..." made me wonder whether .query() is also a convenience function you never need.
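On the "convenience function" question, this short sketch shows what `.query()` adds syntactically and one robustness trade-off. The `@` prefix (for referencing Python variables) and backtick quoting (for column names that are not valid identifiers) are documented `.query()` features. The column names and threshold here are made up. Because the expression is a string, linters and IDEs cannot check it. A misspelled column name only surfaces when the line runs, as pandas' `UndefinedVariableError`, which subclasses `NameError`.

```python
import pandas as pd

df = pd.DataFrame({"unit price": [5, 15, 25], "qty": [1, 2, 3]})
threshold = 10  # ordinary Python variable

# @ refers to a Python variable; backticks quote a column name containing a space
cheap = df.query("`unit price` < @threshold")

# Equivalent boolean indexing without a query string
cheap_loc = df.loc[df["unit price"] < threshold]
assert cheap.equals(cheap_loc)

# A typo inside the query string is only caught at run time
typo_caught = False
try:
    df.query("qtyy > 1")
except NameError as err:  # pandas raises UndefinedVariableError, a NameError subclass
    typo_caught = True
    print("query failed:", err)
```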
The documentation mentions the following use case:
query() Use Cases
A use case for query() is when you have a collection of DataFrame objects that have a subset of column names (or index levels/names) in common. You can pass the same query to both frames without having to specify which frame you’re interested in querying
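Here is a minimal sketch of that documented use case. It assumes two hypothetical frames that share a column `b` but otherwise differ. The same query string is applied to each one without naming the frame inside the expression.

```python
import pandas as pd

# Two hypothetical frames that only have column "b" in common
df1 = pd.DataFrame({"a": [1, 2, 3], "b": [0.1, 0.6, 0.9]})
df2 = pd.DataFrame({"b": [0.7, 0.2], "z": ["p", "q"]})

expr = "b > 0.5"  # written once, reused for every frame

results = [frame.query(expr) for frame in (df1, df2)]
for r in results:
    print(r, end="\n\n")
```

With boolean indexing, the mask would have to reference each frame explicitly (`df1[df1["b"] > 0.5]`, `df2[df2["b"] > 0.5]`), which is the repetition this use case avoids.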
Edit: fixed the link to the .apply() question.