
This question is very similar to one I asked here:

Python Pandas SettingWithCopyWarning copies vs new objects

I'd like to understand how I can exclude records from a given DataFrame (i.e., operate on the DataFrame itself and not on a view of it) while keeping the option of applying additional operations to the result.

I'm struggling to understand how Python manages reference vs. value assignment when operating on pandas DataFrame objects. I'm working with a dataset stored in a pandas DataFrame, and I'd like to reduce it based on certain attribute values. I'd also like to apply additional operations to the result of that reduction. My preferred method is .query(). Here is a simple example:

import pandas as pd

mydf = pd.DataFrame({'col1': ['A', 'B', 'C'],
                     'col2': ['x', 'y', 'z']})
mydf = mydf.query('col1 == "A"')

Conceptually, this accomplishes what I'm looking for: a reduction of the dataset I'm working with, based on a query against it. My question is this:

"Is this the correct application of the query function, or should I be doing something else if I have additional operations to perform on 'mydf'?"

I've read through this documentation but still don't understand what pitfalls to watch out for...

Sevyns

1 Answer


I think this is the right approach if you don't need the rows that were filtered out. You can also chain your "additional operations" (which is pretty efficient) like this:

 mydf = mydf.query('col1 == "A"').func1(...).func2(...).func3(...)

Here is a link to the documentation, which has lots of examples of how to use the query() method.
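
As a rough illustration of the chaining idea, here is a minimal sketch in which the func1/func2/func3 placeholders are swapped for real pandas methods. The extra `val` column, the `val_doubled` name, and the choice of `assign`, `sort_values` and `reset_index` are assumptions made only for this example; they are not part of the original question.

```python
import pandas as pd

# Hypothetical sample data: the question's columns plus a numeric 'val' column
mydf = pd.DataFrame({'col1': ['A', 'B', 'A', 'C'],
                     'col2': ['x', 'y', 'z', 'w'],
                     'val': [3, 1, 2, 5]})

# Each step returns a new DataFrame, so the steps can be chained
result = (
    mydf.query('col1 == "A"')                           # keep only rows where col1 is 'A'
        .assign(val_doubled=lambda d: d['val'] * 2)     # add a derived column
        .sort_values('val')                             # order the remaining rows by 'val'
        .reset_index(drop=True)                         # renumber the rows from 0
)

print(result)
```

Binding the chain to a new name such as `result` keeps the unfiltered frame available; assigning back to `mydf`, as in the question, discards it.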

MaxU - stand with Ukraine
  • Perfect - thanks MaxU. I almost just asked you directly in the comments of my last question ;). This topic (reference vs. value assignment) is, in general, the one I've had the toughest time with in the world of Python. Appreciate the help! – Sevyns Aug 24 '16 at 19:24
  • @Sevyns, you are welcome! It's difficult to predict when pandas will make a copy of your data and when it will work on a "view". If you want to understand it in detail, you can take a look at the sources under `your_python_path/Lib/site-packages/pandas/`. It takes time, but it's fun... – MaxU - stand with Ukraine Aug 24 '16 at 19:29
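
One way to sidestep the copy-vs-view guessing, sketched here under the assumption that you intend to modify the filtered result, is to call `.copy()` explicitly after the query. The `subset` name and the upper-casing step are hypothetical and serve only to show that the original frame is left untouched.

```python
import pandas as pd

mydf = pd.DataFrame({'col1': ['A', 'B', 'C'],
                     'col2': ['x', 'y', 'z']})

# An explicit copy makes 'subset' an independent object,
# whatever pandas would have done internally
subset = mydf.query('col1 == "A"').copy()

# Modifying the copy does not touch the original frame
subset['col2'] = subset['col2'].str.upper()

print(subset)
print(mydf)
```

The extra copy costs some memory, so this is a trade-off rather than a rule: it buys predictable behaviour when later assignments on the filtered data are planned.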