I was wondering how pandas handles memory usage in python? I was wondering more specifically how the memory is handled if I set a pandas dataframe query results to a variable. Behind the hood, would it just be some memory addresses to the original dataframe object or would I be cloning all of the data?
I'm afraid of memory ballooning out of control but I have a dataframe that has non-unique fields I can't index it by. It's incredibly slow to query and plot data from it using commands like df[(df[''] == x) & (df[''] == y)].
(They're both integer values in the rows. They're also not unique, hence the fact it returns multiple results.)
I'm very new to pandas anyway, but any insights as to how to handle a situation where I'm looking for the arrays of values where two conditions match would be great too. Right now I'm using an O(n) algorithm to loop through and index it because even that runs faster than the search queries when I need to access the data quickly. Watching my system take twenty seconds on a dataset of only 6,000 rows is foreboding.