EDIT: A suggested possible duplicate (this question) is not a duplicate. I'm asking whether a slice of a dataframe can be edited and have those edits affect the original dataframe. The suggested "duplicate" Q/A is just looking for an alternative to .loc. The simple answer to my original question appears to be "no".

Original Question:

This question likely has a duplicate somewhere, but I couldn't find it. Also, I'm guessing what I'm about to ask isn't possible, but worth a shot.

I'm looking to be able to filter or mask a large dataframe, get a smaller dataframe for ease of coding, edit the smaller dataframe, and have it affect the larger dataframe.

So something like this:

import pandas as pd

df_full = pd.DataFrame({'a':[1,2,3], 'b':[4,5,6]})

df_part = df_full[df_full['a'] == 2]

df_part['b'] = 'Kentucky Fried Chicken'

print(df_full)

Would result in:

   a  b
0  1  4
1  2  Kentucky Fried Chicken
2  3  6

I'm well aware of the ability to use the .loc[row_indexer, col_indexer] functionality, but even with a mask variable as the row_indexer, it can be a little unwieldy for more complex purposes.

A little context - I'm loading large database tables into a dataframe and want to make many edits on a small slice of it. So the .loc[] gets tedious. Maybe I could filter out that small slice, edit it, then re-append to the original?

Any thoughts?

jpp
elPastor
  • Possible duplicate of [Pandas loc alternatives with conditions](https://stackoverflow.com/questions/43196084/pandas-loc-alternatives-with-conditions) – Georgy Mar 31 '18 at 19:19
  • @Georgy - thanks for pointing that one out. Didn't see it. That said, per my edit, I don't think it's the same thing as I'm asking. – elPastor Mar 31 '18 at 19:24
  • Also found this one, but no good answer there: https://stackoverflow.com/questions/39972778/how-to-create-a-view-of-dataframe-in-pandas – Georgy Mar 31 '18 at 19:40
  • @Georgy - That second one _is_ a duplicate... or I should say my question is a duplicate of that one. I would agree the OP for that question is asking the same thing as I am. Looks like the answer is no different! Thanks again. – elPastor Mar 31 '18 at 19:42

1 Answer

Short answer

No. You don't want to play the game where you have to keep checking / guessing whether you are using a copy or a view of a dataframe.
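
As a minimal sketch of why (the value 99 is just an illustration, and the explicit .copy() is my addition to make the intent unambiguous): filtering with a Boolean mask hands back new data, so edits to the filtered frame do not reach df_full.

```python
import pandas as pd

df_full = pd.DataFrame({'a': [1, 2, 3], 'b': [4, 5, 6]})

# Boolean-mask indexing returns new data; .copy() makes that explicit
df_part = df_full[df_full['a'] == 2].copy()

# This edits only the copy
df_part['b'] = 99

print(df_part)
print(df_full)  # column 'b' of the original is unchanged
```

Without the .copy(), older pandas versions may emit a SettingWithCopyWarning on the assignment, which is exactly the copy-or-view guessing game to avoid.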

Single update: the right way

The .loc accessor is the way to go. There is nothing unwieldy about it, though it takes some getting used to.

However complex your criteria, if they boil down to Boolean arrays, the .loc accessor is still often the right choice. You would need to show an example where it is genuinely difficult to implement.

df_full = pd.DataFrame({'a':[1,2,3], 'b':[4,5,6]})

df_full.loc[df_full['a'] == 2, 'b'] = 'Kentucky Fried Chicken'

#    a                       b
# 0  1                       4
# 1  2  Kentucky Fried Chicken
# 2  3                       6
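
If the concern is making many edits to the same slice, here is a rough sketch of two patterns that may help. The extra column 'c' and the specific values are made up for illustration. The first saves the mask once and reuses it; the second edits an explicit copy and then writes it back by index label, which is close to the "filter, edit, re-append" idea in the question.

```python
import pandas as pd

# Hypothetical data; column 'c' is added only for illustration
df_full = pd.DataFrame({'a': [1, 2, 3], 'b': [4, 5, 6], 'c': [7, 8, 9]})

# Build the row filter once and keep it in a variable
mask = df_full['a'] >= 2

# Option 1: reuse the saved mask for several .loc assignments
df_full.loc[mask, 'b'] = 0
df_full.loc[mask, 'c'] = df_full.loc[mask, 'c'] * 10

# Option 2: edit an explicit copy, then write it back by index label
df_part = df_full.loc[mask].copy()
df_part['b'] = df_part['a'] + 100
df_full.loc[df_part.index, df_part.columns] = df_part

print(df_full)
```

Option 2 relies on the copy keeping the original index labels, so it assumes those labels are unique and that you do not reset or reorder the index of df_part in between.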

Single update: an alternative way

If you find the .loc accessor difficult to implement, one alternative is numpy.where:

df_full['b'] = np.where(df_full['a'] == 2, 'Kentucky Fried Chicken', df_full['b'])

Multiple updates: for many conditions

pandas.cut, numpy.select or numpy.vectorize can streamline your code when there are many conditions. How useful each one is depends on the specific logic you are applying. The question below includes examples of each:

Numpy “where” with multiple conditions
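
For a rough idea of what numpy.select looks like, here is a small sketch. The 'label' column, the conditions and the strings are invented for illustration and are not taken from the linked question.

```python
import numpy as np
import pandas as pd

df_full = pd.DataFrame({'a': [1, 2, 3, 4], 'b': [4, 5, 6, 7]})

# Conditions are checked in order; the first one that is True wins
conditions = [df_full['a'] == 2, df_full['a'] > 2]
choices = ['two', 'more than two']

# Rows matching no condition get the default value
df_full['label'] = np.select(conditions, choices, default='other')

print(df_full)
```

Writing the result to a new column keeps the example free of mixed-dtype surprises; the choices and the default are all strings here.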

jpp
  • Thanks for your post, but I was wondering if / hoping there was a way outside of the `.loc` accessor. – elPastor Mar 31 '18 at 19:10
  • @pshep123, sure there is. I added an example. But, really, you should show an example where you believe it is "unwieldy". – jpp Mar 31 '18 at 19:11