Pandas: why does an index prohibit some operations on a DataFrame (e.g. loc raises IndexingError)?

Question

There are similar questions (like "pandas: iterating over DataFrame index with loc"), but I could not find one that matches mine. I do not understand why so much pandas functionality stops working once an index is added to a DataFrame. For example, I have a DataFrame df1 and can add a new column like this:

df4 = df1
df4.loc[df1.crash_type_name == 'pedestrian','Pedestrian_type'] = 1

But the same fails for df2, which is the same DataFrame with an index added via df2 = df1.set_index('date_time').

The error

IndexingError: (0 False 1 False ....

I know the workaround is to call reset_index() and then set the index again afterwards, but what is the logic behind an index prohibiting certain operations on a DataFrame?

`pandas` aligns on index when you slice it with a boolean Series. Once you change the `Index` the alignment no longer works. In this case, you can still slice with the array, `(df1.crash_type_name == 'pedestrian').to_numpy()` but now there is no guarantee of alignment, so it's prone to errors. — ALollz, Sep 09 '19 at 14:16
@ALollz, but the original `df1` also shows an index, the default 0-1-2-etc. (the `reset_index()` docs say "This resets the index to the default integer index"). Why does it work with the default index, but not with an index built from another column? — Alex Martian, Sep 09 '19 at 14:20
Because your original `DataFrame` likely also has a `RangeIndex`, so when you `reset_index` they will align. But again, this alignment is now only accidental and if there are intermediate steps that shuffle rows, this can lead to errors. — ALollz, Sep 09 '19 at 14:23
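
To make the alignment point in the comments concrete, here is a minimal sketch on a made-up three-row frame. The data values are assumptions; only the column names come from the question. It uses plain row selection for the failing case and relies on the `.loc` assignment in the question going through the same alignment step, as the comments describe.

```python
import pandas as pd

# Hypothetical data; only the column names are taken from the question.
df1 = pd.DataFrame({
    "date_time": pd.to_datetime(["2019-01-01", "2019-01-02", "2019-01-03"]),
    "crash_type_name": ["pedestrian", "vehicle", "pedestrian"],
})
df2 = df1.set_index("date_time")

# This mask is labelled with df1's default index (0, 1, 2).
mask_from_df1 = df1.crash_type_name == "pedestrian"

# df2 is labelled by date_time, so the mask's labels cannot be matched up.
try:
    df2.loc[mask_from_df1]
    error_name = None
except Exception as exc:
    error_name = type(exc).__name__
print(error_name)

# Option 1: build the mask from df2 itself, so both share the same index.
mask_from_df2 = df2.crash_type_name == "pedestrian"
df2.loc[mask_from_df2, "Pedestrian_type"] = 1

# Option 2: drop the labels and select by position. This only stays correct
# while the row order of df1 and df3 is identical.
df3 = df1.set_index("date_time")
df3.loc[mask_from_df1.to_numpy(), "Pedestrian_type"] = 1
```

Option 1 keeps the label-based safety check, so it is usually the better choice; option 2 silently assigns to the wrong rows if either frame has been sorted or filtered in between.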
