I have a pandas DataFrame, projthemes_df
which contains three columns.
I want to subset it into a two-column DataFrame. I've been using code like the following, since it matches the examples I see most often:
theme_by_code_df = projthemes_df[['code', 'name']]
This works, but the resulting DataFrame contains duplicate rows.
When I tried
theme_by_code_df.drop_duplicates(inplace=True)
I got an error:
Apparently, the error is related to "Returning a view versus a copy" (although the link in the error message is incorrect).
The question:
I've been using
df2 = df1[['a', 'b', 'c']]
thinking I was getting a new DataFrame in df2. Oops!
So, what's the best practice to ensure that I'm working with a DF I can safely modify?
I thought it would work to initialize an empty DataFrame before doing the selection, but I got the same error with this code:
tmp = pd.DataFrame()
tmp = projthemes_df[['code', 'name']]
tmp.drop_duplicates(inplace=True)
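As far as I can tell, the empty DataFrame makes no difference because the second assignment simply rebinds the name tmp to the result of the selection, and the empty object is never used. A minimal sketch of that, where source_df, its "other" column, and the placeholder name are made up for illustration:

```python
import pandas as pd

# Hypothetical stand-in for projthemes_df; the values and the "other" column are invented.
source_df = pd.DataFrame({
    "code": ["8", "11", "8"],
    "name": ["Human development", "Environment", "Human development"],
    "other": [1, 2, 3],
})

# Keep a second reference to the empty DataFrame so we can inspect it afterwards.
placeholder = pd.DataFrame()
tmp = placeholder

# This does not fill the empty DataFrame; it points the name tmp at a different object.
tmp = source_df[["code", "name"]]

print(tmp is placeholder)   # the two names now refer to different objects
print(placeholder.empty)    # the original empty DataFrame is unchanged
```

So pre-initializing has no effect on whether the selection result is treated as a copy.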
Is this reasonable? Is there something simpler or better?
tmp = pd.DataFrame(projthemes_df[['code', 'name']])
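For comparison, here is a sketch of two other patterns I'm considering: taking an explicit copy with .copy() before modifying in place, or skipping inplace=True and assigning the result of drop_duplicates(). The sample data, including the "id" column, is invented for illustration, since only the 'code' and 'name' column names appear above.

```python
import pandas as pd

# Hypothetical stand-in for projthemes_df; "id" is an assumed third column.
projthemes_df = pd.DataFrame({
    "id": [1, 2, 3, 4],
    "code": ["8", "11", "8", "1"],
    "name": [
        "Human development",
        "Environment",
        "Human development",
        "Economic management",
    ],
})

# Option 1: make the copy explicit, then modify it in place.
theme_by_code_df = projthemes_df[["code", "name"]].copy()
theme_by_code_df.drop_duplicates(inplace=True)

# Option 2: avoid in-place modification; drop_duplicates() returns a new DataFrame.
deduped_df = projthemes_df[["code", "name"]].drop_duplicates()

print(theme_by_code_df)
print(deduped_df)
```

Either way the original projthemes_df keeps all of its rows, and the explicit .copy() at least makes the intent unambiguous to a reader.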