
I have a pandas DataFrame, projthemes_df, which contains three columns.


I want to subset it into a 2-column DataFrame. I've been using code like this because it matches the examples I see most often:

theme_by_code_df = projthemes_df[['code', 'name']]

This works.


There is duplication in the resulting DF.

When I tried

theme_by_code_df.drop_duplicates(inplace=True)

I got an error.

Apparently, the error is related to "Returning a view versus a copy" in the pandas documentation (although the link in the error message is incorrect).

The question:

I've been using

df2 = df1[['a', 'b', 'c']]

thinking I was getting a new DF in df2. Oops!

So, what's the best practice to ensure that I'm working with a DF I can safely modify?

I thought it would work to initialize an empty DataFrame before doing the selection, but I got the same error with this code:

tmp = pd.DataFrame()
tmp = projthemes_df[['code', 'name']]
tmp.drop_duplicates(inplace=True)

Is this reasonable? Is there something simpler or better?

tmp = pd.DataFrame(projthemes_df[['code', 'name']])
Vicki B

1 Answer


Use the .copy() method. It creates a copy of the data instead of giving you a slice of the original DataFrame.

   tmp = projthemes_df[['code', 'name']].copy()
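
As a rough end-to-end sketch of the above: the sample rows below are made up, since the real contents of projthemes_df are only shown in a screenshot, and the "id" column name is an assumption. It shows .copy() followed by an in-place drop_duplicates, plus an alternative that skips inplace and assigns the result instead.

```python
import pandas as pd

# Hypothetical stand-in for projthemes_df; the actual data is not in the post.
projthemes_df = pd.DataFrame({
    "id": [1, 2, 3, 4],  # assumed third column
    "code": ["8", "11", "8", "1"],
    "name": [
        "Human development",
        "Environment",
        "Human development",
        "Economic management",
    ],
})

# .copy() returns an independent DataFrame, so modifying it in place is safe.
tmp = projthemes_df[["code", "name"]].copy()
tmp.drop_duplicates(inplace=True)

# Alternative without inplace: chain the calls and bind the new result.
theme_by_code_df = projthemes_df[["code", "name"]].drop_duplicates()

# The original frame keeps all of its rows either way.
print(len(projthemes_df), len(tmp), len(theme_by_code_df))
```

Both variants should leave projthemes_df untouched. The chained form avoids inplace entirely, which some people find easier to reason about.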
sebvargo