
This is more of a newbie Python question. I have a pandas DataFrame, tmp_df, which I slice with three pairs of datetime bounds, as follows, to extract different time ranges of data:

tmp_daily_df = tmp_df.loc[idx[daily[1]:daily[2]], :]
tmp_weekly_df = tmp_df.loc[idx[weekly[1]:weekly[2]], :]
tmp_monthly_df = tmp_df.loc[idx[monthly[1]:monthly[2]], :]
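A minimal, self-contained sketch of this setup, for anyone who wants to reproduce it. It assumes that idx is pandas' IndexSlice helper, that tmp_df has a DatetimeIndex, and that daily, weekly and monthly are sequences whose elements [1] and [2] are the start and end of each range. The data and the tuple contents are hypothetical.

```python
import pandas as pd

# Hypothetical stand-in for tmp_df: one row per day on a DatetimeIndex.
tmp_df = pd.DataFrame(
    {"price": range(90)},
    index=pd.date_range("2017-01-01", periods=90, freq="D"),  # ends 2017-03-31
)

# Assumption: idx is pandas' IndexSlice helper.
idx = pd.IndexSlice

# Hypothetical (name, start, end) tuples; only [1] and [2] are used, as in the question.
daily = ("daily", "2017-03-31", "2017-03-31")
weekly = ("weekly", "2017-03-25", "2017-03-31")
monthly = ("monthly", "2017-03-01", "2017-03-31")

# Label slices on a DatetimeIndex include both endpoints.
tmp_daily_df = tmp_df.loc[idx[daily[1]:daily[2]], :]
tmp_weekly_df = tmp_df.loc[idx[weekly[1]:weekly[2]], :]
tmp_monthly_df = tmp_df.loc[idx[monthly[1]:monthly[2]], :]

print(len(tmp_daily_df), len(tmp_weekly_df), len(tmp_monthly_df))  # 1 7 31
```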

Then I pass the three resulting DataFrames to a function called compute_stats(), which calculates various statistics and manipulates the input DataFrame (e.g. tmp_daily_df). One such manipulation is adding several new columns to tmp_daily_df and likewise to the other two.

final_daily_df = compute_stats(tmp_daily_df, 'M','').reset_index(drop=True)
final_weekly_df = compute_stats(tmp_weekly_df, 'M','').reset_index(drop=True)
final_monthly_df = compute_stats(tmp_monthly_df, 'M','').reset_index(drop=True)

My question: since Python variable assignment binds a name to an existing object rather than copying it, will the second and third calls to compute_stats be corrupted by the manipulations applied to tmp_daily_df? tmp_daily_df is a time slice of tmp_df, and tmp_weekly_df and tmp_monthly_df are sliced from that same tmp_df.

styvane
codingknob

1 Answer


Slicing a list creates a (shallow) copy; in other words,

new_list = l[:]

is equivalent to:

new_list = list(l)
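A short sketch of what "copy" means here: the outer list is new, but the elements themselves are shared, so the copy is shallow. The example list is made up for illustration.

```python
l = [[1, 2], 3]

new_list = l[:]            # same result as list(l)
assert new_list == list(l)
assert new_list is not l   # a new outer list object...

new_list.append(4)         # ...so changes to the outer list stay local
print(l)                   # [[1, 2], 3]

new_list[0].append(99)     # but the inner list is shared (shallow copy)
print(l)                   # [[1, 2, 99], 3]
```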

DataFrames work a bit differently, though. Take a look at this post: dataframes copies vs views

DataFrame.loc will generally return a view when used with scalar indexing or slicing.
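One rough way to probe this on a small frame is NumPy's np.shares_memory, comparing a column of the original with the same column of the result. This is a sketch on made-up data: whether the label slice shares memory depends on the pandas version and the frame's internal layout, so that result is only printed, while boolean-vector indexing always builds new arrays.

```python
import numpy as np
import pandas as pd

# Small hypothetical stand-in for tmp_df: one float column on a daily DatetimeIndex.
tmp_df = pd.DataFrame(
    {"price": [1.0, 2.0, 3.0, 4.0]},
    index=pd.date_range("2017-03-28", periods=4, freq="D"),
)
base = tmp_df["price"].to_numpy()

# Label slice: may share memory with tmp_df, depending on version and layout.
sliced = tmp_df.loc["2017-03-29":"2017-03-31", :]
print("slice shares memory:", np.shares_memory(base, sliced["price"].to_numpy()))

# Boolean vector: pandas copies the selected rows into new arrays.
mask = tmp_df.index >= pd.Timestamp("2017-03-29")
masked = tmp_df.loc[mask, :]
print("mask shares memory:", np.shares_memory(base, masked["price"].to_numpy()))  # False
```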

According to that post:

Whenever an array of labels or a boolean vector are involved in the indexing operation, the result will be a copy. With single label / scalar indexing and slicing, e.g. df.ix[3:6] or df.ix[:, 'A'], a view will be returned.

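So you will get a view unless you use an array of labels or a boolean vector, which means your three time slices are most likely views of tmp_df. Adding new columns to a slice only changes that slice object, but writing into its existing values can change tmp_df as well (unless pandas' Copy-on-Write mode is enabled), and with it any other slice covering the same rows. Calling the copy method on each slice, e.g. tmp_df.loc[idx[daily[1]:daily[2]], :].copy(), gives you independent DataFrames and avoids the problem. (The .ix indexer in the quote has since been removed from pandas; use .loc or .iloc instead.)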
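A sketch of the defensive pattern, under stated assumptions: compute_stats below is a hypothetical, simplified stand-in (the question's extra 'M', '' arguments are dropped) that copies its input before adding or overwriting columns. Copying at slicing time is shown too; either one on its own is enough to keep tmp_df and the other slices untouched.

```python
import pandas as pd

# Hypothetical data: ten days ending 2017-03-31, price = 0.0 .. 9.0.
tmp_df = pd.DataFrame(
    {"price": [float(i) for i in range(10)]},
    index=pd.date_range("2017-03-22", periods=10, freq="D"),
)

def compute_stats(df):
    """Hypothetical stand-in for the question's compute_stats().

    It works on an explicit copy, so neither the caller's slice nor
    tmp_df (which that slice may be a view of) can be modified.
    """
    df = df.copy()
    df["change"] = df["price"].diff()   # add a new column
    df["price"] = df["price"] * 2       # overwrite an existing column
    return df

# Copying at slicing time is the other option.
tmp_daily_df = tmp_df.loc["2017-03-31":"2017-03-31", :].copy()
tmp_weekly_df = tmp_df.loc["2017-03-25":"2017-03-31", :].copy()

final_daily_df = compute_stats(tmp_daily_df).reset_index(drop=True)
final_weekly_df = compute_stats(tmp_weekly_df).reset_index(drop=True)

print(list(tmp_df.columns))               # ['price']
print(tmp_weekly_df["price"].tolist())    # [3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0]
print(final_weekly_df["price"].tolist())  # [6.0, 8.0, 10.0, 12.0, 14.0, 16.0, 18.0]
```

Because each call only ever touches its own copy, the order of the daily, weekly and monthly calls no longer matters.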

Community
Jacques Gaudin