I am trying to pass a data frame through a series of commands (to prepare arguments for a function). However, when I assign a data frame to a new variable, the assignment seems to act as an alias rather than a copy: after the assignment, every change made through the new name shows up in the original as well. What is a good way to preserve the original data frame in its original state, so that it can be reused as the starting point for other changes?
Please see below for an example.
from functools import reduce
import pandas as pd

# Merge several dataframes on the shared 'ID' column
df1 = pd.DataFrame({'ID': ['Mary', 'Mike', 'Barry', 'Scotty'], 'eTIV': [1.12, 2.22, 3.43, 5.43]})
df2 = pd.DataFrame({'ID': ['Mary', 'Mike', 'Barry', 'Scotty'], 'Ear_Vol': [5, 6, 7, 8]})
df3 = pd.DataFrame({'ID': ['Mary', 'Mike', 'Barry', 'Scotty'], 'Nose': [1, 2, 3, 5]})
df4 = pd.DataFrame({'ID': ['Mary', 'Mike', 'Barry', 'Scotty'], 'Eye_Vol': [1, 2, 3, 5]})
df5 = pd.DataFrame({'ID': ['Mary', 'Mike', 'Barry', 'Scotty'], 'Finger': [1.3, 2.123, 3.4, 5.5]})
dfs = [df1, df2, df3, df4, df5]
df_final = reduce(lambda left, right: pd.merge(left, right, on='ID'), dfs)
df_final
       ID  eTIV  Ear_Vol  Nose  Eye_Vol  Finger
0    Mary  1.12        5     1        1   1.300
1    Mike  2.22        6     2        2   2.123
2   Barry  3.43        7     3        3   3.400
3  Scotty  5.43        8     5        5   5.500
Assignment of the data frame to a different data frame and manipulations:
df = df_final
df_raw = df
df_raw.columns = df_raw.columns.str.replace(r"_Vol", "_Vol_Raw")
df_raw = pd.DataFrame(data = df_raw, columns= df_raw.columns)
New data frame (as expected):
df_raw
       ID  eTIV  Ear_Vol_Raw  Nose  Eye_Vol_Raw  Finger
0    Mary  1.12            5     1            1   1.300
1    Mike  2.22            6     2            2   2.123
2   Barry  3.43            7     3            3   3.400
3  Scotty  5.43            8     5            5   5.500
The original data frame is, for some reason, altered as well (why does the assignment alter the original here?):
df
       ID  eTIV  Ear_Vol_Raw  Nose  Eye_Vol_Raw  Finger
0    Mary  1.12            5     1            1   1.300
1    Mike  2.22            6     2            2   2.123
2   Barry  3.43            7     3            3   3.400
3  Scotty  5.43            8     5            5   5.500
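For reference, here is a minimal sketch of what appears to be going on, reduced to a two-row frame. It assumes the cause is that plain assignment in Python only binds a second name to the same DataFrame object, whereas DataFrame.copy() returns a separate object. The names alias and independent are hypothetical and used only for illustration; they are not part of my code above.

```python
import pandas as pd

# Small stand-in for df_final (illustrative data only)
df = pd.DataFrame({'ID': ['Mary', 'Mike'], 'Ear_Vol': [5, 6]})

alias = df               # plain assignment: a second name for the same object
independent = df.copy()  # copy(): a new DataFrame, separate from df

# Rename columns through the alias, as in the example above
alias.columns = alias.columns.str.replace("_Vol", "_Vol_Raw", regex=False)

print(alias is df)                # True: both names refer to one object
print(list(df.columns))           # ['ID', 'Ear_Vol_Raw']: df changed too
print(list(independent.columns))  # ['ID', 'Ear_Vol']: the copy is unchanged
```

If this reading is right, replacing df_raw = df with df_raw = df.copy() in my example should leave df untouched, at the cost of duplicating the data in memory.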