I have a DataFrame with hundreds of columns and millions of rows. I need to conditionally replace the values in selected columns with another value. What is the most efficient way to do this, given that I know the indices or names of the columns that need to change?
Example below:
import pandas as pd
df = pd.DataFrame({'ID1': [0,1,2,3,4,5,6], 'ID2': [0,1,2,0,4,0,5], 'Value1': [0,1,6,0,4,7,0], 'Value2': [1,0,2,3,0,4,5]})
   ID1  ID2  Value1  Value2
0    0    0       0       1
1    1    1       1       0
2    2    2       6       2
3    3    0       0       3
4    4    4       4       0
5    5    0       7       4
6    6    5       0       5
I want every value in Value1, Value2, ..., ValueN that is larger than 0 to be replaced by 1. Note that ID1, ID2, ..., IDN should be excluded.
Desired Output:
   ID1  ID2  Value1  Value2
0    0    0       0       1
1    1    1       1       0
2    2    2       1       1
3    3    0       0       1
4    4    4       1       0
5    5    0       1       1
6    6    5       0       1
The DataFrame has hundreds of columns and millions of rows, so I'd like to do this as efficiently as possible.
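To make the intended transformation concrete, here is a minimal sketch of one vectorized way to write it. It assumes the columns to change can be picked out by a "Value" name prefix (that selection rule and the name `value_cols` are only for illustration; an explicit list of names would work the same way). I have not benchmarked this at the scale described, so it is not necessarily the fastest option.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'ID1': [0, 1, 2, 3, 4, 5, 6],
    'ID2': [0, 1, 2, 0, 4, 0, 5],
    'Value1': [0, 1, 6, 0, 4, 7, 0],
    'Value2': [1, 0, 2, 3, 0, 4, 5],
})

# Assumption: the columns to modify are those whose names start with "Value".
# In practice this could be any known list of column names.
value_cols = [c for c in df.columns if c.startswith('Value')]

# Pull the selected columns out as a NumPy array, replace entries > 0 with 1,
# and leave all other entries (zeros, and negatives if any) unchanged.
vals = df[value_cols].to_numpy()
df[value_cols] = np.where(vals > 0, 1, vals)

print(df)
```

The ID columns are never touched because only `value_cols` is read and assigned. `df[value_cols].mask(df[value_cols] > 0, 1)` expresses the same replacement using pandas alone.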