I have a strange problem with np.where. I first load a dataset called df and create a duplicate of it, df_1. I then use np.where to set each value in df_1 to 1 if the number in the cell is greater than or equal to its column mean (found in the DataFrame df_mean), and to 0 otherwise. I use a for loop to iterate over the column headers of df_1 and look up each column's mean in df_mean. Here's my code:
#Load the data
df = pd.read_csv('F:\\file.csv')
df.head(2)
>>> A AA AAP AAPL ABC
2011-01-10 09:30:00 -0.000546 0.006528 -0.001051 0.034593 -0.000095 ...
2011-01-10 09:30:10 -0.000256 0.007705 -0.001134 0.008578 -0.000549 ...
# Show list file with columns average
>>> df_mean.head(4)
A 0.000656
AA 0.002068
AAP 0.001134
AAPL 0.001728
...
df_1 = df
# list holds the column headers of df_1
for x in list:
    df_1[x] = np.where(df_1[x] >= df_mean[x], 1, 0)
>>> df_1.head(4) #Which is my desired output (but which also makes df = df_1...WHY?)
A AA AAP AAPL ABC
2011-01-10 09:30:00 0 1 0 1 0 ...
2011-01-10 09:30:10 0 1 0 1 0 ...
2011-01-10 09:30:20 0 0 0 1 0 ...
2011-01-10 09:30:30 0 0 0 1 1 ...
Now I get what I want, which is a binary 1/0 matrix for df_1, but it turns out that df also becomes a binary matrix (identical to df_1). Why? The loop never references df...
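A likely explanation, sketched on made-up data: in Python, df_1 = df does not duplicate the DataFrame, it only binds a second name to the same object, so writing to df_1 also changes df. The sketch below assumes that creating the duplicate with df.copy() is acceptable here. The sample values are hypothetical stand-ins for file.csv, and df_mean is assumed to be the per-column mean of df.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in data; the real df is loaded from file.csv.
df = pd.DataFrame({"A": [-0.5, 0.2, 1.0], "AA": [0.3, -0.1, 0.4]})
df_mean = df.mean()  # assumed: per-column means, indexed by column name

# Plain assignment only creates a second name for the same object.
alias = df
print(alias is df)  # True

# .copy() returns an independent DataFrame (a deep copy by default).
df_1 = df.copy()
for col in df_1.columns:
    df_1[col] = np.where(df_1[col] >= df_mean[col], 1, 0)

print(df_1)  # binary 1/0 matrix
print(df)    # original values are still intact
```

With the copy in place, only df_1 is overwritten by the loop and df keeps its original floats.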