Pandas / Numpy: Issues with np.where

Question

I have a strange problem with np.where. I first load a DataFrame called df and create a duplicate of it, df_1. I then use np.where to set each value in df_1 to 1 if it is greater than or equal to its column mean (stored in df_mean) and to 0 otherwise. I use a for loop to iterate over the column headers of df_1, looking up each column's mean in df_mean. Here's my code:

import numpy as np
import pandas as pd

# Load the data
df = pd.read_csv('F:\\file.csv')

>>> df.head(2)
                            A        AA       AAP      AAPL       ABC    
2011-01-10 09:30:00 -0.000546  0.006528 -0.001051  0.034593 -0.000095 ...
2011-01-10 09:30:10 -0.000256  0.007705 -0.001134  0.008578 -0.000549 ...

# df_mean holds the mean of each column (e.g. df_mean = df.mean())

>>> df_mean.head(4)
A       0.000656
AA      0.002068
AAP     0.001134
AAPL    0.001728
...

df_1 = df
for x in df_1.columns:
    df_1[x] = np.where(df_1[x] >= df_mean[x], 1, 0)

>>> df_1.head(4)  # my desired output (but df also ends up equal to df_1... WHY?)
                     A  AA  AAP  AAPL  ABC    
2011-01-10 09:30:00  0   1    0     1    0 ...
2011-01-10 09:30:10  0   1    0     1    0 ...
2011-01-10 09:30:20  0   0    0     1    0 ...
2011-01-10 09:30:30  0   0    0     1    1 ...

Now I get what I want, a binary 1/0 matrix for df_1, but it turns out that df also becomes the same binary matrix. WHY? The loop never touches df...
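A minimal, cut-pastable reproduction of the behaviour. The small DataFrame below is made-up data standing in for the question's CSV, and `df_mean = df.mean()` is an assumption about how the means were computed:

```python
import numpy as np
import pandas as pd

# Hypothetical toy data standing in for the question's CSV
df = pd.DataFrame({"A": [1.0, 3.0], "AA": [2.0, 0.0]})
df_mean = df.mean()  # per-column means: A -> 2.0, AA -> 1.0

df_1 = df  # binds a second name to the SAME DataFrame; nothing is copied
for x in df_1.columns:
    df_1[x] = np.where(df_1[x] >= df_mean[x], 1, 0)

print(df_1 is df)  # True: both names refer to one object
print(df)          # so df shows the 0/1 values as well
```

Because `df_1 = df` only gives the existing object a second name, writing columns through `df_1` changes the one DataFrame that both names point to.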

pls post an example that is cut-pastable including all variables — Jeff, May 22 '14 at 19:27
@DSM so I can't "replicate" `df` into another `DataFrame`? @Jeff I have updated my codes....thanks for the suggestions! — Plug4, May 22 '14 at 19:45
You can, just not as easy as with `df_1 = df`. Check http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.copy.html#pandas.DataFrame.copy And maybe check up on basics about mutable/immutable types: http://stackoverflow.com/questions/8056130/immutable-vs-mutable-types-python — sebastian, May 22 '14 at 19:48
That's true for python in general btw. `a=b` just gives another name, `a`, to whatever object `b` refers to. — TomAugspurger, May 22 '14 at 19:50
@sebastian... I see with `pd.copy` it works... geez how come I didn't know this before. Sebastian, do you want to post an answer so I can approve it? Thanks all — Plug4, May 22 '14 at 20:09
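Following the suggestion in the comments, here is a sketch of the loop with an explicit copy via `DataFrame.copy()`, which makes a deep copy by default. The toy data is hypothetical:

```python
import numpy as np
import pandas as pd

# Hypothetical toy data; replace with the real DataFrame
df = pd.DataFrame({"A": [1.0, 3.0], "AA": [2.0, 0.0]})
df_mean = df.mean()

df_1 = df.copy()  # deep copy by default: a new object with its own data
for x in df_1.columns:
    df_1[x] = np.where(df_1[x] >= df_mean[x], 1, 0)

print(df_1 is df)  # False
print(df)          # original float values are untouched
```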

score 1 · Accepted Answer · answered May 23 '14 at 01:05

Although this is not what you asked for, my spidey sense tells me you want some kind of indicator of whether a stock is currently over- or underperforming with regard to "something", using the mean of that "something". Maybe try this:

import numpy as np
import pandas as pd

S = pd.DataFrame(
    np.array([[1.2, 3.4], [1.1, 3.5], [1.4, 3.3], [1.2, 1.6]]),
    columns=["Stock A", "Stock B"],
    index=pd.date_range("2014-01-01", "2014-01-04", freq="D")
)

indicator = S > S.mean()          # True where a value exceeds its column mean
binary = indicator.astype("int")  # True/False -> 1/0
print(S)
print(indicator)
print(binary)

This gives the output:

            Stock A  Stock B
2014-01-01      1.2      3.4
2014-01-02      1.1      3.5
2014-01-03      1.4      3.3
2014-01-04      1.2      1.6
[4 rows x 2 columns]

           Stock A Stock B
2014-01-01   False    True
2014-01-02   False    True
2014-01-03    True    True
2014-01-04   False   False
[4 rows x 2 columns]

            Stock A  Stock B
2014-01-01        0        1
2014-01-02        0        1
2014-01-03        1        1
2014-01-04        0        0
[4 rows x 2 columns]
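The answer uses a strict `>`, while the question's rule is `>=` (1 when a value is greater than or equal to its column mean). A sketch of that variant, plus the equivalent `np.where` form; since `np.where` returns a bare ndarray, the index and columns are re-attached by hand. For this data both forms give the same result as `>`, because no value equals its column mean.

```python
import numpy as np
import pandas as pd

S = pd.DataFrame(
    np.array([[1.2, 3.4], [1.1, 3.5], [1.4, 3.3], [1.2, 1.6]]),
    columns=["Stock A", "Stock B"],
    index=pd.date_range("2014-01-01", "2014-01-04", freq="D"),
)

# S.mean() is a Series indexed by column name, so each column
# is compared against its own mean.
binary_ge = (S >= S.mean()).astype(int)

# Same result via np.where, which returns a plain ndarray
binary_where = pd.DataFrame(
    np.where(S >= S.mean(), 1, 0), index=S.index, columns=S.columns
)

print(binary_ge)
print((binary_ge.to_numpy() == binary_where.to_numpy()).all())
```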

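While you are at it, you should probably look into rolling means: pd.rolling_mean(S, n_periods_for_mean) in older pandas, or S.rolling(n_periods_for_mean).mean() in pandas 0.18 and later, where pd.rolling_mean was deprecated.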
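A sketch of a rolling version of the same indicator using the `.rolling()` method, assuming a hypothetical window of 2 periods. The first `n - 1` rows of the rolling mean are NaN, and comparisons against NaN evaluate to False, so those rows come out as 0; drop or mask them if that matters for your use.

```python
import numpy as np
import pandas as pd

S = pd.DataFrame(
    np.array([[1.2, 3.4], [1.1, 3.5], [1.4, 3.3], [1.2, 1.6]]),
    columns=["Stock A", "Stock B"],
    index=pd.date_range("2014-01-01", "2014-01-04", freq="D"),
)

n_periods_for_mean = 2  # hypothetical window length

rolling = S.rolling(n_periods_for_mean).mean()  # NaN for the first n-1 rows
indicator = S > rolling                         # NaN comparisons give False
binary = indicator.astype(int)
print(binary)
```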
