
I am trying to understand how Pandas DataFrames work when copying information downward and then resetting once the next variable changes... Specifically, in the example below, how do I make Share_Amt_To_Buy reset to 0 once my Signal or Signal_Diff switches from 1 to 0?

Using .cumsum() on Share_Amt_To_Buy carries the values downward and keeps accumulating them, which is not what I would like to do.

My goal is that when Signal changes from 0 to 1, Share_Amt_To_Buy is calculated and copied down until Signal switches back to 0. Then, if Signal turns to 1 again, I want Share_Amt_To_Buy to be recalculated based on that point in time.

Hopefully this makes sense - please let me know.

Signal  Signal_Diff  Share_Amt_To_Buy (desired)  Share_Amt_To_Buy (currently)
0       0            0                           0
0       0            0                           0
0       0            0                           0
1       1            100                         100
1       0            100                         100
1       0            100                         100
0       -1           0                           100
0       0            0                           100
1       1            180                         280
1       0            180                         280

As you can see, my signals alternate between 0 and 1, which means the following: 0 = no trade (no position), 1 = trade (with a position).

Signal_Diff is calculated as follows:

portfolio['Signal_Diff'] = portfolio['Signal'].diff().fillna(0.0)
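
For reference, a minimal runnable sketch (with a made-up Signal column, since the real data isn't shown) of what that line produces:

import pandas as pd

# Hypothetical Signal values standing in for the real portfolio data.
portfolio = pd.DataFrame({'Signal': [0, 0, 1, 1, 1, 0, 0, 1, 1]})

# diff() is 1.0 where Signal flips 0 -> 1, -1.0 where it flips 1 -> 0,
# and 0.0 elsewhere; fillna(0.0) handles the NaN produced for the first row.
portfolio['Signal_Diff'] = portfolio['Signal'].diff().fillna(0.0)
print(portfolio)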

The column 'Share_Amt_To_Buy' is calculated when Signal changes from 0 to 1. I have used the following as an example to calculate it:

import numpy as np

initial_cap = 100000.0
portfolio['close'] = ...  # my stock's closing prices, as floats

portfolio['Share_Amt'] = np.where(portfolio['Signal'] == 1.0,
                                  np.round(initial_cap / portfolio['close'] * 0.25 * portfolio['Signal']),
                                  0.0).cumsum()

portfolio['Share_Amt_To_Buy'] = portfolio['Share_Amt'] * portfolio['Signal']
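
For context, here is a minimal, self-contained version of the above (the closing prices are hypothetical, since the real ones aren't shown); it illustrates how the .cumsum() keeps accumulating across trade blocks instead of resetting:

import numpy as np
import pandas as pd

initial_cap = 100000.0

# Hypothetical data; the closing prices are made up for illustration only.
portfolio = pd.DataFrame({
    'Signal': [0, 0, 0, 1, 1, 1, 0, 0, 1, 1],
    'close':  [250.0, 250.0, 250.0, 250.0, 250.0,
               250.0, 250.0, 250.0, 312.5, 312.5],
})

portfolio['Share_Amt'] = np.where(portfolio['Signal'] == 1.0,
                                  np.round(initial_cap / portfolio['close'] * 0.25 * portfolio['Signal']),
                                  0.0).cumsum()
portfolio['Share_Amt_To_Buy'] = portfolio['Share_Amt'] * portfolio['Signal']
print(portfolio)  # Share_Amt_To_Buy keeps growing instead of resetting to 0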
– antonio_zeus

1 Answer


From what I understand, there is no built-in formula system in pandas. You can apply operations to columns, cells, and arrays and generate new arrays or values from them (df[column].count() is one example), and do plenty of work that way, but there is no method for dynamically updating a column based on another value in that column, the way an Excel formula does.

You could always do the procedure iteratively and say:

>>> for index in df.index:
>>>     if df.loc[index, 'Signal_Diff'] == 0:
>>>         df.loc[index, 'Signal_Diff'] = some_value
>>>     elif df.loc[index, 'Signal_Diff'] == 1:
>>>         df.loc[index, 'Signal_Diff'] = some_other_value

Or you could create a custom function via the map tool: https://stackoverflow.com/a/19226745/4131059
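
For example, a minimal sketch of that map-based idea, using placeholder outputs rather than the real share amounts:

import pandas as pd

# Hypothetical Signal_Diff values standing in for the real column.
df = pd.DataFrame({'Signal_Diff': [0.0, 1.0, 0.0, -1.0, 0.0]})

# Map each Signal_Diff value to a placeholder label; anything not in the
# dict would become NaN.
df['Action'] = df['Signal_Diff'].map({1.0: 'recalculate',
                                      -1.0: 'reset to 0',
                                      0.0: 'carry forward'})
print(df)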

EDIT:

Another solution would be to query for all indexes with a value of 1 in the old array, query again in the new array after some change, and compare the two:

>>> df_old_list = df[df.Signal_Diff == 1].index.tolist()
>>> ...
>>> df_new_list = df[df.Signal_Diff == 1].index.tolist()
>>>
>>> for x in df_old_list:
>>>     if x in df_new_list:
>>>         df_new_list.remove(x)

Then recalculate for only the indexes in df_new_list.
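
A simplified, self-contained sketch of that last step, reusing the sizing formula from the question (initial_cap, the 0.25 factor, and the closing prices here are placeholders/assumptions) and skipping the old-list/new-list comparison shown above:

import numpy as np
import pandas as pd

initial_cap = 100000.0  # from the question

# Hypothetical frame standing in for the real portfolio data.
df = pd.DataFrame({
    'Signal': [0, 1, 1, 0, 1],
    'close':  [250.0, 250.0, 250.0, 250.0, 312.5],
})
df['Signal_Diff'] = df['Signal'].diff().fillna(0.0)
df['Share_Amt_To_Buy'] = 0.0

# Indexes where Signal_Diff flipped to 1, i.e. where a new position starts.
df_new_list = df[df.Signal_Diff == 1].index.tolist()

# Recalculate the share amount only at those indexes.
for index in df_new_list:
    df.loc[index, 'Share_Amt_To_Buy'] = np.round(
        initial_cap / df.loc[index, 'close'] * 0.25)
print(df)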

– Alex Huszagh
  • I like the first option with your for loop... To make sure I understand: you're using a for loop to examine my Signal series and fill in values calculated from my Share_Amt series, creating a third series, and this is iterated row by row? – antonio_zeus Oct 28 '14 at 00:27
  • Yeah, I'm using a for loop to see which values have changed, and then using it to recalculate the share_amt series. Since iterating through and recalculating every value for every index would take forever, quickly iterating through a list and then checking for only the indexes that have changed would be faster. – Alex Huszagh Oct 28 '14 at 01:52
  • For example, checking all the values of one column to see if they are equal to a certain value takes ~1.8 to 18.8 ms for a 10,000-element dataframe (so 2x that; the upper limit is if all are true, the lower value is if all are false), while then removing the values takes 2.35 ms to 664 ms (lower if all are false), and recalculating takes 100 μs, while redefining each value takes ~120 μs per index, so you're faster assuming less than ~2/3rds of the values change (~5 ms if no values change, 1.6 s if all change vs. 1 s for checking everything). – Alex Huszagh Oct 28 '14 at 01:56