How to sum a multiindexed pandas series based on changes in boolean column from true to false?

Question

I want to count the number of round trip (day trader) transactions in a set of stock trading transactions held in a pandas dataframe. My data originally has 3 columns: ['ticker', 'tradeDate', 'quantity']. I want to aggregate quantity based on the total positive quantity values.

My input data:

df = pd.DataFrame({
    'tradeDate':["12/28/2020", "12/28/2020", "12/28/2020", "12/28/2020", "12/28/2020", "12/28/2020", "12/28/2020", "12/28/2020", "12/27/2020", "12/27/2020", "12/27/2020", "12/27/2020", "12/27/2020", "12/27/2020", "12/27/2020"],
    'ticker':["A2", "A1", "A1", "A2", "A1", "A1", "A1", "A2", "A2", "A2", "A1", "A2", "A1", "A2", "A1"],
    'quantity':[-3, -2, 1, 3, 1, -1, 1, -2, 1, -5, -4, 5, 2, 1, 2]
    },columns=['tradeDate','ticker', 'quantity'])

#	tradeDate	ticker	quantity
0	12/28/2020	A2	-3
1	12/28/2020	A1	-2
2	12/28/2020	A1	1
3	12/28/2020	A2	3
4	12/28/2020	A1	1
5	12/28/2020	A1	-1
6	12/28/2020	A1	1
7	12/28/2020	A2	-2
8	12/27/2020	A2	1
9	12/27/2020	A2	-5
10	12/27/2020	A1	-4
11	12/27/2020	A2	5
12	12/27/2020	A1	2
13	12/27/2020	A2	1
14	12/27/2020	A1	2

Note that index 7 is sold from long term holding and excluded from output. Note that index 8 is purchased for long term holding and excluded from output. Note that index 13 is purchased for long term holding and excluded from output.

My expected output: roundTripCount = 5

My expected output df:

#	tradeDate	ticker	quantity
0	12/28/2020	A2	-3
1	12/28/2020	A2	3
2	12/28/2020	A1	-2
3	12/28/2020	A1	2
4	12/28/2020	A1	-1
5	12/28/2020	A1	1
6	12/27/2020	A2	-5
7	12/27/2020	A2	5
8	12/27/2020	A1	-4
9	12/27/2020	A1	4

I have read through this question and the linked dupes, and it doesn't quite answer how to go about this. Counting the amount of times a boolean goes from True to False in a column

I have tried making a column "q_bool" that is True if the quantity is > 0 and then used a slice by the ticker to check for changes in the q_bool, but it throws errors when including dates.

My current working solution is:

    df['q_bool'] = (df['quantity'] > 0)
    ticker_set = set(df['ticker'])
    running_round_trips_counted = 0
    for ticker in ticker_set:
        ticker_mask = (df['ticker'] == ticker)
        temp_df_slice = df[ticker_mask]
        date_set = set(temp_df_slice['tradeDate'])
        for d in date_set:
            d_mask = (temp_df_slice['tradeDate'] == d)
            d_temp_df_slice = temp_df_slice[d_mask]
            d_temp_df_slice = d_temp_df_slice.groupby([d_temp_df_slice['q_bool'].ne(df['q_bool'].shift()).cumsum(), 'q_bool']).size()
            half_of_transactions = math.trunc(len(d_temp_df_slice)/2)
            running_round_trips_counted = running_round_trips_counted + half_of_transactions

    print(running_round_trips_counted)

the output is 6 round trips.

The above code feels hacky and hard to follow. I did the following:

make a slice by 'tradeDate' and slice it again by 'ticker' then
count the changes in q_bool and make it into a new temp_multiindex_df
subtract 1 from temp_multiindex_df if df_slice row count is odd
add changes_in_qbool to a cumulative_round_trip_count variable, and
use for loops to make slices and run through all the dates and tickers in my df.

Problems with my current working solution:

double counts if I purchase to hold long-term at the beginning and ending of the day.
this solution does not output a summary day trader df.

Is there a more pythonic way to do this?

can you specify. what is it actually want to do refering to the df — sadbro, Dec 29 '20 at 06:32
@sadbro this is a part of a trading engine. The engine must not allow any new "dayTrader" type trades due to United States Regulations if the account value is less than 25,000 and the day_trader_transaction_count is equal to 4. A day trader transaction is defined as a purchase and a sale of a security within a day. The above question attempts to calculate today's round_trip_transactions aka dayTrader transactions without undercounting and without overcounting. The current solution errs on the side of overcounting. The df output is for a dashboard that graphs leverage and algo return — 5zorro, Jan 01 '21 at 23:50
can you please tell me your intention to do with quantity values(positive or negativer) — sadbro, Jan 02 '21 at 12:13

How to sum a multiindexed pandas series based on changes in boolean column from true to false?

0 Answers0