0

I want to count the number of round trip (day trader) transactions in a set of stock trading transactions held in a pandas dataframe. My data originally has 3 columns: ['ticker', 'tradeDate', 'quantity']. I want to aggregate quantity based on the total positive quantity values.

My input data:

df = pd.DataFrame({
    'tradeDate':["12/28/2020", "12/28/2020", "12/28/2020", "12/28/2020", "12/28/2020", "12/28/2020", "12/28/2020", "12/28/2020", "12/27/2020", "12/27/2020", "12/27/2020", "12/27/2020", "12/27/2020", "12/27/2020", "12/27/2020"],
    'ticker':["A2", "A1", "A1", "A2", "A1", "A1", "A1", "A2", "A2", "A2", "A1", "A2", "A1", "A2", "A1"],
    'quantity':[-3, -2, 1, 3, 1, -1, 1, -2, 1, -5, -4, 5, 2, 1, 2]
    },columns=['tradeDate','ticker', 'quantity']) 
# tradeDate ticker quantity
0 12/28/2020 A2 -3
1 12/28/2020 A1 -2
2 12/28/2020 A1 1
3 12/28/2020 A2 3
4 12/28/2020 A1 1
5 12/28/2020 A1 -1
6 12/28/2020 A1 1
7 12/28/2020 A2 -2
8 12/27/2020 A2 1
9 12/27/2020 A2 -5
10 12/27/2020 A1 -4
11 12/27/2020 A2 5
12 12/27/2020 A1 2
13 12/27/2020 A2 1
14 12/27/2020 A1 2

Note that index 7 is sold from long term holding and excluded from output. Note that index 8 is purchased for long term holding and excluded from output. Note that index 13 is purchased for long term holding and excluded from output.

My expected output: roundTripCount = 5

My expected output df:

# tradeDate ticker quantity
0 12/28/2020 A2 -3
1 12/28/2020 A2 3
2 12/28/2020 A1 -2
3 12/28/2020 A1 2
4 12/28/2020 A1 -1
5 12/28/2020 A1 1
6 12/27/2020 A2 -5
7 12/27/2020 A2 5
8 12/27/2020 A1 -4
9 12/27/2020 A1 4

I have read through this question and the linked dupes, and it doesn't quite answer how to go about this. Counting the amount of times a boolean goes from True to False in a column

I have tried making a column "q_bool" that is True if the quantity is > 0 and then used a slice by the ticker to check for changes in the q_bool, but it throws errors when including dates.

My current working solution is:

    df['q_bool'] = (df['quantity'] > 0)
    ticker_set = set(df['ticker'])
    running_round_trips_counted = 0
    for ticker in ticker_set:
        ticker_mask = (df['ticker'] == ticker)
        temp_df_slice = df[ticker_mask]
        date_set = set(temp_df_slice['tradeDate'])
        for d in date_set:
            d_mask = (temp_df_slice['tradeDate'] == d)
            d_temp_df_slice = temp_df_slice[d_mask]
            d_temp_df_slice = d_temp_df_slice.groupby([d_temp_df_slice['q_bool'].ne(df['q_bool'].shift()).cumsum(), 'q_bool']).size()
            half_of_transactions = math.trunc(len(d_temp_df_slice)/2)
            running_round_trips_counted = running_round_trips_counted + half_of_transactions

    print(running_round_trips_counted)

the output is 6 round trips.

The above code feels hacky and hard to follow. I did the following:

  1. make a slice by 'tradeDate' and slice it again by 'ticker' then
  2. count the changes in q_bool and make it into a new temp_multiindex_df
  3. subtract 1 from temp_multiindex_df if df_slice row count is odd
  4. add changes_in_qbool to a cumulative_round_trip_count variable, and
  5. use for loops to make slices and run through all the dates and tickers in my df.

Problems with my current working solution:

  1. double counts if I purchase to hold long-term at the beginning and ending of the day.
  2. this solution does not output a summary day trader df.

Is there a more pythonic way to do this?

5zorro
  • 31
  • 3
  • can you specify. what is it actually want to do refering to the df – sadbro Dec 29 '20 at 06:32
  • @sadbro this is a part of a trading engine. The engine must not allow any new "dayTrader" type trades due to United States Regulations if the account value is less than 25,000 and the day_trader_transaction_count is equal to 4. A day trader transaction is defined as a purchase and a sale of a security within a day. The above question attempts to calculate today's round_trip_transactions aka dayTrader transactions without undercounting and without overcounting. The current solution errs on the side of overcounting. The df output is for a dashboard that graphs leverage and algo return – 5zorro Jan 01 '21 at 23:50
  • can you please tell me your intention to do with quantity values(positive or negativer) – sadbro Jan 02 '21 at 12:13

0 Answers0