I want to count the number of round trip (day trader) transactions in a set of stock trading transactions held in a pandas dataframe. My data originally has 3 columns: ['ticker', 'tradeDate', 'quantity']. I want to aggregate quantity based on the total positive quantity values.
My input data:
df = pd.DataFrame({
'tradeDate':["12/28/2020", "12/28/2020", "12/28/2020", "12/28/2020", "12/28/2020", "12/28/2020", "12/28/2020", "12/28/2020", "12/27/2020", "12/27/2020", "12/27/2020", "12/27/2020", "12/27/2020", "12/27/2020", "12/27/2020"],
'ticker':["A2", "A1", "A1", "A2", "A1", "A1", "A1", "A2", "A2", "A2", "A1", "A2", "A1", "A2", "A1"],
'quantity':[-3, -2, 1, 3, 1, -1, 1, -2, 1, -5, -4, 5, 2, 1, 2]
},columns=['tradeDate','ticker', 'quantity'])
# | tradeDate | ticker | quantity |
---|---|---|---|
0 | 12/28/2020 | A2 | -3 |
1 | 12/28/2020 | A1 | -2 |
2 | 12/28/2020 | A1 | 1 |
3 | 12/28/2020 | A2 | 3 |
4 | 12/28/2020 | A1 | 1 |
5 | 12/28/2020 | A1 | -1 |
6 | 12/28/2020 | A1 | 1 |
7 | 12/28/2020 | A2 | -2 |
8 | 12/27/2020 | A2 | 1 |
9 | 12/27/2020 | A2 | -5 |
10 | 12/27/2020 | A1 | -4 |
11 | 12/27/2020 | A2 | 5 |
12 | 12/27/2020 | A1 | 2 |
13 | 12/27/2020 | A2 | 1 |
14 | 12/27/2020 | A1 | 2 |
Note that index 7 is sold from long term holding and excluded from output. Note that index 8 is purchased for long term holding and excluded from output. Note that index 13 is purchased for long term holding and excluded from output.
My expected output: roundTripCount = 5
My expected output df:
# | tradeDate | ticker | quantity |
---|---|---|---|
0 | 12/28/2020 | A2 | -3 |
1 | 12/28/2020 | A2 | 3 |
2 | 12/28/2020 | A1 | -2 |
3 | 12/28/2020 | A1 | 2 |
4 | 12/28/2020 | A1 | -1 |
5 | 12/28/2020 | A1 | 1 |
6 | 12/27/2020 | A2 | -5 |
7 | 12/27/2020 | A2 | 5 |
8 | 12/27/2020 | A1 | -4 |
9 | 12/27/2020 | A1 | 4 |
I have read through this question and the linked dupes, and it doesn't quite answer how to go about this. Counting the amount of times a boolean goes from True to False in a column
I have tried making a column "q_bool" that is True if the quantity is > 0 and then used a slice by the ticker to check for changes in the q_bool, but it throws errors when including dates.
My current working solution is:
df['q_bool'] = (df['quantity'] > 0)
ticker_set = set(df['ticker'])
running_round_trips_counted = 0
for ticker in ticker_set:
ticker_mask = (df['ticker'] == ticker)
temp_df_slice = df[ticker_mask]
date_set = set(temp_df_slice['tradeDate'])
for d in date_set:
d_mask = (temp_df_slice['tradeDate'] == d)
d_temp_df_slice = temp_df_slice[d_mask]
d_temp_df_slice = d_temp_df_slice.groupby([d_temp_df_slice['q_bool'].ne(df['q_bool'].shift()).cumsum(), 'q_bool']).size()
half_of_transactions = math.trunc(len(d_temp_df_slice)/2)
running_round_trips_counted = running_round_trips_counted + half_of_transactions
print(running_round_trips_counted)
the output is 6 round trips.
The above code feels hacky and hard to follow. I did the following:
- make a slice by 'tradeDate' and slice it again by 'ticker' then
- count the changes in q_bool and make it into a new temp_multiindex_df
- subtract 1 from temp_multiindex_df if df_slice row count is odd
- add changes_in_qbool to a cumulative_round_trip_count variable, and
- use for loops to make slices and run through all the dates and tickers in my df.
Problems with my current working solution:
- double counts if I purchase to hold long-term at the beginning and ending of the day.
- this solution does not output a summary day trader df.
Is there a more pythonic way to do this?