I am writing a counting function that runs on subsets of a pandas DataFrame, and I intend to export the result as a dictionary or spreadsheet that contains only the groupby criteria and the counting results.
In [1]: df = pd.DataFrame([['Buy', 'A', 123, 'NEW', 500, '20190101-09:00:00am'],
   ...:                    ['Buy', 'A', 124, 'CXL', 500, '20190101-09:00:01am'],
   ...:                    ['Buy', 'A', 125, 'NEW', 500, '20190101-09:00:03am'],
   ...:                    ['Buy', 'A', 126, 'REPLACE', 300, '20190101-09:00:10am'],
   ...:                    ['Buy', 'B', 210, 'NEW', 1000, '20190101-09:10:00am'],
   ...:                    ['Buy', 'B', 345, 'NEW', 200, '20190101-09:00:00am'],
   ...:                    ['Sell', 'C', 412, 'NEW', 100, '20190101-09:00:00am'],
   ...:                    ['Sell', 'C', 413, 'NEW', 200, '20190101-09:01:00am'],
   ...:                    ['Sell', 'C', 414, 'CXL', 50, '20190101-09:02:00am']],
   ...:                   columns=['side', 'sender', 'id', 'type', 'quantity', 'receive_time'])
Out[1]:
   side sender   id     type  quantity         receive_time
0   Buy      A  123      NEW       500  20190101-09:00:00am
1   Buy      A  124      CXL       500  20190101-09:00:01am
2   Buy      A  125      NEW       500  20190101-09:00:03am
3   Buy      A  126  REPLACE       300  20190101-09:00:10am
4   Buy      B  210      NEW      1000  20190101-09:10:00am
5   Buy      B  345      NEW       200  20190101-09:00:00am
6  Sell      C  412      NEW       100  20190101-09:00:00am
7  Sell      C  413      NEW       200  20190101-09:01:00am
8  Sell      C  414      CXL        50  20190101-09:02:00am
The count function is below (mydf is passed in as a subset of the DataFrame):
def ordercount(mydf):
    num = 0.0
    if mydf.type == 'NEW':
        num = num + mydf.quantity
    elif mydf.type == 'REPLACE':
        num = mydf.quantity
    elif mydf.type == 'CXL':
        num = num - mydf.quantity
    else:
        pass
    orderdict = dict.fromkeys([mydf.side, mydf.sender, mydf.id], num)
    return orderdict
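The rule the function is meant to express (NEW adds, REPLACE overwrites, CXL subtracts) can be sketched without pandas on a time-ordered list of (type, quantity) pairs. The helper name `running_total` and the tuple layout are assumptions made for illustration; they are not part of the code above.

```python
def running_total(events):
    """Fold a time-ordered sequence of (order_type, quantity) pairs into one total."""
    total = 0
    for order_type, quantity in events:
        if order_type == "NEW":
            total += quantity   # a new order adds to the tally
        elif order_type == "REPLACE":
            total = quantity    # a replace overwrites the tally
        elif order_type == "CXL":
            total -= quantity   # a cancel subtracts from the tally
        # any other type leaves the tally unchanged
    return total


# Buy / A rows from the sample data, in receive_time order
buy_a = [("NEW", 500), ("CXL", 500), ("NEW", 500), ("REPLACE", 300)]
print(running_total(buy_a))  # 300
```

Because REPLACE overwrites the running value, the result depends on the order of the rows, which is why the data has to be sorted by time first.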
After reading the data from a CSV file, I group it by some criteria and also sort it by time:
df = pd.read_csv('xxxxxxxxx.csv', sep='|', header=0, engine='python', names=col_names)
sorted_df = df.groupby(['side', 'sender', 'id']).apply(lambda _df: _df.sort_values(by=['receive_time']))
Then I call the previously defined function on the sorted data:
print(sorted_df.agg(ordercount))
But I keep getting a ValueError that says something like too many lines to call.
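One possible cause, assuming the function is handed a whole column or group rather than a single row: comparing a Series to a string gives a Series of booleans, and using that in an `if` raises a ValueError. A minimal reproduction with a made-up Series named `types` (not taken from the real data):

```python
import pandas as pd

# Hypothetical stand-in for the 'type' column of one group
types = pd.Series(["NEW", "CXL", "NEW"])

mask = types == "NEW"   # element-wise comparison: a boolean Series, not a single bool
print(mask.tolist())    # [True, False, True]

error_message = None
try:
    if mask:            # pandas cannot reduce several booleans to one truth value
        pass
except ValueError as exc:
    error_message = str(exc)
    print(error_message)
```

If this is what happens here, the `if mydf.type == 'NEW'` test would need to run per row rather than on the whole subset.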
Counting with a function may not be efficient, but it is the most straightforward way I can think of to match order types and count quantities accordingly. I expect the program to output a table where only side, sender, id and the counted quantity are shown. Is there any way to achieve this? Thanks.
Expected output:
   side sender  total_order_num trade_date
0   Buy      A              300   20190101
1   Buy      B             1200   20190101
2  Sell      C              250   20190101
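For reference, here is a rough sketch of one way to get a table of this shape from the sample data. It rests on several assumptions: grouping is by side and sender only (the expected output has one row per side/sender pair, not per id), `trade_date` is taken as the first eight characters of `receive_time`, and the helper name `tally` is made up. Sorting `receive_time` as plain strings is only safe here because every sample timestamp shares the same date and "am" suffix; real data would likely need parsing into datetimes first.

```python
import pandas as pd

rows = [
    ["Buy", "A", 123, "NEW", 500, "20190101-09:00:00am"],
    ["Buy", "A", 124, "CXL", 500, "20190101-09:00:01am"],
    ["Buy", "A", 125, "NEW", 500, "20190101-09:00:03am"],
    ["Buy", "A", 126, "REPLACE", 300, "20190101-09:00:10am"],
    ["Buy", "B", 210, "NEW", 1000, "20190101-09:10:00am"],
    ["Buy", "B", 345, "NEW", 200, "20190101-09:00:00am"],  # side as shown in Out[1]
    ["Sell", "C", 412, "NEW", 100, "20190101-09:00:00am"],
    ["Sell", "C", 413, "NEW", 200, "20190101-09:01:00am"],
    ["Sell", "C", 414, "CXL", 50, "20190101-09:02:00am"],
]
df = pd.DataFrame(
    rows, columns=["side", "sender", "id", "type", "quantity", "receive_time"]
)


def tally(group):
    """Walk one group's rows in order: NEW adds, REPLACE overwrites, CXL subtracts."""
    total = 0
    for order_type, quantity in zip(group["type"], group["quantity"]):
        if order_type == "NEW":
            total += quantity
        elif order_type == "REPLACE":
            total = quantity
        elif order_type == "CXL":
            total -= quantity
    return total


# Assumption: the trade date is the leading YYYYMMDD part of receive_time
df["trade_date"] = df["receive_time"].str[:8]

result = (
    df.sort_values("receive_time", kind="stable")   # time order is kept inside each group
      .groupby(["side", "sender", "trade_date"])[["type", "quantity"]]
      .apply(tally)                                 # one scalar per group
      .reset_index(name="total_order_num")
      [["side", "sender", "total_order_num", "trade_date"]]
)
print(result)
```

The loop inside `tally` is not vectorized, but it keeps the order-dependent REPLACE rule easy to read. `result.to_dict("records")` or `result.to_csv(...)` could then be used for the export.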