I have a little challenge here and to be honest, I have absolutely no idea how to handle it.
I have this dataframe composed of 660,000 rows and about 50 columns. I need to filter this dataframe very frequently and retrieve the filtered dataframe as fast as possible (goal is to have a processing time <1second). I'd like to be able to run that locally on a laptop, therefore my "processing power" is limited.
I have multiple inputs to filter the dataframe, some are set manually (see input 1) some are retrieved from another script (see input 2, the other script is not included in the code here for simplicity).
I was hoping to simple filter through the dataset using df[(df.column == filtervalue)]
. However, it seems that the processing time is way too long.
Therefore, I am wondering whether there are some technics to optimize such processing time or if on the contrary the only way to optimize that is to go with a server that has a good CPU / Memory capacity?
Thanks for the help
import pandas as pd
df = pd.read_csv('xxxxxxxx', sep=";", dtype={"id": str,"dataset1": str,"dataset2":str,"myposition":str,"bet_1_preflop":float,"bet_2_preflop":float,"bet_3_preflop":float,"bet_1_flop":float,"bet_2_flop":float,
"bet_3_flop":float,"bet_1_turn":float ,"bet_2_turn":float,"bet_3_turn":float,"bet_1_river":float,"bet_2_river":float, "bet_3_river":float,
"myhand":str,"myposition":str,"cards_flop":str,"cards_turn":str,"cards_river":str,"action1_preflop":str," action2_preflop":str,
"action3_preflop":str,"action4_preflop":str, "action1_flop":str, "action2_flop":str, "action3_flop":str,"action4_flop":str,"action1_turn":str,
"action2_turn":str, "action3_turn":str, "action4_turn":str, "action1_river":str,"action2_river":str, "action3_river":str, "action4_river":str,
"action1_preflop_binary":'Int64', "action2_preflop_binary":'Int64', "action3_preflop_binary":'Int64', "action4_preflop_binary":'Int64',
"action1_flop_binary":'Int64',"action2_flop_binary":'Int64', "action3_flop_binary":'Int64', "action4_flop_binary":'Int64', "action1_turn_binary":'Int64',
"action2_turn_binary":'Int64', "action3_turn_binary":'Int64', "action4_turn_binary":'Int64',"action1_river_binary":'Int64', "action2_river_binary":'Int64',
"action3_river_binary":'Int64', "action4_river_binary":'Int64', "tiers":'Int64',"assorties":str,
"besthand_flop":str,"checker_flop":float,"handtype_flop":str,"topsuite_flop":'Int64',"topcolor_flop":'Int64',"besthand_turn":str,"checker_turn":float,"handtype_turn":str,
"topsuite_turn":'Int64',"topcolor_turn":'Int64',"besthand_river":str,"checker_river":float,"handtype_river":str,"topsuite_river":'Int64',"topcolor_river":'Int64'})
df = df.reset_index()
#Inputs for filters 1
myposition ="sb"
myhand = "ackc"
flop = "ad9d4h"
turn = "8d"
river = "th"
a1_preflop = "r"
a2_preflop = "r"
a3_preflop = "c"
a4_preflop = ""
a1_flop = "r"
a2_flop = "f"
a3_flop = ""
a4_flop = ""
a1_turn = ""
a2_turn = ""
a3_turn = ""
a4_turn = ""
a1_river = ""
a2_river = ""
a3_river = ""
a4_river = ""
#Inputs for filters 2 (from a different script)
tiers
assorties_status
best_allhands_flop[0]
best_allhands_flop[1]
best_allhands_flop[2]
highest_suite_flop
highest_color_flop
best_allhands_turn[0]
best_allhands_turn[1]
best_allhands_turn[2]
highest_suite_turn
highest_color_turn
best_allhands_river[0]
best_allhands_river[1]
best_allhands_river[2]
highest_suite_river
highest_color_river
#filtre_preflop_a1 = df[(df.myposition == myposition) & (df.tiers == tiers) & (df.assorties == assorties_status) & (df.action1_preflop == a1_preflop)]
#filtre_preflop_a2 = df[(df.myposition == myposition) & (df.tiers == tiers) & (df.assorties == assorties_status) & (df.action1_preflop == a1_preflop) & (df.action2_preflop == a2_preflop)]
#filtre_preflop_a3 = df[(df.myposition == myposition) & (df.tiers == tiers) & (df.assorties == assorties_status) & (df.action1_preflop == a1_preflop) & (df.action2_preflop == a2_preflop) & (df.action3_preflop == a3_preflop)]
#filtre_preflop_a4 = df[(df.myposition == myposition) & (df.tiers == tiers) & (df.assorties == assorties_status) & (df.action1_preflop == a1_preflop) & (df.action2_preflop == a2_preflop) & (df.action3_preflop == a3_preflop) & (df.action4_preflop == a4_preflop)]