I have a DataFrame built from the CSV content below:
NAME,VENUE_CITY_NAME,EVENT_LANGUAGE,EVENT_GENRE
satya,Pune,Hindi,|COMEDY|DRAMA|
Amit,National Capital Region,English,|ACTION|ADVENTURE|SCI-FI|
satya,Mumbai,Hindi,|COMEDY|DRAMA|
atul,Bangalore,Tamil,|DRAMA|THRILLER|
atul,Pune,Others,|SPORTS|
alex,Hyderabad,Telugu,|ACTION|ROMANCE|THRILLER|
satya,Bangalore,Malayalam,|DRAMA|SUSPENSE|
dave,Hyderabad,Hindi,|COMEDY|
chris,Bangalore,Telugu,|ACTION|ROMANCE|THRILLER|
satya,Pune,Others,|SPORTS|
dave,Kanpur,Hindi,|COMEDY|DRAMA|
alex,Bangalore,Telugu,|COMEDY|ROMANCE|
amit,Bangalore,Telugu,|ACTION|ROMANCE|THRILLER|
atul,Chennai,Tamil,|COMEDY|ROMANCE|
dave,Bangalore,Telugu,|ACTION|ROMANCE|THRILLER|
alex,Pune,Others,|SPORTS|
chris,Hyderabad,Telugu,|DRAMA|ROMANCE|
satya,National Capital Region,Hindi,|ACTION|COMEDY|
dave,Pune,Others,|SPORTS|
amit,National Capital Region,Others,|SPORTS|
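A minimal sketch of loading that CSV with pandas. The `category` dtypes are an assumption aimed at the memory concern raised further down, not something the question specifies: a categorical column stores each distinct city or language once plus a small integer code per row.

```python
import io

import pandas as pd

# The CSV from the question, inlined; in practice this would be a file path.
CSV_TEXT = """NAME,VENUE_CITY_NAME,EVENT_LANGUAGE,EVENT_GENRE
satya,Pune,Hindi,|COMEDY|DRAMA|
Amit,National Capital Region,English,|ACTION|ADVENTURE|SCI-FI|
satya,Mumbai,Hindi,|COMEDY|DRAMA|
atul,Bangalore,Tamil,|DRAMA|THRILLER|
atul,Pune,Others,|SPORTS|
alex,Hyderabad,Telugu,|ACTION|ROMANCE|THRILLER|
satya,Bangalore,Malayalam,|DRAMA|SUSPENSE|
dave,Hyderabad,Hindi,|COMEDY|
chris,Bangalore,Telugu,|ACTION|ROMANCE|THRILLER|
satya,Pune,Others,|SPORTS|
dave,Kanpur,Hindi,|COMEDY|DRAMA|
alex,Bangalore,Telugu,|COMEDY|ROMANCE|
amit,Bangalore,Telugu,|ACTION|ROMANCE|THRILLER|
atul,Chennai,Tamil,|COMEDY|ROMANCE|
dave,Bangalore,Telugu,|ACTION|ROMANCE|THRILLER|
alex,Pune,Others,|SPORTS|
chris,Hyderabad,Telugu,|DRAMA|ROMANCE|
satya,National Capital Region,Hindi,|ACTION|COMEDY|
dave,Pune,Others,|SPORTS|
amit,National Capital Region,Others,|SPORTS|
"""

# Read the repetitive city/language columns as categoricals to save memory.
df = pd.read_csv(
    io.StringIO(CSV_TEXT),
    dtype={"VENUE_CITY_NAME": "category", "EVENT_LANGUAGE": "category"},
)
print(df.dtypes)
```

For a file too large to load at once, `pd.read_csv(path, chunksize=...)` returns an iterator of smaller DataFrames that can be filtered one chunk at a time.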
I have to filter the DataFrame level by level (with multiple nodes per level), also using multiprocessing:
LEVEL-1: filter by city (possibly several cities, each in a different root node)
LEVEL-2: then filter that DataFrame by language (multiple child nodes)
LEVEL-3: filter by genre value
OK, I admit this can be done procedurally, filtering step by step.
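For reference, the procedural version might look like the sketch below. `filter_levels` is a hypothetical name and `SAMPLE` is a few rows copied from the CSV above. Because `EVENT_GENRE` values are wrapped in pipes (`|COMEDY|DRAMA|`), searching for the literal substring `|COMEDY|` matches only whole genre tokens.

```python
import pandas as pd

# A few rows copied from the question's CSV.
SAMPLE = pd.DataFrame(
    [
        ("satya", "Pune", "Hindi", "|COMEDY|DRAMA|"),
        ("atul", "Pune", "Others", "|SPORTS|"),
        ("alex", "Hyderabad", "Telugu", "|ACTION|ROMANCE|THRILLER|"),
        ("satya", "Bangalore", "Malayalam", "|DRAMA|SUSPENSE|"),
        ("dave", "Hyderabad", "Hindi", "|COMEDY|"),
        ("alex", "Bangalore", "Telugu", "|COMEDY|ROMANCE|"),
        ("dave", "Kanpur", "Hindi", "|COMEDY|DRAMA|"),
        ("chris", "Bangalore", "Telugu", "|ACTION|ROMANCE|THRILLER|"),
    ],
    columns=["NAME", "VENUE_CITY_NAME", "EVENT_LANGUAGE", "EVENT_GENRE"],
)


def filter_levels(df, cities, languages, genre):
    """Filter step by step: LEVEL-1 city, LEVEL-2 language, LEVEL-3 genre."""
    level1 = df[df["VENUE_CITY_NAME"].isin(cities)]
    level2 = level1[level1["EVENT_LANGUAGE"].isin(languages)]
    # Literal (non-regex) search for "|GENRE|" matches whole genre tokens only.
    level3 = level2[level2["EVENT_GENRE"].str.contains(f"|{genre}|", regex=False, na=False)]
    return level3
```

Each step materializes an intermediate frame; combining the three boolean masks with `&` inside a single `df[...]` would build only the final result.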
But my actual DataFrame is huge, and I was asked to consider memory management (hence multiprocessing/queueing), to reduce processing time, and to keep the script dynamic and generic (hence classes and objects), among many other challenges.
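One hedged way to make the levels generic with classes is to model them as a tree of filter nodes: each node filters the rows it receives and passes the result to its children, and leaves return their frames keyed by path. `FilterNode`, its fields, and the example `ROOTS` tree are hypothetical names for illustration; `SAMPLE` is again a few rows from the CSV above.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

import pandas as pd

SAMPLE = pd.DataFrame(
    [
        ("satya", "Pune", "Hindi", "|COMEDY|DRAMA|"),
        ("atul", "Pune", "Others", "|SPORTS|"),
        ("alex", "Hyderabad", "Telugu", "|ACTION|ROMANCE|THRILLER|"),
        ("satya", "Bangalore", "Malayalam", "|DRAMA|SUSPENSE|"),
        ("alex", "Bangalore", "Telugu", "|COMEDY|ROMANCE|"),
        ("chris", "Bangalore", "Telugu", "|ACTION|ROMANCE|THRILLER|"),
    ],
    columns=["NAME", "VENUE_CITY_NAME", "EVENT_LANGUAGE", "EVENT_GENRE"],
)


@dataclass
class FilterNode:
    """Hypothetical filter-tree node: keep matching rows, then run the children."""

    column: str
    values: List[str]
    token_match: bool = False  # True for pipe-delimited columns such as EVENT_GENRE
    children: List["FilterNode"] = field(default_factory=list)

    def mask(self, df: pd.DataFrame) -> pd.Series:
        if not self.token_match:
            return df[self.column].isin(self.values)
        # "|COMEDY|DRAMA|" contains "|DRAMA|" only when DRAMA is a whole token.
        result = pd.Series(False, index=df.index)
        for value in self.values:
            result |= df[self.column].str.contains(f"|{value}|", regex=False, na=False)
        return result

    def run(self, df: pd.DataFrame, path: Tuple[str, ...] = ()) -> Dict[Tuple[str, ...], pd.DataFrame]:
        """Filter, then recurse; each leaf returns {path: filtered_frame}."""
        subset = df[self.mask(df)]
        path = path + ("/".join(self.values),)
        if not self.children:
            return {path: subset}
        results: Dict[Tuple[str, ...], pd.DataFrame] = {}
        for child in self.children:
            results.update(child.run(subset, path))
        return results


# Example tree: roots are cities (LEVEL-1), children languages (LEVEL-2),
# grandchildren genres (LEVEL-3).
ROOTS = [
    FilterNode("VENUE_CITY_NAME", ["Pune"], children=[
        FilterNode("EVENT_LANGUAGE", ["Hindi"], children=[
            FilterNode("EVENT_GENRE", ["COMEDY"], token_match=True)]),
        FilterNode("EVENT_LANGUAGE", ["Others"], children=[
            FilterNode("EVENT_GENRE", ["SPORTS"], token_match=True)]),
    ]),
    FilterNode("VENUE_CITY_NAME", ["Bangalore"], children=[
        FilterNode("EVENT_LANGUAGE", ["Telugu"], children=[
            FilterNode("EVENT_GENRE", ["ROMANCE"], token_match=True)]),
    ]),
]
```

Keeping each node as plain data (a column name plus values) instead of a lambda predicate is deliberate: lambdas cannot be pickled, while instances of a module-level dataclass can, so each root subtree could be handed to a separate worker process.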
So I want to filter the main DataFrame at the first level; since there can be many cities to filter, there will be multiple nodes, which should be handled with multiprocessing.
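A sketch of the LEVEL-1 fan-out, assuming the parent process splits the frame by city and sends each city's rows to a `multiprocessing.Pool` worker. `split_level1`, `city_node`, and `run_level1_parallel` are hypothetical names, and `city_node` is only a placeholder where the LEVEL-2/3 work would go.

```python
from multiprocessing import Pool

import pandas as pd

SAMPLE = pd.DataFrame(
    [
        ("satya", "Pune", "Hindi", "|COMEDY|DRAMA|"),
        ("atul", "Pune", "Others", "|SPORTS|"),
        ("alex", "Hyderabad", "Telugu", "|ACTION|ROMANCE|THRILLER|"),
        ("satya", "Bangalore", "Malayalam", "|DRAMA|SUSPENSE|"),
        ("alex", "Bangalore", "Telugu", "|COMEDY|ROMANCE|"),
        ("chris", "Bangalore", "Telugu", "|ACTION|ROMANCE|THRILLER|"),
    ],
    columns=["NAME", "VENUE_CITY_NAME", "EVENT_LANGUAGE", "EVENT_GENRE"],
)


def split_level1(df, cities):
    """LEVEL-1: yield one (city, rows) pair per requested city present in df."""
    selected = df[df["VENUE_CITY_NAME"].isin(cities)]
    # observed=True skips unused categories when the column is categorical.
    for city, frame in selected.groupby("VENUE_CITY_NAME", observed=True):
        yield city, frame


def city_node(city, frame):
    """Hypothetical placeholder for one root node's work."""
    return city, len(frame)


def run_level1_parallel(df, cities, worker=city_node, processes=None):
    """Run worker(city, frame) for every city node in a process pool."""
    with Pool(processes=processes) as pool:
        # starmap pickles each (city, frame) pair, so a worker receives
        # only its own city's rows, not the whole DataFrame.
        return pool.starmap(worker, split_level1(df, cities))
```

Call `run_level1_parallel` from under `if __name__ == "__main__":`, because the spawn start method (the default on Windows and macOS) re-imports the main module in each worker. Note that `Pool.starmap` converts the generator to a list before dispatching, so all city slices exist in the parent at once; with a very large number of cities, processing them in batches may be necessary.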
Then, at the second level, two or more sub/child nodes can result from the language filter condition, so after filtering I need to drop the main (level-1) DataFrame.
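A sketch of the LEVEL-2 step, assuming one child frame per requested language and that the level-1 frame should be released afterwards. `split_level2` and `level2_node` are hypothetical names. Boolean indexing in pandas returns a new frame rather than a view, so the children do not depend on the parent staying alive.

```python
import pandas as pd

SAMPLE = pd.DataFrame(
    [
        ("satya", "Pune", "Hindi", "|COMEDY|DRAMA|"),
        ("atul", "Pune", "Others", "|SPORTS|"),
        ("satya", "Bangalore", "Malayalam", "|DRAMA|SUSPENSE|"),
        ("alex", "Bangalore", "Telugu", "|COMEDY|ROMANCE|"),
        ("chris", "Bangalore", "Telugu", "|ACTION|ROMANCE|THRILLER|"),
    ],
    columns=["NAME", "VENUE_CITY_NAME", "EVENT_LANGUAGE", "EVENT_GENRE"],
)


def split_level2(city_frame, languages):
    """LEVEL-2: one child frame per requested language present in this city node."""
    children = {}
    for lang in languages:
        child = city_frame[city_frame["EVENT_LANGUAGE"] == lang]  # a new frame
        if not child.empty:
            children[lang] = child
    return children


def level2_node(city, city_frame, languages):
    """Build the child nodes, then drop this scope's reference to the parent."""
    children = split_level2(city_frame, languages)
    # `del` removes only this local name; the level-1 frame's memory is freed
    # once no other reference to it (e.g. one held by the caller) remains.
    del city_frame
    return {(city, lang): child for lang, child in children.items()}
```

The caller has to drop its own reference too (for example by not keeping the level-1 frames in a list), otherwise the parent data stays in memory regardless of the `del` here.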
Level 3 should work the same way as level 2, and the resulting DataFrames should be returned to a base (parent) process through a queueing mechanism.
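A hedged end-to-end sketch, assuming one `multiprocessing.Process` per city and a shared `multiprocessing.Queue` through which each worker sends its LEVEL-3 leaf frames back to the base process. The names `level3_filter`, `city_worker`, `run_pipeline`, and the `DONE` sentinel are hypothetical. The base drains the queue before calling `join()`, because a child that has put data on a queue may not exit until that data has been consumed.

```python
import multiprocessing as mp

import pandas as pd

SAMPLE = pd.DataFrame(
    [
        ("satya", "Pune", "Hindi", "|COMEDY|DRAMA|"),
        ("atul", "Pune", "Others", "|SPORTS|"),
        ("alex", "Bangalore", "Telugu", "|COMEDY|ROMANCE|"),
        ("chris", "Bangalore", "Telugu", "|ACTION|ROMANCE|THRILLER|"),
    ],
    columns=["NAME", "VENUE_CITY_NAME", "EVENT_LANGUAGE", "EVENT_GENRE"],
)

DONE = None  # sentinel each worker sends after its last result


def level3_filter(frame, genre):
    """LEVEL-3: keep rows whose pipe-delimited EVENT_GENRE contains genre."""
    return frame[frame["EVENT_GENRE"].str.contains(f"|{genre}|", regex=False, na=False)]


def city_worker(city, city_frame, languages, genre, queue):
    """One root node: LEVEL-2 split by language, LEVEL-3 filter, enqueue leaves."""
    for lang in languages:
        child = city_frame[city_frame["EVENT_LANGUAGE"] == lang]
        leaf = level3_filter(child, genre)
        if not leaf.empty:
            queue.put((city, lang, leaf))
    queue.put(DONE)


def run_pipeline(df, cities, languages, genre):
    """Base process: one Process per city, leaf frames collected from a Queue."""
    queue = mp.Queue()
    workers = []
    for city in cities:
        city_frame = df[df["VENUE_CITY_NAME"] == city]
        p = mp.Process(target=city_worker, args=(city, city_frame, languages, genre, queue))
        p.start()
        workers.append(p)
        del city_frame  # the child process has its own copy
    results = {}
    finished = 0
    # Drain the queue before join(), otherwise the base can deadlock.
    while finished < len(workers):
        item = queue.get()
        if item is DONE:
            finished += 1
        else:
            city, lang, leaf = item
            results[(city, lang)] = leaf
    for p in workers:
        p.join()
    return results
```

As with any multiprocessing entry point, call `run_pipeline` from under `if __name__ == "__main__":`. Error handling is left out: a worker that dies before sending `DONE` would leave the base waiting forever. If there are too many cities for one process each, a `Pool` combined with `multiprocessing.Manager().Queue()` could bound the process count; a plain `multiprocessing.Queue` cannot be passed to pool tasks as an argument.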