2

I have a dataframe that is consist of 3 columns when user select what are the columns and values that he want to apply a filter on it the system create a dictionary with these keys==columns name : values = cell values, I want to display the filtered dataframe as a table.

I want to compare between the dictionary and dataframe and display the filtered data.

code:

import pandas as pd
df =pd.DataFrame({
            "source_number":[11199,11328,11287,32345,12342,1232,13456,123244,13456],
             "location":["loc2","loc1","loc3","loc1","loc2","loc2","loc3","loc2","loc1"],
             "category":["cat1","cat2","cat1","cat3","cat3","cat3","cat2","cat3","cat2"],
             })  

#let say the created dictionary have the below value
sidebars = {"location":["loc1","loc2"],"category":["cat1","cat3"]}

excpected result :

source_number   location    category
 32345             loc1       cat3
 11199             loc2       cat1
 12342             loc2       cat3
 1232              loc2       cat3
 123244            loc2       cat3

code with streamlit:

import numpy as np
import pandas as pd
import streamlit as st 
    
    
df =pd.DataFrame({
            "source_number":                        [ 
             [11199,11328,11287,32345,12342,1232,13456,123244,13456],
             "location":          
             ["loc2","loc1","loc3","loc1","loc2","loc2","loc3","loc2","loc1"],
              "category": 
             ["cat1","cat2","cat1","cat3","cat3","cat3","cat2","cat3","cat2"],
             })  
    
    is_check = st.checkbox("Display Data")
    if is_check:
        st.table(df)
    
    
    columns = st.sidebar.multiselect("Enter the variables", df.columns)
    
    sidebars = {}
    for y in columns:
        ucolumns=list(df[y].unique())
    
        sidebars[y]=st.sidebar.multiselect('Filter '+y, ucolumns)   
    L = [df[k].isin(v) if isinstance(v, list) 
         else df[k].eq(v) 
         for k, v in sidebars.items() if k in df.columns]
    df = df[np.logical_and.reduce(L)]
    st.table(df)

How to get the excpected result i know that i need to iterate over the dictionary and compare

After i solved the comapring between dictionary and dataframe based on the answer of @jezrael.

it still display the below error at the first :

KeyError: True
Traceback:
File "f:\aienv\lib\site-packages\streamlit\script_runner.py", line 333, in _run_script
    exec(code, module.__dict__)
File "F:\AIenv\streamlit\app.py", line 326, in <module>
    df = df[np.logical_and.reduce(L)]
File "f:\aienv\lib\site-packages\pandas\core\frame.py", line 2902, in __getitem__
    indexer = self.columns.get_loc(key)
File "f:\aienv\lib\site-packages\pandas\core\indexes\base.py", line 2893, in get_loc
    raise KeyError(key) from err
DevLeb2022
  • 653
  • 11
  • 40

2 Answers2

0

Use isin and boolean indexing.

df_ = df[df['location'].isin(sidebars['location']) & df['category'].isin(sidebars['category'])]
# print(df_)

   source_number location category
0          11199     loc2     cat1
3          32345     loc1     cat3
4          12342     loc2     cat3
5           1232     loc2     cat3
7         123244     loc2     cat3

If you want to filter by user chosen column, you can use

from functools import reduce

user_choosen = ['location']

mask = reduce(lambda x, y: x & y, [df[col].isin(sidebars[col]) for col in user_choosen])

df_ = df[mask]
Ynjxsjmh
  • 28,441
  • 6
  • 34
  • 52
  • what if want to make the **df["location"]** dynamic ?? means that the user choose what will be the column name can i make it based on the index of the column?? or is there another way to make the filter dynamic? – DevLeb2022 Apr 16 '21 at 05:14
  • @user5980666 If user chooses what will be the column name, then assign user chosen to a variable, say `col`. You can use `df[col]`. – Ynjxsjmh Apr 16 '21 at 05:16
  • so maybe the user choose one columns not 2 and 1 or 2 values or 3 values ?? how this can be applied on my filter – DevLeb2022 Apr 16 '21 at 05:37
0

Use dict comprehension for select dynamic by columns names with values in lists by Series.isin with np.logical_and and reduce trick:

Notice - If use isin in dict all values has to be list

df = df[np.logical_and.reduce([df[k].isin(v) for k, v in sidebars.items()])]
print (df)
   source_number location category
0          11199     loc2     cat1
3          32345     loc1     cat3
4          12342     loc2     cat3
5           1232     loc2     cat3
7         123244     loc2     cat3

If possible scalars or lists in dict is possible use if-else in list comprehension with test scalars by Series.eq:

#let say the created dictionary have the below value
sidebars = {"location":["loc1","loc2"],"category":"cat3"}

L = [df[k].isin(v) if isinstance(v, list) else df[k].eq(v) for k, v in sidebars.items()]
df = df[np.logical_and.reduce(L)]
print (df)
   source_number location category
3          32345     loc1     cat3
4          12342     loc2     cat3
5           1232     loc2     cat3
7         123244     loc2     cat3

EDIT: If possible some column no match by keys of dict is possible filter it (but then not filtered by this not matched key):

L = [df[k].isin(v) for k, v in sidebars.items() if k in df.columns]
L = [df[k].isin(v) if isinstance(v, list) 
     else df[k].eq(v) 
     for k, v in sidebars.items() if k in df.columns]

df = df[np.logical_and.reduce(L)]

EDIT:

First time code in streamlit, so possible better solutions, here is problem if passed empty dictionary.

So possible check it by if bool(sidebars):

is_check = st.checkbox("Display Data")
if is_check:
    st.table(df)


columns = st.sidebar.multiselect("Enter the variables", df.columns)

sidebars = {}
for y in columns:
    ucolumns=list(df[y].unique())
    print (ucolumns)

    sidebars[y]=st.sidebar.multiselect('Filter '+y, ucolumns)   

if bool(sidebars):
    L = [df[k].isin(v) if isinstance(v, list) 
         else df[k].eq(v) 
         for k, v in sidebars.items() if k in df.columns]
    
    df1 = df[np.logical_and.reduce(L)]
    st.table(df1)
jezrael
  • 822,522
  • 95
  • 1,334
  • 1,252
  • i used your answer but it display the below error: `KeyError: True Traceback: File "f:\aienv\lib\site-packages\streamlit\script_runner.py", line 333, in _run_script exec(code, module.__dict__) File "F:\AIenv\streamlit\app.py", line 323, in df = df[np.logical_and.reduce(L)] File "f:\aienv\lib\site-packages\pandas\core\frame.py", line 2902, in __getitem__ indexer = self.columns.get_loc(key) File "f:\aienv\lib\site-packages\pandas\core\indexes\base.py", line 2893, in get_loc raise KeyError(key) from err` what it means? – DevLeb2022 Apr 16 '21 at 05:36
  • @user5980666 - I think there is no some column from keys of dict in DataFrame. – jezrael Apr 16 '21 at 05:37
  • @user5980666 - Like `sidebars = {"some col":["loc1","loc2"],"category":["cat1","cat3"]}` – jezrael Apr 16 '21 at 05:38
  • @user5980666 - It is possible filter ouput this not matched columns, added to answer. – jezrael Apr 16 '21 at 05:42
  • this solve partially my problem.. because when i run the code it display at the first this error: `KeyError: True Traceback: File "f:\aienv\lib\site-packages\streamlit\script_runner.py", line 333, in _run_script exec(code, module.__dict__) File "F:\AIenv\streamlit\app.py", line 326, in df = df[np.logical_and.reduce(L)] File "f:\aienv\lib\site-packages\pandas\core\frame.py", line 2902, in __getitem__ indexer = self.columns.get_loc(key) File "f:\aienv\lib\site-packages\pandas\core\indexes\base.py", line 2893, in get_loc raise KeyError(key) from err` – DevLeb2022 Apr 16 '21 at 05:55
  • 1
    i will edit my question and add some more details about my code – DevLeb2022 Apr 16 '21 at 05:55
  • @user5980666 - What is error after `raise KeyError(key) from err` ? – jezrael Apr 16 '21 at 06:17
  • there is nothing after the line `raise KeyError(key) from err` – DevLeb2022 Apr 16 '21 at 06:21
  • @user5980666 - hmm, it seems error because empty dict is passed, because no default select – jezrael Apr 16 '21 at 06:31
  • @user5980666 - If check yout cmd there is first passed `{}` – jezrael Apr 16 '21 at 06:31
  • @user5980666 - Aded solution to answer. – jezrael Apr 16 '21 at 06:39
  • yes that was the problem thank you. can i relate the dropdownlists in my code to display just the values that are related to the first dropdownlist or the second ? means if **category** have **cat1** and **cat1** have **loc1** and **loc2** without **loc3** i do not want to display **loc3** as an option. – DevLeb2022 Apr 16 '21 at 06:50
  • @user5980666 - I can add some solution, but still I think better post new question, because not sure if exist something better – jezrael Apr 16 '21 at 06:56
  • ok i will post new question hope that you will contribute and thanks – DevLeb2022 Apr 16 '21 at 07:00