0

I have a dictionary where I read an sql table:

df = {}
df['abc'] = pd.read_sql_table('abc', engine, schema_z2,
                                    columns = cols)

Now, I want to filter out data such that only those rows with values "R" and "P" from the art column are kept. This is what I tried after reading this code snippet somewhere.

df_merged = df['abc'][df['abc']['art'].isin(['R','P'])]
print(df_merged)

When I hover over df_merged in Visual Studio Code, it says that it is a dataframe, which is what I wanted. However, at the end when I run my code,df_merged is empty, even though it should have rows. I could be using a wrong syntax here: df['abc'][df['abc']['art'].isin(['R','P'])]but I am unable to identify how to change it.

A similar question How to filter Pandas dataframe using 'in' and 'not in' like in SQL

does not help because I am already using isin() and I am trying to filter values from a dictionary not a df initially.

and if I just do this:

df_merged =df['abc']['art_kennz'].isin(['R','P','SP','GP'])

df_merged shows a Series[_bool] type instead of Dataframe.


Edit:

I tried this with the following test data https://docs.google.com/spreadsheets/d/1cykNjViW_DacwWZNaIHWEh3E8OqxsVon/edit?usp=sharing&ouid=115380043465372211112&rtpof=true&sd=true:

import pandas as pd

df = {}
df['abc'] = pd.read_excel('./testing.xlsx')
print(df)

df_merged = df['abc'][df['abc']['art_kennz'].isin(['S','P','SP','GP'])]
df_merged.head()

and I get an empty dataset upon printing, which shouldn't be the case

x89
  • 2,798
  • 5
  • 46
  • 110
  • Do you have an example (~10 rows) of the dataframe? This would make it a lot easier to find out what is going on. – Brian Barbieri Oct 05 '21 at 09:27
  • Can you see thee updated qs? https://docs.google.com/spreadsheets/d/1cykNjViW_DacwWZNaIHWEh3E8OqxsVon/edit?usp=sharing&ouid=115380043465372211112&rtpof=true&sd=true @BrianBarbieri – x89 Oct 05 '21 at 11:43

3 Answers3

0

Try:

df_merged = df[df['art'].isin(['R','P'])]['abc']
gtomer
  • 5,643
  • 1
  • 10
  • 21
  • 1
    Did you try this? It gives me ```KeyError: 'art'``` Here's a sample data source https://docs.google.com/spreadsheets/u/1/d/1cykNjViW_DacwWZNaIHWEh3E8OqxsVon/edit?usp=sharing&ouid=115380043465372211112&rtpof=true&sd=true – x89 Oct 05 '21 at 11:08
  • Couldn't have tried it until you gave a sample. Please note that there is no column name 'art', but 'art-kennz' – gtomer Oct 05 '21 at 12:54
  • Ah yeah, but ofc I changed the names accordingly while testing. It's ```KeyError: 'art_kennz'``` Or did it work for you? – x89 Oct 05 '21 at 13:59
0

I seem to be able to filter out what you want just fine:

import pandas as pd
df = {}
df["abc"] = pd.DataFrame({"art": ["A", "B", "C", "P", "R", "S"], "bert": [1, 2, 3, 4, 5, 6]})
df_merged = df["abc"][df["abc"]["art"].isin(["R", "P"])]
df_merged
#{'abc':   art  bert
#3   P     4
#4   R     5}

df_merged shows a Series[_bool] type instead of Dataframe.

Should be correct, since the pandas.DataFrame.isin() method returns a bool array whose rows satisfy the condition, so you can easily filter using that:

df["abc"]["art"].isin(["R", "P"])
#3    True
#4    True
#Name: art, dtype: bool
JonnyRobbie
  • 526
  • 5
  • 16
  • Could it be different because I am reading an sql table instead of a csv dataframe? https://docs.google.com/spreadsheets/u/1/d/1cykNjViW_DacwWZNaIHWEh3E8OqxsVon/edit?usp=sharing&ouid=115380043465372211112&rtpof=true&sd=true Here's a sample data source – x89 Oct 05 '21 at 11:09
  • Can you see my edited qs? I get an empty df – x89 Oct 05 '21 at 11:40
0

When I use the following dict:

dfs = {}
dfs["abc"] = pd.DataFrame({
    "mandant" : [9,9,9,9],
    "fk_eakopf_posnr" : [552025046,552025047,552035009,552035009],
    "zeile" : [5,5,5,5],
    "art_kennz" : ["G","G","G","S"]
})

I preform the following process:

dfs["abc"][dfs["abc"]['art_kennz'].isin(['S','P','SP','GP'])]

Then I get the following output:

mandant fk_eakopf_posnr zeile   art_kennz
3   9   552035009   5   S

Is this what you want?

Brian Barbieri
  • 293
  • 2
  • 15
  • What is the end result that you want, that is unclear to me. I thought you wanted a dict with values that are filtered dataframes. – Brian Barbieri Oct 05 '21 at 12:17
  • Nope, I wanted the opposite. I am reading content into a dictionary and I want the end result in a df but as shown in the edited part of my question, the final df I get is empty – x89 Oct 05 '21 at 12:19
  • I have updated by comment, it looks like it all works? – Brian Barbieri Oct 05 '21 at 12:26
  • Yup, this is what I would want but I don't get this when I read the file instead of manually creating a df ```df['cgo_eafapo_t'] = pd.read_excel('./testing.xlsx')``` – x89 Oct 05 '21 at 12:30
  • Are you sure that the object you are putting in the dictionary is exactly the same as the created dataframe? Please inspect this thoroughly. Also I would not recommend using "df" as the variable name of a dictionary, since it is commonly used for dataframes and could confuse others. – Brian Barbieri Oct 05 '21 at 13:24
  • The object I am loading into the dictionary is basically this file https://docs.google.com/spreadsheets/u/1/d/1cykNjViW_DacwWZNaIHWEh3E8OqxsVon/edit?usp=sharing&ouid=115380043465372211112&rtpof=true&sd=true I don't understand why it wouldn't work if the dataframe is working – x89 Oct 05 '21 at 13:58
  • For the same excel sheet, if I check this, ```df_merged = df['cgo_eafapo_t'][df['cgo_eafapo_t']['zeile'].isin([5])]```it works. Maybe because all values in that zeile column are 5. However, the same thing doesn't work for the art_kennz column – x89 Oct 05 '21 at 14:05
  • Ah I figured it's because of extra spaces in the art column :( – x89 Oct 05 '21 at 17:55