Finding the common data from a table

Question

I am using Jupyter and am trying to find the common data and have it tell me how many times the Bull has been in the top 3 or bottom three

LowWeaningData = {}
LowWeightData = {}
LowP8Data = {}
LowRibData = {}
LowEmaData = {}
LowImfData = {}
LowGrowthData = {}

LowWeaningData = df_bulls.nsmallest(3,['Weaning Weight(kg)'])

LowWeightData = df_bulls.nsmallest(3,['Weight (kg)'])

LowP8Data = df_bulls.nsmallest(3,['P8'])

LowRibData = df_bulls.nsmallest(3,['RIB'])

LowEmaData = df_bulls.nsmallest(3,['EMA(cm)'])

LowImfData = df_bulls.nsmallest(3,['IMF'])

LowGrowthData = df_bulls.nsmallest(3,['Growth %'])

Which prints this

This list is of the Lowest Growth % Data 
   Bull         Sire  Dam  Weaning Weight(kg)  Weight (kg)  P8  RIB  EMA(cm)  \
5  S10  Black Magic  L16               522.0          818   7    6      124   
1  S24          P42  L11               469.0          774   7    6      116   
2  S32          P41   M6               401.0          662   6    5      105   

   IMF   Growth %  
5  6.3  56.704981  
1  5.6  65.031983  
2  4.3  65.087282

The next part is trying to find how many times the bull appeared in the list and which ones so i use this code

lower_elements_in_all = 
list(set.intersection(*map(set[LowWeaningData, LowWeightData, 
LowP8Data, LowRibData, LowEmaData, LowImfData, LowGrowthData])))

I keep getting this instead of the actual bull names

['Bull', 'Sire', 'P8', 'EMA(cm)', 'RIB', 'Growth %', 'Dam', 'Weaning Weight(kg)', 'IMF', 'Weight (kg)']

How i would like my data to return is as follows

S10 has appeared 3 times in category Low Weaning Data, Low P8 Data and Low Rib Data
S24 has appeared 3 times in etc
S32 has appeared 1 times in etc

in descending order so its easy to see

welcome to stackoverflow – please post any code in your question as formatted text and not an image (see [here](https://meta.stackoverflow.com/questions/285551/why-should-i-not-upload-images-of-code-data-errors)). also it would be useful if we had a sample of the `df_bulls` dataframe so we can [reproduce](https://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples) the behavior you're describing, you can by copy and paste the output from `df_bulls.head().to_dict()` directly into your question – thanks! — Derek O, Jan 10 '23 at 05:42
@DerekO i already have the top/bottom 3 in each category however i want to know how many times they appear in each category and have it list which category they appear in and then have it in a descending order — Jess, Jan 10 '23 at 06:08
what does your expected output look like? is it a total count of the number of times a bull appears in any of the low categories series you've defined? adding more details into your question (instead of the comments) will be helpful too because not everyone reads the comments and might miss important information — Derek O, Jan 10 '23 at 06:32
@DerekO ive updated my question of how i would like it to look. The only issue is if i try and find the common variables it will tell me the headings of the spreadsheet are common. ive tried looking at only df_bulls['Bull'] to only make it look there but it doesnt seem to work — Jess, Jan 10 '23 at 06:39

Derek O · Accepted Answer · 2023-01-10T09:21:44.147

I would start by adding a new column called "Category" to each of your dataframes with a string for the appropriate category. For example:

LowWeaningData = df_bulls.nsmallest(3,['Weaning Weight(kg)']).assign(category="Low Weaning Data")
LowWeightData = df_bulls.nsmallest(3,['Weight (kg)']).assign(category="Low Weight Data")
LowP8Data = df_bulls.nsmallest(3,['P8']).assign(category="Low P8 Data")
LowRibData = df_bulls.nsmallest(3,['RIB']).assign(category="Low Rib Data")
LowEmaData = df_bulls.nsmallest(3,['EMA(cm)']).assign(category="Low EMA Data")
LowImfData = df_bulls.nsmallest(3,['IMF']).assign(category="Low IMF Data")
LowGrowthData = df_bulls.nsmallest(3,['Growth %']).assign(category="Low Growth Data")

Then use pd.concat to combine all of these dataframes together:

low_bulls_by_category_df = pd.concat([
    LowWeaningData,
    LowWeightData,
    LowP8Data,
    LowRibData,
    LowEmaData,
    LowImfData,
    LowGrowthData
])[['Bull','Category']]

Using some data I made up, low_bulls_by_category_df should look something like the following (with some of the categories removed to simplify the example):

  Bull          Category
0  S10  Low Weaning Data
1  S11  Low Weaning Data
2  S12  Low Weaning Data
0  S10   Low Weight Data
1  S13   Low Weight Data
2  S14   Low Weight Data
0  S12       Low P8 Data
1  S11       Low P8 Data
2  S10       Low P8 Data

The reason we added a "Category" column to each of your smallest value DataFrames is so that when you combine all of these dataframes together, you still know which low category each bull came from.

Then we can loop through the portions of this dataframe for each unique bull using a groupby, and populate a dictionary with information about how many times each bull appears and what categories they appear in.

low_bull_info = {}
for bull,df_group in low_bulls_by_category_df.groupby("Bull"):
    low_bull_info[bull] = {'count':len(df_group), 'categories':df_group['Category'].tolist()}

Inside the dictionary we have all the information we need. And I believe pandas automatically groups by the number of occurrences, so the bulls appearing in largest number of categories will come first in your dictionary – which will be convenient when we want to print in descending order of occurrences later.

{
    'S10': 
        {'count': 3, 
         'categories': ['Low Weaning Data', 'Low Weight Data', 'Low P8 Data']}, 
     'S11': 
        {'count': 2, 
         'categories': ['Low Weaning Data', 'Low P8 Data']}, 
     'S12': 
        {'count': 2, 
         'categories': ['Low Weaning Data', 'Low P8 Data']}, 
     'S13': 
        {'count': 1, 
         'categories': ['Low Weight Data']}, 
     'S14': {'count': 1, 'categories': ['Low Weight Data']}
}

Then we can loop through this dictionary, and print out the information in a formatted string for each iteration of the loop:

for bull,bull_info in low_bull_info.items():
    count = bull_info['count']
    categories_str = ', '.join(bull_info['categories'])
    print(f"{bull} has appeared {count} times in category {categories_str}")

This results in the following output:

S10 has appeared 3 times in category Low Weaning Data, Low Weight Data, Low P8 Data
S11 has appeared 2 times in category Low Weaning Data, Low P8 Data
S12 has appeared 2 times in category Low Weaning Data, Low P8 Data
S13 has appeared 1 times in category Low Weight Data
S14 has appeared 1 times in category Low Weight Data

Thank you very much! This has been very helpful and has worked! I really appreciate the help and your answer — Jess, Jan 10 '23 at 22:18
@Jess you're welcome – i am glad to hear my answer was helpful! — Derek O, Jan 12 '23 at 09:47

Finding the common data from a table

1 Answers1