I have imported a CSV file into a pandas DataFrame.

                         Work Product  Version
0  LCR_ContractualOutflowsMaster.aspx      1.1
1              LCR_CountryMaster.aspx      1.1
2          WBR_LCR_ContOutflowsMaster      1.0
3           USP_WBR_LCR_CountryMaster      1.0

The data frame was then converted into a Python dictionary.

{'LCR_ContractualOutflowsMaster.aspx': [1.1], 'LCR_CountryMaster.aspx': [1.1], 'WBR_LCR_ContOutflowsMaster': [1.0], 'USP_WBR_LCR_CountryMaster': [1.0]}

Two keys share the maximum value, 1.1. Is there a way to get both of these keys as a list?

I have tried several methods taken from other Stack Overflow questions:

1) max_value = max(csv_dict.items(), key=operator.itemgetter(1))[0]

2) max_value = max(csv_dict.items(), key=lambda x: x[1])[0]

3) max_value = max(csv_dict.values()); {key for key, value in csv_dict.items() if value == max_value}

4) max_value = max(csv_dict, key=csv_dict.get)

It is only printing one value.

Regards

rpanai
CK5
  • Why not use the DataFrame? `df.loc[df.Version == df.Version.max(), 'Work Product'].tolist()` – user3483203 Jul 23 '19 at 15:12
  • Duplicate of https://stackoverflow.com/questions/52588298/pandas-idxmax-return-all-rows-in-case-of-ties? – Finomnis Jul 23 '19 at 15:13
  • Is there a reason you do not want to do this in pandas directly, but rather insist on using a dict? – Nelewout Jul 23 '19 at 15:13
  • @user3483203 I am getting AttributeError: module 'pandas' has no attribute 'loc'. My pandas version is 0.25.0. – CK5 Jul 23 '19 at 15:22
  • After you have computed `max_value`, use a list comprehension: `keys = [k for (k,v) in csv_dict.items() if v == max_value]`. It should work. I'd compute `max_value` with a variant of your 2nd option that takes the value rather than the key: `max_value = max(csv_dict.items(), key=lambda x: x[1])[1]`. You cannot both find the max value and filter the entries in one iteration. It will have to be two separate iterations, still O(n) in terms of complexity. – SomethingSomething Jul 23 '19 at 15:30
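
The two-pass approach described in the last comment can be sketched as follows. This is only a minimal sketch, assuming the dictionary shown in the question, where every value is a one-element list (Python compares such lists element by element, so `max` still works). The helper name `keys_with_max` is hypothetical and not part of the original post.

```python
# Dictionary copied from the question; each value is a one-element list.
csv_dict = {
    'LCR_ContractualOutflowsMaster.aspx': [1.1],
    'LCR_CountryMaster.aspx': [1.1],
    'WBR_LCR_ContOutflowsMaster': [1.0],
    'USP_WBR_LCR_CountryMaster': [1.0],
}


def keys_with_max(d):
    """Return every key whose value equals the largest value in d (hypothetical helper)."""
    max_value = max(d.values())  # first pass: find the maximum value
    return [k for k, v in d.items() if v == max_value]  # second pass: keep all ties


max_keys = keys_with_max(csv_dict)
print(max_keys)
# ['LCR_ContractualOutflowsMaster.aspx', 'LCR_CountryMaster.aspx']
```

`max(d, key=d.get)` and similar calls return a single key by design, which is why the tie is lost; comparing every value against the maximum keeps all of them.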

1 Answer

1- `df['Version'] == df['Version'].max()` builds a boolean mask that is True for every row whose Version equals the column maximum. Indexing the DataFrame with that mask keeps only those rows. In the example data frame below the maximum Version is 2, so the first two rows are kept.

2- `df['Work Product'].unique()` then returns the distinct Work Product values from the filtered rows.

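import pandas as pd

df = pd.DataFrame(data={"Work Product": ["A", "B", "C", "D"],
                        "Version": [2, 2, 1, 1]})

# Keep only the rows whose Version equals the column maximum
df = df[df['Version'] == df['Version'].max()]
# Collect the distinct Work Product values from those rows
uq_work_product = list(df['Work Product'].unique())
print(uq_work_product)
# Output: ['A', 'B']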
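
To see what the two steps do, the intermediate values can be inspected one at a time. The snippet below is a rough sketch on the same toy data; the variable names `max_version`, `mask`, and `max_products` are introduced here for illustration only, and it uses the `df.loc` form suggested in the comments on the question instead of overwriting `df`.

```python
import pandas as pd

df = pd.DataFrame({"Work Product": ["A", "B", "C", "D"],
                   "Version": [2, 2, 1, 1]})

max_version = df["Version"].max()      # the largest value in the column (2 here)
mask = df["Version"] == max_version    # boolean Series with one entry per row
print(mask.tolist())                   # [True, True, False, False]

# .loc takes the row mask and a column label, so df itself is left unchanged
max_products = df.loc[mask, "Work Product"].tolist()
print(max_products)                    # ['A', 'B']
```

Note that `.loc` is an attribute of the DataFrame (`df.loc`), not of the `pandas` module, so writing `pd.loc` raises the `AttributeError` reported in the comments.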
tawab_shakeel
  • It worked! But could you also explain what is happening behind the scenes in these two lines: `df = df[df['Version'] == df['Version'].max()]` and `uq_work_product = list(df['Work Product'].unique())`? It will help readers like me who are new to pandas. – CK5 Jul 23 '19 at 15:33
  • Got it mate! Good explanation. – CK5 Jul 23 '19 at 15:38
  • @KrisT great :) – tawab_shakeel Jul 23 '19 at 15:40