
I want to create a dictionary of full names, grouped by the first letter of the last name and sorted alphabetically by last name. The output would look something like this:

{'A': ['V. Aakalu',
       'J.L. Accini',
       'Kimberly A. Aeling',
       'Konstantin Afanaciev',
       'T. Afzar',
       'Heidi Agnic'],

'B': ['Nicholas P. Breznay',
       'B. Breznock',
       'Rebecca Brincks',
       'M.A. Brito Sanfiel',
       'Reinhard Brunmeir']
....
}

until all the names are read and put into their respective sub_lists in the dictionary.

I've tried creating a new dictionary with every letter of the English alphabet as a key; if the first letter of the last name matched one of the keys, I would append the full name to the list stored under that key. However, the output I get has every name in the list under every key.

Example output:

{'A': ['V. Aakalu',
       'J.L. Accini',
       'Kimberly A. Aeling',
       'Konstantin Afanaciev',
       'T. Afzar',
       'Heidi Agnic'...
       'Nicholas P. Breznay',
       'B. Breznock',
       'Rebecca Brincks',
       'M.A. Brito Sanfiel',
       'Reinhard Brunmeir'],

'B': ['V. Aakalu',
       'J.L. Accini',
       'Kimberly A. Aeling',
       'Konstantin Afanaciev',
       'T. Afzar',
       'Heidi Agnic'...
       'Nicholas P. Breznay',
       'B. Breznock',
       'Rebecca Brincks',
       'M.A. Brito Sanfiel',
       'Reinhard Brunmeir'],

'C':......
}

I've also tried using the dictionary method .update(), but each new name overwrote the previous one for that key. The output I get looks something like this:

{'A': 'M. Azizad',

 'B': 'M. Bänninger',

 'C': 'S. Czempiel',

 'D': 'S. Días-Mondragón',
}

My question is what is the best way for me to separate the names into their respective sub-lists? Thank you in advance!

Some of my code:

sorted_main_db = main_db.sort_values(by="auth_surname")

sorted_main_dict = sorted_main_db.to_dict()

norm_dict = dict.fromkeys(string.ascii_uppercase, [])

unnorm_dict = {}

for key, value in sorted_main_dict.items(): #for key value in sorted main dictionary
    
    for i in value: #iterator in dictionary values
        
        if 'auth_name' in key: #focus on the author's name
            
            if sorted_main_dict['auth_surname'][i] == None or sorted_main_dict['auth_surname'][i][0] not in norm_dict: #accounts for null and letters not in English alphabet

                unnorm_dict.update({key: value})

            if sorted_main_dict['auth_surname'][i][0] in norm_dict.keys(): #if the first letter of last name matches one of the keys

                norm_dict[sorted_main_dict['auth_surname'][i][0]].append(sorted_main_dict['auth_name'][i]) #append that name to the dictionary
           
Nitish
BBB
  • Have a look at this question: https://stackoverflow.com/questions/22219004/how-to-group-dataframe-rows-into-list-in-pandas-groupby – Nitish Aug 05 '22 at 03:42

2 Answers


Assuming you are using pandas, here is a quick way to do it:

df['l_name_initial'] = df['name'].apply(lambda x: x.split(" ")[-1][0])
df2 = df.groupby('l_name_initial')['name'].apply(list)
print(df2)

Which results in:

l_name_initial
A    [V. Aakalu, J.L. Accini, Kimberly A. Aeling, K...
B    [Nicholas P. Breznay, B. Breznock, Rebecca Bri...
S                                 [M.A. Brito Sanfiel]

Basically, you put the initial letter of the last name in a separate column, then use groupby to collect the names for each letter into a list. Note that taking the last space-separated word as the last name puts 'M.A. Brito Sanfiel' under S rather than B, so multi-word last names need separate handling.
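
If the DataFrame already has a separate surname column, as the question's code suggests (`auth_name`, `auth_surname`), you could group on the first letter of that column instead of splitting the full name. This is only a minimal sketch: the column names are taken from the question, while the sample rows and the `initial` variable are made up for illustration.

```python
import pandas as pd

# Illustrative sample; column names follow the question, the rows are assumed.
main_db = pd.DataFrame({
    "auth_name": ["Rebecca Brincks", "V. Aakalu", "M.A. Brito Sanfiel", "Heidi Agnic"],
    "auth_surname": ["Brincks", "Aakalu", "Brito Sanfiel", "Agnic"],
})

# Sort by surname so each per-letter list comes out in alphabetical order.
sorted_db = main_db.sort_values(by="auth_surname")

# First letter of the surname, upper-cased. Rows with a missing surname
# become NaN here and are dropped by groupby by default.
initial = sorted_db["auth_surname"].str[0].str.upper()

# Collect the full names for each initial and convert to a plain dict.
grouped = sorted_db.groupby(initial)["auth_name"].apply(list).to_dict()
print(grouped)
```

Because the grouping key comes from the surname column, a multi-word surname such as 'Brito Sanfiel' lands under B.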

Nitish

It's quite easy, but the real task here is to extract the last names correctly. For the example you provided, it could look like this:

import pandas as pd

names = ['V. Aakalu', 'J.L. Accini', 'Kimberly A. Aeling', 'Konstantin Afanaciev',
         'T. Afzar', 'Heidi Agnic', 'Nicholas P. Breznay', 'B. Breznock', 'Rebecca Brincks',
         'M.A. Brito Sanfiel', 'Reinhard Brunmeir']

s = pd.Series(names)
s.groupby(s.str.extract(r'^.+?\.? ([A-Z])[^\.]',expand=False)).apply(list).to_dict()
                        #^^^^^^^^^^^^^^^^^^^^ extracts the first letter of a last name
>>> out
'''
{'A': ['V. Aakalu',
       'J.L. Accini',
       'Kimberly A. Aeling',
       'Konstantin Afanaciev',
       'T. Afzar',
       'Heidi Agnic'],
 'B': ['Nicholas P. Breznay',
       'B. Breznock',
       'Rebecca Brincks',
       'M.A. Brito Sanfiel',
       'Reinhard Brunmeir']}
'''
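
As a side note, the symptom described in the question (every key holding all the names) is what `dict.fromkeys(string.ascii_uppercase, [])` produces, because every key ends up referring to the same list object. Below is a pandas-free sketch that avoids this with `collections.defaultdict`. The helper `group_by_initial`, the `LETTERS` set, and the `(full_name, surname)` input pairs are assumptions for illustration, not part of the original code.

```python
import string
from collections import defaultdict

# dict.fromkeys reuses ONE list object for every key, so an append made
# through any key is visible under all of them.
shared = dict.fromkeys("AB", [])
shared["A"].append("V. Aakalu")
print(shared)  # {'A': ['V. Aakalu'], 'B': ['V. Aakalu']}

LETTERS = set(string.ascii_uppercase)


def group_by_initial(pairs):
    """Group (full_name, surname) pairs by the surname's first letter.

    Hypothetical helper: returns (normal, other), where `other` collects
    names whose surname is missing or does not start with A-Z.
    """
    normal = defaultdict(list)  # a fresh list is created per key on first use
    other = []
    # Sort by surname; a missing surname is treated as an empty string.
    for full_name, surname in sorted(pairs, key=lambda p: p[1] or ""):
        first = surname[0].upper() if surname else ""
        if first in LETTERS:
            normal[first].append(full_name)
        else:
            other.append(full_name)
    return dict(normal), other


pairs = [("B. Breznock", "Breznock"), ("V. Aakalu", "Aakalu"),
         ("T. Afzar", "Afzar"), ("No Surname", None)]
normal, other = group_by_initial(pairs)
print(normal)  # {'A': ['V. Aakalu', 'T. Afzar'], 'B': ['B. Breznock']}
print(other)   # ['No Surname']
```

If you prefer to keep all 26 keys up front, a dict comprehension such as `{c: [] for c in string.ascii_uppercase}` also gives each key its own list.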
SergFSM