Python / Pandas Dict to find closest match then end loop

Question

I'm trying to apply the logic below to the following DataFrame.

I have a df as follows

import pandas as pd
import numpy as np

df = pd.read_csv('subjects.csv') 

Subjects
Media
information Media
Digital Media

I then try to map my subjects through a dict to produce a validated subject_corrected column:

d = {'Media' : 'Film & Media',
'Information' : 'ICT',
'Digital' : 'ICT'}

df['subject_corrected'] = df['Subjects'].apply(lambda x: ', '.join([d[i] for i in d if i in x]))

Subjects           subject_corrected
Media              Film & Media
information Media  Film & Media, ICT
Digital Media      Film & Media, ICT

This loops through my DataFrame and gives me every match, whereas I want it to find the closest match and then exit the loop, so Digital Media would be ICT and not Film & Media.

I have tried the following, but it hasn't worked for me:

for k,v in d.items():
    if k in df['subjects']:
        df['subject_corrected'] = d.values():

Desired output:

Subjects           subject_corrected
Media              Film & Media
information Media  ICT
Digital Media      ICT

I've had a look at quite a few similar posts but couldn't work this one out.

Am I going about this the wrong way? Should I put these into two lists/arrays and use an if statement to loop through any matches? Also, how is a dict different from a 2D array?

Any help is appreciated.

Can you show the expected output? – harvpan Jul 09 '18 at 14:44

harvpan · Accepted Answer · 2018-07-09T15:21:16.700

You can use:

df['Subjects'].apply(lambda x: ', '.join([d[i] for i in d if i in x])).str.split(', ').str[-1]

Output:

            Subjects      subject_corrected
0              Media      Film & Media
1  Information Media               ICT
2      Digital Media               ICT

You can also achieve the same output directly with the line below, which simply takes the last element of the list.

df['Subjects'].apply(lambda x: [d[i] for i in d if i in x][-1])
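As a minimal, self-contained sketch of the same "keep the last match" idea: the helper name `last_match`, its `default` parameter, the case-insensitive comparison, and the extra `History` row are assumptions added for illustration and are not part of the original answer. The result depends on the insertion order of `d`, because the last matching key wins, and the guard avoids the `IndexError` that `[-1]` would raise on an empty list when nothing matches.

```python
import pandas as pd

d = {'Media': 'Film & Media',
     'Information': 'ICT',
     'Digital': 'ICT'}

# 'History' is a hypothetical row with no matching key
df = pd.DataFrame({'Subjects': ['Media', 'information Media',
                                'Digital Media', 'History']})


def last_match(subject, mapping=d, default=''):
    """Return the value of the last key in `mapping` found in `subject`.

    Keys are checked in dict insertion order, ignoring case.
    `default` is returned when no key matches.
    """
    text = subject.lower()
    matches = [value for key, value in mapping.items() if key.lower() in text]
    return matches[-1] if matches else default


df['subject_corrected'] = df['Subjects'].apply(last_match)
print(df)
```

If the first match should win instead, reordering `d` by priority and returning `matches[0]` would be one way to do it.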

edited Jul 09 '18 at 15:21

answered Jul 09 '18 at 14:53

harvpan

Nice one, see if you can rid yourself of the apply here. – cs95 Jul 09 '18 at 14:59
Awesome! Can you explain this for the layman? Additionally, when I run this it adds an extra space after the correct match; I guess I need to add a .strip to remove the space? – Umar.H Jul 09 '18 at 15:05
@coldspeed There is not a 1-1 relationship between `d` and `Subjects`, can't think of a solution without apply. If you have one in mind, go ahead and post the answer. – harvpan Jul 09 '18 at 15:06
@Datanovice, Once you map the values from `d` with `, ` separated strings, the solution splits the strings with `, ` and fetches the last value from the list. Consider the edit and that would solve the extra space problem. – harvpan Jul 09 '18 at 15:08
Thanks! I've green ticked you as this solution worked. May I ask how the logic works here? Does this loop through the dict to find the first match, or does it find the last match? – Umar.H Jul 09 '18 at 15:26
It finds all the matches in the list, but we keep only the last one. [-1] selects the first element from the end of the list, i.e. the last element. – harvpan Jul 09 '18 at 15:27
What a clever solution. I wish I could upvote you more. Thank you. Final Q, how would you tackle this problem? – Umar.H Jul 09 '18 at 15:29
Thank you. If possible, I would make a dict that has all the mappings and then use [.replace()](https://stackoverflow.com/questions/20250771/remap-values-in-pandas-column-with-a-dict). – harvpan Jul 09 '18 at 15:33
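The `.replace()` suggestion in the last comment could look roughly like this. It is a sketch that assumes every full subject string can be listed up front; the dict name `full_map` is hypothetical and does not appear in the thread.

```python
import pandas as pd

# Hypothetical complete mapping: one entry per full subject string
full_map = {'Media': 'Film & Media',
            'Information Media': 'ICT',
            'Digital Media': 'ICT'}

df = pd.DataFrame({'Subjects': ['Media', 'Information Media', 'Digital Media']})

# Series.replace with a dict swaps whole cell values that equal a key;
# values not present in the dict are left unchanged
df['subject_corrected'] = df['Subjects'].replace(full_map)
print(df)
```

With a complete mapping there is no substring search and no ordering to worry about, at the cost of having to maintain an entry for every spelling that occurs in the data.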