
I am working on an NLP project. I have extracted keywords from a resume and stored them in a list. A second list holds all the technical keywords I extracted from a JSON file. Both lists contain many keywords; the ones below are just for reference.

list_of_keys=['azure', 'job', 'matlab', 'javascript', 'http', 'android', 'amazon', 'apache spark']

result=['apache http server', 'angularjs', 'azure bot service', 'amazon s3', 'android sdk', 'android studio', 'amazon cloudfront']

Code:

import json

# Load the technical keywords from the JSON file
with open('rawtext.json', 'r', encoding='utf-8') as f:
    data = json.load(f)
result = [x["name"].replace("@", " ").lower() for x in data]
print(result)

print("List of Matched Keywords are:\n")

# Compare the lists: only exact matches are printed
for item in list_of_keys:
    for item1 in result:
        if item == item1:
            print("Word from Resume: ", item, ", Word from JSON data: ", item1)
print("****************\n")

Current Output

Word from Resume: box , Word from JSON data: box
Word from Resume: arduino , Word from JSON data: arduino
Word from Resume: arduino , Word from JSON data: arduino
Word from Resume: browser , Word from JSON data: browser
Word from Resume: black , Word from JSON data: black
Word from Resume: address , Word from JSON data: address
Word from Resume: address , Word from JSON data: address

Above, I tried a very simple technique: comparing the two lists and printing only exact matches. What I want instead is to print a match whenever an entry in one list partially matches an entry in the other. For example, if 'apache spark' matches 'apache http server' in the result list, the output should be: Word from Resume: apache spark, Word from JSON data: apache http server. Similarly, if 'amazon' matches, the output should be: Word from Resume: amazon, Word from JSON data: amazon s3, amazon cloudfront

Required Output:

Word from Resume: apache spark, Word from JSON data: apache http server
Word from Resume: amazon, Word from JSON data: amazon s3, amazon cloudfront
Word from Resume: http, Word from JSON data: apache http server

Can someone please help me out? Thank you.

bitnahian
Joseph

2 Answers


Maybe try this:

common = list(set(list_of_keys) & set(result))

For instance:

list_of_keys = ['one','two','three','some more']
result = ['two','some more']

common = list(set(list_of_keys) & set(result))

print (common)

Output:

['two', 'some more']
Synthase

I think what you're trying to achieve is a bit different to a simple equality check, i.e. 'azure' == 'azure bot service' will always return False.

The comparison check can be more sophisticated, but from your expected behaviour, I believe you're looking for this:

from collections import defaultdict

res_dict = defaultdict(list)
for item in list_of_keys: 
    for item1 in result: 
        if item in item1:
            res_dict[item].append(item1)

for k,v in res_dict.items():
    print("Word from Resume: ", k, ", Word from JSON data: ", ", ".join(v))
print ("****************\n")

I've replaced the == check with the in check, which means that the comparison returns True if azure occurs anywhere inside azure bot service, and False for every other string in the result list. Note that in is a plain substring test, so a key can also match in the middle of a longer word.

I would also suggest looking at "Does Python have a string 'contains' substring method?" for more complex substring matches, since you're probably looking to check whether words co-occur between your list_of_keys and result lists.
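One way to check whether words co-occur is to split each entry into words and test for a shared word, instead of testing for a substring. The following is only a minimal sketch under that assumption; the helper name match_by_shared_words is hypothetical, and the sample lists are the ones from the question.

```python
from collections import defaultdict

list_of_keys = ['azure', 'job', 'matlab', 'javascript', 'http',
                'android', 'amazon', 'apache spark']
result = ['apache http server', 'angularjs', 'azure bot service', 'amazon s3',
          'android sdk', 'android studio', 'amazon cloudfront']


def match_by_shared_words(keys, candidates):
    """Map each key to the candidates that share at least one whole word with it."""
    matches = defaultdict(list)
    for key in keys:
        key_words = set(key.split())
        for candidate in candidates:
            # A non-empty intersection means at least one word is common to both
            if key_words & set(candidate.split()):
                matches[key].append(candidate)
    return dict(matches)


matches = match_by_shared_words(list_of_keys, result)
for key, found in matches.items():
    print(f"Word from Resume: {key}, Word from JSON data: {', '.join(found)}")
```

Because whole words are compared, 'apache spark' matches 'apache http server' through the shared word 'apache', while a key such as 'manage' no longer matches 'aws secrets manager'.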

Alternatively, you can also look at fuzzy search, since it seems very close to your intended behaviour: https://pypi.org/project/fuzzysearch/
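To give a rough idea of what a fuzzy comparison could look like, here is a sketch that uses difflib.SequenceMatcher from the standard library as a stand-in for a dedicated fuzzy-matching package. Its scores will not be identical to what those packages return. The helper names similarity and fuzzy_matches, the 0.8 threshold, and the misspelled sample key are all assumptions made for illustration, so tune the threshold for your own data.

```python
from difflib import SequenceMatcher


def similarity(a, b):
    """Return a similarity score between 0.0 and 1.0 for two strings."""
    return SequenceMatcher(None, a, b).ratio()


def fuzzy_matches(keys, candidates, threshold=0.8):
    """Map each key to candidates containing a word that is similar to a word of the key."""
    matches = {}
    for key in keys:
        hits = [
            candidate
            for candidate in candidates
            if any(
                similarity(key_word, cand_word) >= threshold
                for key_word in key.split()
                for cand_word in candidate.split()
            )
        ]
        if hits:
            matches[key] = hits
    return matches


# 'javascrpt' is a deliberately misspelled, made-up key
found = fuzzy_matches(['javascrpt', 'job'], ['javascript', 'amazon s3'])
for key, hits in found.items():
    print(f"Word from Resume: {key}, Word from JSON data: {', '.join(hits)}")
```

A lower threshold finds more near-misses but also more false positives. With this sketch, 'manage' and 'manager' score above 0.8, so they would be treated as a match.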

bitnahian
  • It is printing this: Word from Resume: manage , Word from JSON data: adobe experience manager Word from Resume: manage , Word from JSON data: aws key management service Word from Resume: manage , Word from JSON data: aws secrets manager Word from Resume: manage , Word from JSON data: aws certificate manager – Joseph Jan 04 '21 at 00:53
  • I am looking for fuzzy search. Maybe this will help me. – Joseph Jan 04 '21 at 00:53
  • I was about to suggest fuzzy search. Good call. Please upvote the answer if it helped. I've edited my answer to include a link to a fuzzysearch library. – bitnahian Jan 04 '21 at 00:56
  • Sure, I will upvote. Could you please assist me in integrating the fuzzy search logic? – Joseph Jan 04 '21 at 01:01
  • I think the following link should suffice to perform the check you want. https://towardsdatascience.com/fuzzy-string-matching-in-python-68f240d910fe Just settle on a ratio or partial_ratio value that would be sound enough for your use case. – bitnahian Jan 04 '21 at 01:08