Get the key string matching a format from a dict - python

Question

My list has the following dictionaries.

[OrderedDict([('Employee Number', '1'), ('Employee Name', 'Ms. A'), ('RMG SPOC', 'X'), ('Total Experience (yrs)', '3.06'), ('Days Unallocated', '18'), ('Skill Details', 'Manual testing'), ('Contact Number', '1234')]), OrderedDict([('Employee Number', '2'), ('Employee Name', 'Mr. B'), ('RMG SPOC', 'Y'), ('Total Experience (yrs)', '2.51'), ('Days Unallocated', '28'), ('Skill Details', 'Manual Testing'), ('Contact Number', '2345')]), OrderedDict([('Employee Number', '3'), ('Employee Name', 'Mr. C'), ('RMG SPOC', 'Z'), ('Total Experience (yrs)', '1.86'), ('Days Unallocated', '9'), ('Skill Details', 'C++, Manual Testing, Oracle'), ('Contact Number', '4567')]), OrderedDict([('Employee Number', '4'), ('Employee Name', 'Mr. D'), ('RMG SPOC', 'xyz'), ('Total Experience (yrs)', '7.68'), ('Days Unallocated', '23'), ('Skill Details', 'Manual Testing, SQL, HCM'), ('Contact Number', '789')])]

I am pushing these values to database by preparing a dictionary using for loop on the above data.

emp_data = {"employee_name" : data['Employee Name'],
            "employee_number" : data['Employee Number'],
            "date_added" : datetime.datetime.now(),
            "rmg_spoc" : data['RMG SPOC'],
            "status" : "To be evaluated",
            "total_experience" : data['Total Experience (yrs)'],
            "days_unallocated" : data['Days Unallocated'],
            "skill_details" : data['Skill Details'],
            "contact_number" : data['Contact Number'],
            "reviewer" : "To be assigned",
            "comments" : "To be added"}

I get the original data from a excel/csv. This is working fine as long as the keys match the data in the provided excel/csv.

In case, the excel/csv has 'Employee Name' as 'employee name' or 'EMPLOYEE NAME', then the above way will not work.

Is there a way in which I can handle this, such that keys like "employee name" is mapped to value matching any of the format('Employee Name', 'EMPLOYEE NAME', 'employee name'), "rmg_spoc" matching any of the format('RMG', 'rmg', 'RMG SPOC', 'rmg spoc', 'rmg*'), "total_experience" matching any of the format('Total Experience', 'total experience', '* [E][e]xperience *').

Usually a combination of `'Employee Name'.strip().lower()` will get it to match `employee name` — C.Nivs, Feb 25 '19 at 19:14

score 1 · Accepted Answer · answered Feb 25 '19 at 19:28

This seems like an example of a repeated problem of case insensitive dictionary search (This SO question and all its duplicates)

The proposed solution from the post is to use a wrapper to dict (or collections.OrderedDict) like so:

import collections

class CaseInsensitiveDict(collections.Mapping):
    def __init__(self, d):
        self._d = d
        self._s = dict((k.lower(), k) for k in d)
    def __contains__(self, k):
        return k.lower() in self._s
    def __len__(self):
        return len(self._s)
    def __iter__(self):
        return iter(self._s)
    def __getitem__(self, k):
        return self._d[self._s[k.lower()]]
    def actual_key_case(self, k):
        return self._s.get(k.lower())

And in your code you just wrap your dicts with this wrapper so you can perform case insensitive key search:

data_items = [OrderedDict([('Employee Number', '1'), ('Employee Name', 'Ms. A'), ('RMG SPOC', 'X'), ('Total Experience (yrs)', '3.06'), ('Days Unallocated', '18'), ('Skill Details', 'Manual testing'), ('Contact Number', '1234')]), OrderedDict([('Employee Number', '2'), ('Employee Name', 'Mr. B'), ('RMG SPOC', 'Y'), ('Total Experience (yrs)', '2.51'), ('Days Unallocated', '28'), ('Skill Details', 'Manual Testing'), ('Contact Number', '2345')]), OrderedDict([('Employee Number', '3'), ('Employee Name', 'Mr. C'), ('RMG SPOC', 'Z'), ('Total Experience (yrs)', '1.86'), ('Days Unallocated', '9'), ('Skill Details', 'C++, Manual Testing, Oracle'), ('Contact Number', '4567')]), OrderedDict([('Employee Number', '4'), ('Employee Name', 'Mr. D'), ('RMG SPOC', 'xyz'), ('Total Experience (yrs)', '7.68'), ('Days Unallocated', '23'), ('Skill Details', 'Manual Testing, SQL, HCM'), ('Contact Number', '789')])]


data = CaseInsensitiveDict(data[0])

print(data['EmplOYee NAME'])
# should print 'Ms. A'
print(data['Employee NAME'])
# should print 'Ms. A'
print(data['EmploYee NAME'])
# should print 'Ms. A'
print(data['EmployeE NAME'])
# should print 'Ms. A'
print(data['Employee Name'])
# should print 'Ms. A'

This is working for the case sensitive values. I am getting a warning when this is used - DeprecationWarning: Using or importing the ABCs from 'collections' instead of from 'collections.abc' is deprecated, and in 3.8 it will stop working — Raj, Feb 27 '19 at 08:23

score 0 · Answer 2 · answered Feb 25 '19 at 19:17

0

@C.Nivs is correct, here is a snippet showing how it works given your examples.

options = ('Employee Number','EMPLOYEE NUMBER','employee number')

for option in options:
  assert option.strip().lower() == "employee number"
  print("true")

answered Feb 25 '19 at 19:17

Collin Phillips

196
7

score 0 · Answer 3 · answered Feb 25 '19 at 19:25

What about normalizing the keys in your input first? - Maybe something like the following will do:

normalized_data = [{key.lower().replace(' ', '_'): val for key, val in datum.items()} for datum in data]

For the sample data, you will get:

[{'employee_number': '1', 'employee_name': 'Ms. A', 'rmg_spoc': 'X', 'total_experience_(yrs)': '3.06', 'days_unallocated': '18', 'skill_details': 'Manual testing', 'contact_number': '1234'},
 {'employee_number': '2', 'employee_name': 'Mr. B', 'rmg_spoc': 'Y', 'total_experience_(yrs)': '2.51', 'days_unallocated': '28', 'skill_details': 'Manual Testing', 'contact_number': '2345'},
 {'employee_number': '3', 'employee_name': 'Mr. C', 'rmg_spoc': 'Z', 'total_experience_(yrs)': '1.86', 'days_unallocated': '9', 'skill_details': 'C++, Manual Testing, Oracle', 'contact_number': '4567'},
 {'employee_number': '4', 'employee_name': 'Mr. D', 'rmg_spoc': 'xyz', 'total_experience_(yrs)': '7.68', 'days_unallocated': '23', 'skill_details': 'Manual Testing, SQL, HCM', 'contact_number': '789'}]

Get the key string matching a format from a dict - python

3 Answers3