1

I have a list that has multiple dictionary entries in it but I want to turn that list of dictionaries into a double nested dictionary. I would also like to store all entries and not overwrite anything.

Sample df:

data= {'First Name': ['Sally', 'Bob', 'Sue', 'Tom', 'Will'],
     'Last Name': ['William', '', 'Wright', 'Smith','Thomas'],
     'Indsutry': ['Automotive','Gas', 'Healthcare', 'Other', 'Biotech / Pharma'],
     'SME Vertical': ['Education', 'hotels', '', 'project management and design',''],
     'System Type': ['Access','Access','video Systems','Access','Access'],
     'Account Type': ['Commercial', '','Reseller','','Small']}

df=pd.DataFrame(data)
df1= df[["Industry",'System Type','Account Type', 'SME Vertical']]
errors= {}
filename= os.path.basename(r'sample\path\to\file.csv')

valid= {'Industry': ['Automotive', 'Banking / Finance','Biotech / Pharma','Commercial Buildings','Construction / Distribution',
                  'Consumer Products','Education','Education - K-12','Education - University / Higher','Entertainment / Media','Financial',
                  'Food & Beverage','Gas','Government','Government - Federal','Government - State / Local','Healthcare','High Security',
                  'Hospitality / Entertainment','Manufacturing / Communications','Other','Petrochem / Energy',
                  'Property Management / Real Estate','Public Facility / Non-Profit','Residential','Restaurant','Retail','Services - B2B',
                  'Technology','Telecom / Utilities','Transportation','Utilities','Food Retail','Specialized Retail','IT','Corrections',
                  'Core Commercial (SME)'],
        'SME Vertical': ['Agriculture, Food and Manufacturing','Architectural services','Arts, entertainment and recreation','Automobile',
                'Chemistry / Pharmacy','Construction','Education','Hotels','Offices','Other Industries','Other Services',
                'Project management and design','Real Estate and promotion','Restaurants, Café and Bars',
                'Energy, Infrastructure, Environment and Mining','Financial and Insurance Services',
                'Human health and social work activities','Professional, scientific, technical and communication activities',
                'Public administration and defence, compulsory social security','Retail/Wholesale','Transport, Logistics and Storage'],
        'System Type': ['Access','Access Control','Alarm Systems','Asset Tracking','Banking','Commander','EAS','Financial products','Fire',
                    'Fire Alarm','Integrated Solution','Intercom','Intercom systems','Intrusion - Traditional','Locking devices & Systems',
                    'Locks & Safes','Paging','Personal Safety','Retail & EAS Products','SaaS','SATS','Services',
                    'Sonitrol Integrated Solution','Sonitrol - Integrated Solution','Sonitrol - Managed Access',
                    'Sonitrol - Verified Audio Intrusion','Time & Attendance','TV-Distribution','Unknown','Video','Video Systems'],
        'Account Type': ['Commercial','International','National','Regional','Reseller','Residential','Small']}

mask = df1.apply(lambda c: c.isin(valid[c.name]))
df1.mask(mask|df1.eq(' ')).stack()

for r, v in df1.mask(mask|df1.eq(' ')).stack().iteritems():
    errors[filename]={
    "row": r[0],
    "column": r[1],
    "message": v + " is invalid"
}

output:

[{'row': 1, 'column': 'Industry', 'message': 'gas is invalid'}, {'row': 1, 'column': 'SME Vertical', 'message': 'hotels is invalid'}, {'row': 2, 'column': 'Industry', 'message': 'healthcare is invalid'}, {'row': 3, 'column': 'Industry', 'message': 'other is invalid'}, {'row': 3, 'column': 'SME Vertical', 'message': 'project management and design is invalid'}, {'row': 4, 'column': 'Account Type', 'message': 'small is invalid'}]

ideal output:

errors={file name:{'row': 1, 'column': 'Industry', 'message': 'gas is invalid'}, ... etc

I have tried:

errors = {{}}

mask = df1.apply(lambda c: c.isin(valid[c.name]))
df1.mask(mask|df1.eq(' ')).stack()

for r, v in df1.mask(mask|df1.eq(' ')).stack().iteritems():
    errors.update({
    "row": r[0],
    "column": r[1],
    "message": v + " is invalid"
})

but I get an error

any ideas or help are appreciated

  • In your _ideal output_, what's `file name`? Do you mean `'file name'` (a string)? – Ignatius Reilly Aug 18 '22 at 15:57
  • yes I mean file name as a variable so it will change depending on what the file is called –  Aug 18 '22 at 15:58
  • 1
    And which would be the key for the second dictionary in the original list: `{'row': 1, 'column': 'SME Vertical', 'message': 'hotels is invalid'}`? That one is coming from the same file as the previos one, isn't it? – Ignatius Reilly Aug 18 '22 at 16:01

1 Answers1

0

What you have to do is create a new dictionary within errors. I cannot see in your example where you get the file name (or key for the main dictionary) so I cannot give your a fully working example but it should be something like this:

errors = {}

filename = 'SUPER AWESOME FILE' #  or where ever you get the filename key you wanted

errors[filename] = {}

for err_i, (r, v) in enumerate(df1.mask(mask|df1.eq(' ')).stack().iteritems()):

    errors[filename][err_i] = {"row": r[0],
                               "column": r[1],
                               "message": v + " is invalid"}

Note: if filename is not a variable change it to a string as suggested in the comments by Ignatius Reilly

n4321d
  • 1,059
  • 2
  • 12
  • 31
  • This method works but it does not store all the values in the error dictionary, it only stores the most recent error found –  Aug 23 '22 at 17:14
  • that is another issue if you want to store all errors you have to make lists in your dictionary and append to it, or add keys for each errors with nested dictionaries such as in the example. in that case you can make the errors[filename] dict before your for loop with {"row": [], "column":[], "message": []}, then make your for loop with errors[filename]['row'].append(r[0]) errors[filename]['column'].append(r1[) etc... If you want me to edit my answer please edit your question so that it includes storing all errors. – n4321d Aug 23 '22 at 17:32
  • I edited my question, how would you make it a double nested dictionary with the keys for each error? –  Aug 23 '22 at 18:02
  • I editted the answer to be double nested but it is unclear where you get your errors from. If you want better help please add the code to make your dataframe and where you get your errors etc see: https://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples and https://stackoverflow.com/help/minimal-reproducible-example. I should be able to just copy and paste it to reproduce your question/issues – n4321d Aug 23 '22 at 18:10
  • I added code for the dataframe that you can copy and paste to see what I am working with. I also added the file name as a variable, but you can just use filename='test file' @n4321d –  Aug 23 '22 at 18:47
  • Thanks! I editted the answer so that is numbers each error. Hope that is what you are looking for. Also there is a typo in data= ... I assume "Indsutry" should be "Industry" – n4321d Aug 23 '22 at 19:11