0

This is almost the same from my yesterday's question. But I took it for granted to use a unique value list to create the nested dict & list structure. But then, I came to the question of how to build this dict & list structure (refer as data structure) row by row from the excel data.

The excel files (multiple files in a folder) all look like the following:

Category    Subcategory    Name
Main Dish   Noodle         Tomato Noodle
Main Dish   Stir Fry       Chicken Rice
Main Dish   Soup           Beef Goulash
Drink       Wine           Bordeaux
Drink       Softdrink      Cola

My desired structure of dict & list structure is:

data = [0:{'data':0, 'Category':[
                             {'name':'Main Dish', 'Subcategory':[
                                                   {'name':'Noodle', 'key':0, 'data':['key':1, 'title':'Tomato Noodle']},
                                                   {'name':'Stir Fry', 'key':1, 'data':['key':2, 'title':'Chicken Rice']},
                                                   {'name':'Soup', 'key':2, 'data':['key':3, 'title':'Beef Goulash']}]},
                              {'name':'Drink', 'Subcategory':[
                                                   {'name':'Wine', 'key':0, 'data':['key':1, 'title':'Bordeaux']},
                                                   {'name':'Softdrink', 'key':1, 'data':['key':2, 'title':'cola'}]}]},
    1:{'data':1, 'Category':.........#Same structure as dataset 0}]

So, for each excel file, it is fine, just loop through and set {'data':0, 'Category':[]}, {'data':1, 'Category':[]} and so on. The key is, for each Category and Subcategory values, Main Dish has three entries in excel, but only needs 1 in the data structure, and Drink has two entries in excel, but only 1 in the data structure. For each subcategory nested in the category list, they follow the same rule, only unique values should be nested to category. Then, each corresponding Name of dishes, they go into the data structure depending on their category and subcategory.

The issue is, I cannot find a better way to convert the data to this data structure. Plus, there are other columns after the Name column. So it is kind of sophisticated. I was thinking to first extract the unique values from the entire column of category and subcategory, this simplifies the process, but leads to problems when filling in the corresponding Name values. If I am doing this from a row by row approach, then designing a if subcategory exist or category exit test to keep unique values are somehow difficult based on my current programming skills...

Therefore, what would be the best approach to convert this excel file into such a data structure? Thank you very much.

ACuriousCat
  • 1,003
  • 1
  • 8
  • 21
  • Have you tried do it yourself? Where is the failing code? – NutCracker Sep 26 '18 at 06:58
  • Yup, when testing whether one category already exists, I always encounter list index out of range, the test itself becomes quite complicated. Then there are way many columns that need to do this test...so I began to think if there is a better approach – ACuriousCat Sep 26 '18 at 07:04

1 Answers1

1

One way could be to read the excelfile into a dataframe using pandas, and then build on this excellent answer Pandas convert DataFrame to Nested Json

import pandas as pd
excel_file = 'path-to-your-excel.xls'

def fdrec(df):
    drec = dict()
    ncols = df.values.shape[1]
    for line in df.values:
        d = drec
        for j, col in enumerate(line[:-1]):
            if not col in d.keys():
                if j != ncols-2:
                    d[col] = {}
                    d = d[col]
                else:
                    d[col] = line[-1]
            else:
                if j!= ncols-2:
                    d = d[col]
    return drec

df = pd.read_excel(excel_file)

print(fdrec(df))
Philip
  • 944
  • 11
  • 26