0

I have got some data which I have read into dictionary of dictionaries

EDIT : Posting original data format

Original Data is one excel files for each user every month

Alpha - Jan 2018 .. following format

             Score
English        70
Social Science 80
Maths          90
History        45
Science        50

I read all these excels into python and get them into dictionaries as mentioned below. Some students may miss some exams hence for those months their data will be missing. Hence variation will be complete month data missing for few students.

{alpha: {u'Jan-2018': {'Eng': '70', 'math': '90', 'sci': '50'}, u'feb-2018': {'Eng': '75', 'math': '85', 'sci': '60'}, u'mar-2018': {'Eng': '60', 'math': '92', 'sci': '40'}}

{beta : {u'Jan-2018': {'Eng': '30', 'math': '50', 'sci': '40'}, u'feb-2018': {'Eng': '55', 'math': '45', 'sci': '70'}, u'may-2018': {'Eng': '50', 'math': '52', 'sci': '45'}}

{gamma : {u'Jan-2018': {'Eng': '50', 'math': '50', 'sci': '40'}, u'feb-2018': {'Eng': '55', 'math': '75', 'sci': '40'}, u'may-2018': {'Eng': '56', 'math': '59', 'sci': '35'}}

I want to get these on Excel in following format. On sheet 1 it should publish only Eng data for different months and on second sheet math data and third sci data. For whichever month for someone data is missing, that should be left blank or maybe 0

Sheet1(Eng):
        Jan-2018        Feb-2018        Mar-2018        May-2018
alpha       70          75                60            0   
beta        30          55                 0            50
gamma       50          55                 0            56

similarly for other two sheets.

I have tried following code, however there are two issues with it:

  1. It doesn't consider the missing months, and prints sequentially
  2. It doesn't print the month name on top of the every column

    List1 contains dict of dicts mentioned above

    alleng    = {}
    allmath   = {}
    allsci    = {}
    
    for i in list1:
        englist  = []   
        mathlist = []
        scilist  = []
    
        for m in list1[i]:
            for h in list1[i][m]:
                value1 = list1[i][m][h]
                if h == 'Eng':
                    englist.append(value1)
                if h == 'Math':
                    mathlist.append(value1)
                if h == 'Sci':
                    scilist.append(value1)
    
        alleng[i]     = englist 
        allmath[i]    = mathlist
        allsci[i]     = scilist 
    
    writer = ExcelWriter('final-sheet.xlsx')
    
    frame = pd.DataFrame.from_dict(allsci, orient='index')
    frame = frame.transpose()
    frame = frame.transpose()
    frame.to_excel(writer , sheet_name = 'Sci')
    
    frame1 = pd.DataFrame.from_dict(alleng, orient='index')
    frame1 = frame1.transpose()
    frame1 = frame1.transpose()
    frame1.to_excel(writer , sheet_name = 'Eng')
    
    frame2 = pd.DataFrame.from_dict(allmath, orient='index')
    frame2 = frame2.transpose()
    frame2 = frame2.transpose()
    frame2.to_excel(writer , sheet_name = 'Math')
    

I also tried using following solution, however it didn't help:

Dict of dicts of dicts to DataFrame

Zoro99
  • 185
  • 3
  • 13
  • I'm more interested in the original source of the data what it looks like, and why you're doing this in so many steps. There is probably a better way to handle this than converting to dictionaries and then converting back to Excel.and What is the source of the data? Also you mention missing months, etc; the sample data may be more helpful if it is more representative of possible variations that can occur in this dataset. – ashleedawg Oct 27 '18 at 12:18
  • I get multiple excel sheets for users like alpha, beta and gamma every month. I read their data from those excels and convert to dictionary (list1). Then have to sort this data subject wise and that is where I am struggling. The code snippet I pasted is surely a longer way and there should be something is Pandas to convert to a dataframe in required format. found this on one of the solutions however it gives everything in a single sheet itself, though row wise it sorts the data --- pd.concat({k: pd.DataFrame(v) for k, v in list1.items()}) – Zoro99 Oct 27 '18 at 12:26

1 Answers1

0

I tried following code to convert dicts to dataframe and it helps

df1=pd.DataFrame(list1).stack().apply(pd.Series).unstack()

It will give the data on a single sheet itself in following format:

            Jan-2018   feb-2018   mar-2018  april-2018 
Eng  Alpha    70            75          60          0
     Beta     30            55          0           50
     Gamma    50            55          0           56

Math Alpha    90            85          92          0
     Beta     50            45          0           52
     Gamma    50            75          0           59
Zoro99
  • 185
  • 3
  • 13