How can one sort a list of dictionaries into an excel sheet such that each unique key and one corresponding key are placed into columns?

Question

I have a list of dictionaries organized as such:

listofdictionaries = [{'key1': (A, B, C), 'key2':[1, 2, 3]}, 
                      {'key1': (AA, BB, CC), 'key2':[1, 2, 3]}, 
                      {'key1': (AAA, BBB, CCC), 'key2':[4, 5, 6]}]

This list's first and second items have an equivalent value for key2. The third item has a different value for key2. Still using Python, I want the columns organized as such:

Group 1	Group 1 Items	Group 2	Group 2 Items
[1, 2, 3]	(A, B, C)	[4, 5, 6]	(AAA, BBB, CCC)
	(AA, BB, CC)

In addition I would like the output to be a .csv file.

The elements inside the lists in `key2`'s values are unique or can they be repeated? Meaning, could you have {'key2': [1,1,3]}? — ccolin, Feb 05 '23 at 04:25

Driftr95 · Accepted Answer · 2023-02-05T21:06:52.560

With pandas, you can use something like this function

def groupItems(dictList, itemsFrom, groupBy, saveTo=None):
    ik, gk, colsDict = itemsFrom, groupBy, {}
    groups = {str(d.get(gk)): d.get(gk) for d in dictList} 
    itemsList = [ [d.get(ik) for d in dictList if str(d.get(gk))==g] 
                  for g in groups   ]

    maxRows = max(len(li) for li in itemsList) if groups else 0
    for gi, (g, li) in enumerate(zip(groups.keys(), itemsList), 1):
        colsDict[f'Group {gi}'] = [groups[g]] + [None]*(maxRows-1)
        colsDict[f'Group {gi} Items'] = li + [None]*(maxRows-len(li))
    
    rdf = pandas.DataFrame(colsDict)
    if saveTo and isinstance(saveTo, str):
        print('Saving', maxRows, 'rows for', len(groups),'groups to', saveTo)
        rdf.to_csv(saveTo, index=False)
    return rdf

Calling groupItems(listofdictionaries, 'key1', 'key2', 'x.csv') will save the DataFrame from the screenshot below to x.csv.

_{To demonstrate that the brackets were not lost:}

You could also get it in this format if you change the function to

def groupItems(dictList, itemsFrom, groupBy, saveTo=None):
    ik, gk = itemsFrom, groupBy
    groups = {str(d.get(gk)): d.get(gk) for d in dictList} 
    itemsList = [ [d.get(ik) for d in dictList if str(d.get(gk))==g] 
                  for g in groups   ]

    maxRows = max(len(li) for li in itemsList) if groups else 0
    
    colsDict = {f'Group {gi}': [groups[g]] + [None] + (
        [f'Group {gi} Items'] + li + [None]*(maxRows-len(li))
    ) for gi, (g, li) in enumerate(zip(groups.keys(), itemsList), 1)}
    rdf = pandas.DataFrame(colsDict) 

    if saveTo and isinstance(saveTo, str):
        print('Saving', maxRows+4, 'rows for', len(groups),'groups to', saveTo)
        rdf.to_csv(saveTo, index=False)
    return rdf

NOTE: Saving as csv will stringify all non-numeric cells. If you want to preserve nested structure, I suggest saving as JSON instead.

Quick note, as per question expected output, the brackets in **GroupX** columns are part of the output, so instead of `1,2,3` it should be `[1,2,3]` for example. — David Leal, Feb 05 '23 at 17:18
@DavidLeal Thanks [I hadn't noticed before] but the brackets disappeared just because of how DataFrames get displayed - if you print the markdown or view the CSV, the brackets are still there — Driftr95, Feb 05 '23 at 21:08

David Leal · Answer 2 · 2023-02-05T17:12:51.720

Here a solution using an Excel formula. Probably using FILTERXML there is shorter way, but it is not available for Excel Web, which is the version I use. In cell B3, you can use the following formula:

=LET(header, {"Group","Name"}, clean, SUBSTITUTE(TEXTSPLIT(B1,
 {"[{","'key1': ","'key2':","}]","}, {"},,1),"),",")"),
 split, WRAPROWS(clean,2), gps,INDEX(split,,2),names, INDEX(split,,1),
 gpsUx, UNIQUE(gps), out, DROP(REDUCE("", gpsUx, LAMBDA(ac,x,
  HSTACK(ac, VSTACK(header, HSTACK(x, FILTER(names, gps=x)))))),,1),
 IFERROR(out,""))

Here is the output:

An alternative solution is using a recursive function to do all the replacement. Please check my answer to the question: Is it possible to convert words to numbers in a string without VBA? I use the same function here MultiReplace:

= LAMBDA(text, old, new, IF(MIN(LEN(old))=0, text, 
  MultiReplace(SUBSTITUTE(text, INDEX(old,1), INDEX(new,1)), 
    IFERROR(DROP(old,1),""), IFERROR(DROP(new,1),""))))

where old and new are nx1 arrays, where n is the number of substitutions.

The above function needs to be defined in the Name Manager since it is recursive. Now we are going to use in formula to provide the output of the question in cell B3:

=LET(header, {"Group", "Name"}, clean, MultiReplace(B1, 
  {"{'key1': " ; ", 'key2':" ; "}" ; "[(";"]]";"], ("}, 
  {"" ; "-" ; "" ; "(" ; "]" ; "] & ("}), 
 split, TEXTSPLIT(@clean,"-"," & "), gps,INDEX(split,,2),names, INDEX(split,,1), 
 gpsUx, UNIQUE(gps), out, DROP(REDUCE("", gpsUx, LAMBDA(ac,x, 
 HSTACK(ac, VSTACK(header, HSTACK(x, FILTER(names, gps=x)))))),,1), 
 IFERROR(out,""))

You can check the output of each name variable: clean, split, to see the intermediate results. Once we have the information in array format (split), then we apply DROP/REDUCE/HSTACK pattern. Check my answer to the question: how to transform a table in Excel from vertical to horizontal but with different length for more details.

Both solutions work for multiple groups, not just for two groups as in the input sample. For each unique key1 group it generates the Group and its corresponding Name columns and concatenate the result horizontally via HSTACK.

How can one sort a list of dictionaries into an excel sheet such that each unique key and one corresponding key are placed into columns?

2 Answers2

NOTE: Saving as csv will stringify all non-numeric cells. If you want to preserve nested structure, I suggest saving as JSON instead.