0

I have a list of dictionaries organized as such:

listofdictionaries = [{'key1': (A, B, C), 'key2':[1, 2, 3]}, 
                      {'key1': (AA, BB, CC), 'key2':[1, 2, 3]}, 
                      {'key1': (AAA, BBB, CCC), 'key2':[4, 5, 6]}]

This list's first and second items have an equivalent value for key2. The third item has a different value for key2. Still using Python, I want the columns organized as such:

Group 1 Group 1 Items Group 2 Group 2 Items
[1, 2, 3] (A, B, C) [4, 5, 6] (AAA, BBB, CCC)
(AA, BB, CC)

In addition I would like the output to be a .csv file.

2 Answers2

2

With pandas, you can use something like this function

def groupItems(dictList, itemsFrom, groupBy, saveTo=None):
    ik, gk, colsDict = itemsFrom, groupBy, {}
    groups = {str(d.get(gk)): d.get(gk) for d in dictList} 
    itemsList = [ [d.get(ik) for d in dictList if str(d.get(gk))==g] 
                  for g in groups   ]

    maxRows = max(len(li) for li in itemsList) if groups else 0
    for gi, (g, li) in enumerate(zip(groups.keys(), itemsList), 1):
        colsDict[f'Group {gi}'] = [groups[g]] + [None]*(maxRows-1)
        colsDict[f'Group {gi} Items'] = li + [None]*(maxRows-len(li))
    
    rdf = pandas.DataFrame(colsDict)
    if saveTo and isinstance(saveTo, str):
        print('Saving', maxRows, 'rows for', len(groups),'groups to', saveTo)
        rdf.to_csv(saveTo, index=False)
    return rdf

Calling groupItems(listofdictionaries, 'key1', 'key2', 'x.csv') will save the DataFrame from the screenshot below to x.csv.

dfop1 To demonstrate that the brackets were not lost: opmd1


You could also get it in this format if you change the function to

def groupItems(dictList, itemsFrom, groupBy, saveTo=None):
    ik, gk = itemsFrom, groupBy
    groups = {str(d.get(gk)): d.get(gk) for d in dictList} 
    itemsList = [ [d.get(ik) for d in dictList if str(d.get(gk))==g] 
                  for g in groups   ]

    maxRows = max(len(li) for li in itemsList) if groups else 0
    
    colsDict = {f'Group {gi}': [groups[g]] + [None] + (
        [f'Group {gi} Items'] + li + [None]*(maxRows-len(li))
    ) for gi, (g, li) in enumerate(zip(groups.keys(), itemsList), 1)}
    rdf = pandas.DataFrame(colsDict) 

    if saveTo and isinstance(saveTo, str):
        print('Saving', maxRows+4, 'rows for', len(groups),'groups to', saveTo)
        rdf.to_csv(saveTo, index=False)
    return rdf

NOTE: Saving as csv will stringify all non-numeric cells. If you want to preserve nested structure, I suggest saving as JSON instead.
Driftr95
  • 4,572
  • 2
  • 9
  • 21
  • 1
    Quick note, as per question expected output, the brackets in **GroupX** columns are part of the output, so instead of `1,2,3` it should be `[1,2,3]` for example. – David Leal Feb 05 '23 at 17:18
  • 1
    @DavidLeal Thanks [I hadn't noticed before] but the brackets disappeared just because of how DataFrames get displayed - if you print the markdown or view the CSV, the brackets are still there – Driftr95 Feb 05 '23 at 21:08
0

Here a solution using an Excel formula. Probably using FILTERXML there is shorter way, but it is not available for Excel Web, which is the version I use. In cell B3, you can use the following formula:

=LET(header, {"Group","Name"}, clean, SUBSTITUTE(TEXTSPLIT(B1,
 {"[{","'key1': ","'key2':","}]","}, {"},,1),"),",")"),
 split, WRAPROWS(clean,2), gps,INDEX(split,,2),names, INDEX(split,,1),
 gpsUx, UNIQUE(gps), out, DROP(REDUCE("", gpsUx, LAMBDA(ac,x,
  HSTACK(ac, VSTACK(header, HSTACK(x, FILTER(names, gps=x)))))),,1),
 IFERROR(out,""))

Here is the output: excel output

An alternative solution is using a recursive function to do all the replacement. Please check my answer to the question: Is it possible to convert words to numbers in a string without VBA? I use the same function here MultiReplace:

= LAMBDA(text, old, new, IF(MIN(LEN(old))=0, text, 
  MultiReplace(SUBSTITUTE(text, INDEX(old,1), INDEX(new,1)), 
    IFERROR(DROP(old,1),""), IFERROR(DROP(new,1),""))))

where old and new are nx1 arrays, where n is the number of substitutions.

The above function needs to be defined in the Name Manager since it is recursive. Now we are going to use in formula to provide the output of the question in cell B3:

=LET(header, {"Group", "Name"}, clean, MultiReplace(B1, 
  {"{'key1': " ; ", 'key2':" ; "}" ; "[(";"]]";"], ("}, 
  {"" ; "-" ; "" ; "(" ; "]" ; "] & ("}), 
 split, TEXTSPLIT(@clean,"-"," & "), gps,INDEX(split,,2),names, INDEX(split,,1), 
 gpsUx, UNIQUE(gps), out, DROP(REDUCE("", gpsUx, LAMBDA(ac,x, 
 HSTACK(ac, VSTACK(header, HSTACK(x, FILTER(names, gps=x)))))),,1), 
 IFERROR(out,""))

You can check the output of each name variable: clean, split, to see the intermediate results. Once we have the information in array format (split), then we apply DROP/REDUCE/HSTACK pattern. Check my answer to the question: how to transform a table in Excel from vertical to horizontal but with different length for more details.

Both solutions work for multiple groups, not just for two groups as in the input sample. For each unique key1 group it generates the Group and its corresponding Name columns and concatenate the result horizontally via HSTACK.

David Leal
  • 6,373
  • 4
  • 29
  • 56