What is the optimal Python data structure and how to export it to Excel

Question

I am trying to analyze mechanical computation results using Python and export them to Excel automatically. My raw data consists of various types of computation (each stored in a different text file). For each type, I have various load cases (loading conditions). For each load case, I output the results at various datapoints. Finally, at each datapoint, I need to store one vector result (with 3 components). So basically, the general structure is:

computation --> load --> datapoint --> component = value

After searching for similar questions, for instance:

and many others, it looks like the pandas library is the most appropriate, especially the DataFrame and its ability to generate a pivot_table or crosstab.

So far, I have created a list of dictionaries similar to this:

import json
import pandas as pd

dd = []
files = ["computation1.out", "computation2.out"]
for file in files:
    # Use the file name without its extension as the computation label
    filename = file.split(".")[0]
    with open(file, "r") as f:
        for row in f:
            ...
            #process file and extract
            #  loadnumber
            #  datapointnb
            #  component number icomp
            #  value
            # Creating a new Python dictionary
            mydict = {'Computation': filename,
                      'Load Case': RefValues[iref],
                      'Datapoint': EntValues[ient],
                      'Compo': icomp,
                      'Value': float(row)}
            # Append it to the list
            dd.append(mydict)
print(json.dumps(dd, sort_keys=True, indent=4))
df = pd.DataFrame.from_dict(dd)
print(df)
df.to_excel("myoutout.xlsx", sheet_name='Data', float_format="%.4f", merge_cells=False)

This generates something like this:

[
    {
        "Component": 1,
        "Computation": "computation1",
        "LoadCase": 1.0,
        "Datapoint": 57,
        "Value": 0.0004761905
    },
    {
        "Component": 2,
        "Computation": "computation1",
        "LoadCase": 1.0,
        "Datapoint": 57,
        "Value": -9.333414e-18
    },
    {
        "Component": 3,
        "Computation": "computation1",
        "LoadCase": 1.0,
        "Datapoint": 57,
        "Value": 0.0
    },
    {
        "Component": 1,
        "Computation": "computation1",
        "LoadCase": 1.0,
        "Datapoint": 84,
        "Value": 0.0009523809
    },
    ...
]
     Computation  LoadCase  Datapoint  Component         Value
0   computation1       1.0         57          1  4.761905e-04
1   computation1       1.0         57          2 -9.333414e-18
2   computation1       1.0         57          3  0.000000e+00
3   computation1       1.0         84          1  9.523809e-04
...

The conversion from the list of dictionaries to a dataframe and the export to Excel went well, but I can't seem to automatically create a pivot table or restructure the data into rows and columns like this:

           |computation1                                   |computation2...
           |-----------------------------------------------|---
           |loadcase1            |loadcase2            |...|loadcase1...
           |compo1 compo2 compo3 |compo1 compo2 compo3 |...|compo1...
-----------------------------------------------------------|---
datapoint1 |val1   val2   val3   |val4   val5   val6   |...|...
...        |...                                            |...
datapoint3 |val7   ...                                     |...
-----------------------------------------------------------|---

I have two questions:

1. Would this data structure (a list of dictionaries) still be efficient once the amount of data gets bigger, and if not, what would you recommend? I also investigated nested dictionaries, but they seemed more complicated to export to Excel.
2. How can I restructure my dataframe so that it exports to Excel in the layout shown above?

Thank you for your help, Christophe

Thank you @MattDMo. I tried this but the Excel sheet contains raw data `defaultdict(.... at 0x0000015CC54CB130>, {'Node % 57': defaultdict(,...`. I was hoping for a more structured way where my loads and datapoints would be in rows and columns. Should I rework the dataframe or use other options in the `to_excel` function? — tristof, Aug 23 '22 at 11:47
Aside from `df.to_excel()`, there is also a `pd.ExcelWriter` class that you can use as-is with arguments, or subclass and customize as needed. You could also import `openpyxl` and use its functionality directly, or as part of your `ExcelWriter` subclass. — MattDMo, Aug 23 '22 at 13:43
Thanks. I managed to restructure my data a bit using a list of dictionaries instead of nested dictionaries and used df.to_excel to dump it to Excel. I edited my question to reflect this. Unfortunately, I still question the efficiency of the structure, and the export to Excel is a flat list of data whereas I'd like some fields to be columns and others to be rows. Thanks! — tristof, Aug 23 '22 at 14:05

score 1 · Accepted Answer · answered Aug 23 '22 at 15:15

Finally, converting my list of dictionaries to a pandas.DataFrame, passing the appropriate arguments to pandas.pivot_table, and exporting the resulting pivot table (itself a DataFrame) with its to_excel method did the trick.

The full solution was these lines:

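import pandas

# dd is the list of dictionaries built in the question
df = pandas.DataFrame.from_dict(dd)
# EntTypeLabel and RefTypeLabel hold the datapoint and load-case column names
mypivot=pandas.pivot_table(df, index=EntTypeLabel, columns=['Computation',RefTypeLabel,'Compo'], values='Value')
print_dbg(1,mypivot)  # print_dbg is a debug-print helper not shown here
myExcelfile="samres.xlsx"
mypivot.to_excel(myExcelfile, sheet_name='Data', float_format="%.4f", merge_cells=False)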

This finally generated the desired output:

Computation   COMPUTATION1          COMPUTATION2         ...
Load Case   Load Case-1.00        Load Case-1.00         ...
Compo               Comp-1 Comp-2         Comp-1 Comp-2  ...
Node %                                                   ...
5559                  1.13  -1.08           0.80  -0.63  ...
5561                  0.77  -1.17           0.54  -0.68  ...
17789                 1.00  -1.17           0.66  -0.62  ...
17791                 0.69  -1.16           0.49  -0.60  ...
56690                 0.81   0.03           0.57   0.44  ...
56812                 1.01  -0.04           0.70   0.33  ...
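For anyone who wants to try the reshaping without the original result files, here is a minimal self-contained sketch of the same idea. The records are made up, and the column names simply mirror the dictionary keys from the question; they are assumptions, not the labels used in my real data.

```python
import pandas as pd

# Synthetic long-format records standing in for the parsed files (values are made up).
records = [
    {"Computation": comp, "Load Case": load, "Datapoint": point,
     "Compo": compo, "Value": 0.1 * compo + load + point / 100.0}
    for comp in ("computation1", "computation2")
    for load in (1.0, 2.0)
    for point in (57, 84)
    for compo in (1, 2, 3)
]
df = pd.DataFrame(records)

# pivot_table averages rows that share the same keys (its default aggfunc is the mean),
# so check first that each key combination occurs only once.
keys = ["Computation", "Load Case", "Datapoint", "Compo"]
has_duplicates = bool(df.duplicated(subset=keys).any())

# Rows: datapoints. Columns: a three-level header (computation, load case, component).
pivot = pd.pivot_table(
    df,
    index="Datapoint",
    columns=["Computation", "Load Case", "Compo"],
    values="Value",
)
print(pivot)

# Writing to Excel needs an engine such as openpyxl to be installed:
# pivot.to_excel("pivot.xlsx", sheet_name="Data", float_format="%.4f", merge_cells=False)
```

The duplicate check matters because a key that was accidentally parsed twice would otherwise be averaged silently instead of raising an error.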

Thanks to @MattDMo for the suggestion to use to_excel.
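On my first question (whether a list of dictionaries scales), I did not measure anything, so take the following only as a sketch under assumptions: collect the rows in a plain Python list, build the DataFrame once at the end, and convert the highly repetitive label column to the category dtype to reduce memory. The helper name `build_frame` and the data are hypothetical.

```python
import pandas as pd

COLUMNS = ["Computation", "Load Case", "Datapoint", "Compo", "Value"]

def build_frame(rows):
    """Build a long-format DataFrame in one call from a list of row tuples."""
    df = pd.DataFrame(rows, columns=COLUMNS)
    # Repeated labels are stored once as categories plus small integer codes.
    df["Computation"] = df["Computation"].astype("category")
    return df

# Made-up rows: 2 computations x 500 datapoints x 3 components.
rows = [
    (comp, 1.0, point, compo, 0.0)
    for comp in ("computation1", "computation2")
    for point in range(500)
    for compo in (1, 2, 3)
]
df = build_frame(rows)
plain = pd.DataFrame(rows, columns=COLUMNS)  # same data without the category conversion

category_bytes = df["Computation"].memory_usage(deep=True)
plain_bytes = plain["Computation"].memory_usage(deep=True)
print(category_bytes, "bytes as category")
print(plain_bytes, "bytes as plain strings")
```

Appending to a list and constructing the DataFrame once, as in the question, avoids growing a DataFrame row by row; whether the category conversion is worth it depends on how many distinct labels the real data has.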
