I am attempting to inspect a Power BI .pbix file using Python's zipfile library.
When unzipping the .pbix archive, I get the following structure:
DataMashup
DataModel
DiagramLayout
Metadata
Report
Report/Layout
Report/StaticResources
Report/StaticResources/SharedResources
Report/StaticResources/SharedResources/BaseThemes
Report/StaticResources/SharedResources/BaseThemes/CY18SU07.json
SecurityBindings
Settings
Version
[Content_Types].xml
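For reference, a listing like the one above can be produced with a few lines of Python. This is a minimal sketch: the helper name `list_members` and the path `report.pbix` in the comment are placeholders, not names from my actual setup.

```python
import zipfile


def list_members(archive):
    """Return the member names of a zip-compatible archive such as a .pbix.

    `archive` may be a filesystem path or an open binary file object.
    """
    with zipfile.ZipFile(archive) as zf:
        return zf.namelist()


# Hypothetical usage ("report.pbix" is a placeholder path):
#     for name in list_members("report.pbix"):
#         print(name)
```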
It appears that the DataMashup file in the .pbix archive is some sort of off-brand archive of a directory.
The DataMashup object does not appear to be compressed, as I can easily read XML data when printing the object in the Python interpreter.
Using 7zip, I am able to access everything within:
DataMashup/
    Config/
        Package.xml
    Formulas/
        Section1.m    # M and/or DAX-looking stuff
    [Content_Types].xml
How can I discover the format of the DataMashup archive-within-an-archive?
One clue is in the binary data at the top of the DataMashup object: \x00\x00\x00\x00\x07\x05\x00\x00PK, which may indicate PKZIP.
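To look at that header a little more systematically, the bytes before the first zip signature can be decoded. The sketch below assumes the `PK` is the start of a standard local-file-header signature (`PK\x03\x04`) and that the prefix is made of little-endian 32-bit integers. Both are guesses on my part, and `describe_prefix` is just an illustrative helper name.

```python
import struct

# Standard zip local-file-header signature (assumed to be what "PK" starts).
ZIP_LOCAL_HEADER = b"PK\x03\x04"


def describe_prefix(blob):
    """Return (offset of first zip signature, prefix decoded as uint32 values)."""
    offset = blob.find(ZIP_LOCAL_HEADER)
    if offset < 0:
        raise ValueError("no zip local-file-header signature found")
    prefix = blob[:offset]
    # Assumption: the prefix is a run of little-endian 32-bit integers.
    usable = len(prefix) - len(prefix) % 4
    fields = struct.unpack("<%dI" % (usable // 4), prefix[:usable])
    return offset, fields


header = b"\x00\x00\x00\x00\x07\x05\x00\x00PK\x03\x04"
print(describe_prefix(header))  # (8, (0, 1287))
```

Under those assumptions the zip data starts at byte 8, preceded by the values 0 and 1287. Whether the second value is a length field is not something I have been able to confirm.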
Another clue may be this output when attempting to use unzip on the DataMashup file:
$ unzip DataMashup
Archive: DataMashup
warning [DataMashup]: 6215 extra bytes at beginning or within zipfile
I was able to uncompress the DataMashup directory on Linux using 7za:
WARNINGS:
There are data after the end of archive
--
Path = DataMashup
Type = zip
WARNINGS:
There are data after the end of archive
Offset = 8
Physical Size = 1303
Tail Size = 5148
Everything is Ok
Archives with Warnings: 1
Warnings: 1
Files: 3
Size: 2040
Compressed: 6459
Despite the warnings, the files appear okay. Unfortunately, this does not help me on Windows.
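One platform-independent approach I am considering is to carve the embedded zip out of the DataMashup bytes in pure Python and hand it to zipfile. The following is only a sketch under assumptions: it treats the first `PK\x03\x04` signature as the start of the archive and the first end-of-central-directory record (`PK\x05\x06`) after it as the end, which could misfire if those byte sequences happen to occur inside compressed data. The function names are hypothetical.

```python
import io
import struct
import zipfile

LOCAL_HEADER = b"PK\x03\x04"  # signature of the first zip entry
EOCD = b"PK\x05\x06"          # end-of-central-directory signature
EOCD_FIXED_SIZE = 22          # size of the EOCD record without its comment


def extract_embedded_zip(blob):
    """Return the bytes of the first zip archive found inside `blob`."""
    start = blob.find(LOCAL_HEADER)
    if start < 0:
        raise ValueError("no zip local-file header found")
    eocd = blob.find(EOCD, start)
    if eocd < 0:
        raise ValueError("no end-of-central-directory record found")
    # The last two bytes of the fixed EOCD record hold the comment length.
    (comment_len,) = struct.unpack_from("<H", blob, eocd + 20)
    return blob[start:eocd + EOCD_FIXED_SIZE + comment_len]


def read_embedded_members(blob):
    """Map each member name of the embedded zip to its decompressed bytes."""
    with zipfile.ZipFile(io.BytesIO(extract_embedded_zip(blob))) as zf:
        return {name: zf.read(name) for name in zf.namelist()}


# Hypothetical usage ("report.pbix" is a placeholder path):
#     with zipfile.ZipFile("report.pbix") as pbix:
#         mashup = pbix.read("DataMashup")
#     for name, data in read_embedded_members(mashup).items():
#         print(name, len(data))
```

Slicing the archive out first sidesteps both the leading 8 bytes and the trailing data that unzip and 7za warn about, and it uses only the standard library, so it should behave the same on Windows and Linux.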