Important note
contrib/masmx* directories were removed back in 2017 (at answer time I was working with the v1.2.11 .zip file, which still contains them), so everything below no longer applies out of the box (OOTB).
However, the references to them were not removed (at least not from all places), so if the speedups are enabled (from CMake), the build will fail.
I submitted [GitHub]: madler/zlib - Re enable ASM speedups on Win (rejected on 221007), so with that patch applied everything below applies again.
For possible ways to benefit from the patch (once / if it's accepted), check [SO]: How to change username of job in print queue using python & win32print (@CristiFati's answer) (at the end).
Applied the above patch to (current) master branch:
I used the following script to go through all the configurations, run them and aggregate the results.
code00.py:
#!/usr/bin/env python

import hashlib as hl
import os
import shutil
import subprocess as sp
import sys
import time
from pprint import pprint as pp


ARCHS = (
    "pc064",
    "pc032",
)


def print_file_data(file, level=0):
    st = os.stat(file)
    header = " " * level
    print("{:s}File: {:s}\n{:s} Size: {:d}, CTime: {:s}".format(header, file, header, st.st_size, time.ctime(st.st_ctime)))


def main(*argv):
    verbose = False
    build_dir = os.getcwd()  #os.path.dirname(os.path.abspath(__file__))
    if argv:
        file = argv[0]
        if "-v" in argv:
            verbose = True
    else:
        file = "file.onnx"
        #file = "bigfile.txt"
        #file = "enwik8"
    file_test = file + ".test"
    file_gz = file_test + ".gz"
    shutil.copy(file, file_test)
    md5_src = hl.md5(open(file_test, mode="rb").read()).hexdigest()
    print_file_data(file_test)
    data = {}
    for arch in ARCHS:
        if verbose:
            print("Arch: {:s}".format(arch))
        ad = {}
        for typ in ("plain", "masm"):
            if verbose:
                print(" Type: {:s}".format(typ))
            mg = os.path.join(build_dir, "_build", arch, typ, "minigzip.exe")
            for level in (1, 5, 9):
                # Start every round from a fresh copy of the source file
                shutil.copy(file, file_test)
                if verbose:
                    print("\n Compress (level {:d})".format(level))
                proc = sp.Popen([mg, "-{:d}".format(level), file_test])
                time_start = time.time()
                proc.communicate()
                elapsed = time.time() - time_start
                if verbose:
                    print(" Took {:.3f} seconds".format(elapsed))
                # NOTE: the "inflate" key stores the compression time (zlib calls compression "deflate")
                ad.setdefault("level {:d}".format(level), {}).setdefault("inflate", {})[typ] = elapsed
                if verbose:
                    print_file_data(file_gz, level=2)
                if verbose:
                    print(" Decompress")
                proc = sp.Popen([mg, "-d", file_gz])
                time_start = time.time()
                proc.communicate()
                elapsed = time.time() - time_start
                if verbose:
                    print(" Took {:.3f} seconds".format(elapsed))
                # NOTE: the "deflate" key stores the decompression time (zlib calls decompression "inflate")
                ad.setdefault("level {:d}".format(level), {}).setdefault("deflate", {})[typ] = elapsed
                if verbose:
                    print_file_data(file_test, level=2)
                # The round trip must reproduce the original file
                if hl.md5(open(file_test, mode="rb").read()).hexdigest() != md5_src:
                    print("!!! File hashes differ !!!")
        data[arch] = ad
    pp(data, indent=2, sort_dicts=False)


if __name__ == "__main__":
    print("Python {:s} {:03d}bit on {:s}\n".format(" ".join(elem.strip() for elem in sys.version.split("\n")),
                                                   64 if sys.maxsize > 0x100000000 else 32, sys.platform))
    rc = main(*sys.argv[1:])
    print("\nDone.")
    sys.exit(rc)
Output:
[cfati@CFATI-5510-0:e:\Work\Dev\StackOverflow\q029505121]> "e:\Work\Dev\VEnvs\py_pc064_03.09_test0\Scripts\python.exe" ./code00.py file.onnx
Python 3.9.9 (tags/v3.9.9:ccb0e6a, Nov 15 2021, 18:08:50) [MSC v.1929 64 bit (AMD64)] 064bit on win32
File: file.onnx.test
Size: 255890833, CTime: Sat Sep 3 02:03:05 2022
{ 'pc064': { 'level 1': { 'inflate': { 'plain': 12.552296161651611,
'masm': 11.09960412979126},
'deflate': { 'plain': 1.802419900894165,
'masm': 1.8380048274993896}},
'level 5': { 'inflate': { 'plain': 13.694978713989258,
'masm': 12.098156213760376},
'deflate': { 'plain': 1.756164312362671,
'masm': 1.7628483772277832}},
'level 9': { 'inflate': { 'plain': 13.969024419784546,
'masm': 12.125015497207642},
'deflate': { 'plain': 1.7450010776519775,
'masm': 1.756005048751831}}},
'pc032': { 'level 1': { 'inflate': { 'plain': 13.748999118804932,
'masm': 11.81002926826477},
'deflate': { 'plain': 1.9236936569213867,
'masm': 2.3493638038635254}},
'level 5': { 'inflate': { 'plain': 15.036035299301147,
'masm': 12.898797512054443},
'deflate': { 'plain': 1.8580067157745361,
'masm': 2.282176971435547}},
'level 9': { 'inflate': { 'plain': 15.134005308151245,
'masm': 12.89007306098938},
'deflate': { 'plain': 1.8709957599639893,
'masm': 2.2773334980010986}}}}
Done.
[cfati@CFATI-5510-0:e:\Work\Dev\StackOverflow\q029505121]> "e:\Work\Dev\VEnvs\py_pc064_03.09_test0\Scripts\python.exe" ./code00.py enwik8.txt
Python 3.9.9 (tags/v3.9.9:ccb0e6a, Nov 15 2021, 18:08:50) [MSC v.1929 64 bit (AMD64)] 064bit on win32
File: enwik8.txt.test
Size: 100000000, CTime: Tue Sep 6 00:33:20 2022
{ 'pc064': { 'level 1': { 'inflate': { 'plain': 1.9976372718811035,
'masm': 1.9259986877441406},
'deflate': { 'plain': 0.7285704612731934,
'masm': 0.7076430320739746}},
'level 5': { 'inflate': { 'plain': 4.5627357959747314,
'masm': 4.003000020980835},
'deflate': { 'plain': 0.6933917999267578,
'masm': 0.6450159549713135}},
'level 9': { 'inflate': { 'plain': 8.079626083374023,
'masm': 6.618978977203369},
'deflate': { 'plain': 0.7049713134765625,
'masm': 0.6319396495819092}}},
'pc032': { 'level 1': { 'inflate': { 'plain': 2.1649997234344482,
'masm': 2.1139981746673584},
'deflate': { 'plain': 0.7583539485931396,
'masm': 0.8125534057617188}},
'level 5': { 'inflate': { 'plain': 5.03799843788147,
'masm': 4.2109808921813965},
'deflate': { 'plain': 0.8489999771118164,
'masm': 0.6870477199554443}},
'level 9': { 'inflate': { 'plain': 7.9073097705841064,
'masm': 7.512087821960449},
'deflate': { 'plain': 0.7378275394439697,
'masm': 0.7450006008148193}}}}
Done.
[cfati@CFATI-5510-0:e:\Work\Dev\StackOverflow\q029505121]> cd dbg
[cfati@CFATI-5510-0:e:\Work\Dev\StackOverflow\q029505121\dbg]> "e:\Work\Dev\VEnvs\py_pc064_03.09_test0\Scripts\python.exe" ../code00.py ../file.onnx
Python 3.9.9 (tags/v3.9.9:ccb0e6a, Nov 15 2021, 18:08:50) [MSC v.1929 64 bit (AMD64)] 064bit on win32
File: ../file.onnx.test
Size: 255890833, CTime: Tue Sep 6 00:37:51 2022
{ 'pc064': { 'level 1': { 'inflate': { 'plain': 25.337001085281372,
'masm': 22.544013023376465},
'deflate': { 'plain': 3.915001153945923,
'masm': 2.3709957599639893}},
'level 5': { 'inflate': { 'plain': 28.28699827194214,
'masm': 24.88018822669983},
'deflate': { 'plain': 3.846531867980957,
'masm': 2.2239699363708496}},
'level 9': { 'inflate': { 'plain': 28.81813645362854,
'masm': 23.6450355052948},
'deflate': { 'plain': 3.9910058975219727,
'masm': 2.302088737487793}}},
'pc032': { 'level 1': { 'inflate': { 'plain': 24.923137664794922,
'masm': 20.991183042526245},
'deflate': { 'plain': 3.7310261726379395,
'masm': 2.6056015491485596}},
'level 5': { 'inflate': { 'plain': 27.760021209716797,
'masm': 22.589048624038696},
'deflate': { 'plain': 3.566000461578369,
'masm': 2.55342698097229}},
'level 9': { 'inflate': { 'plain': 28.245535135269165,
'masm': 22.70799994468689},
'deflate': { 'plain': 3.553999423980713,
'masm': 2.5700416564941406}}}}
Done.
[cfati@CFATI-5510-0:e:\Work\Dev\StackOverflow\q029505121\dbg]> "e:\Work\Dev\VEnvs\py_pc064_03.09_test0\Scripts\python.exe" ../code00.py ../enwik8.txt
Python 3.9.9 (tags/v3.9.9:ccb0e6a, Nov 15 2021, 18:08:50) [MSC v.1929 64 bit (AMD64)] 064bit on win32
File: ../enwik8.txt.test
Size: 100000000, CTime: Tue Sep 6 00:39:59 2022
{ 'pc064': { 'level 1': { 'inflate': { 'plain': 4.711355447769165,
'masm': 4.008531808853149},
'deflate': { 'plain': 1.4210000038146973,
'masm': 0.9230430126190186}},
'level 5': { 'inflate': { 'plain': 8.914000034332275,
'masm': 6.604032516479492},
'deflate': { 'plain': 1.3359959125518799,
'masm': 0.8460018634796143}},
'level 9': { 'inflate': { 'plain': 13.503999948501587,
'masm': 9.228030920028687},
'deflate': { 'plain': 1.328040599822998,
'masm': 0.8240146636962891}}},
'pc032': { 'level 1': { 'inflate': { 'plain': 4.435391664505005,
'masm': 3.933983087539673},
'deflate': { 'plain': 1.3369977474212646,
'masm': 0.9399752616882324}},
'level 5': { 'inflate': { 'plain': 8.48900055885315,
'masm': 6.599977731704712},
'deflate': { 'plain': 1.2629964351654053,
'masm': 0.8410165309906006}},
'level 9': { 'inflate': { 'plain': 12.677618026733398,
'masm': 9.191060781478882},
'deflate': { 'plain': 1.251995325088501,
'masm': 0.8130028247833252}}}}
Done.
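As seen, for Release builds (the ones most people are interested in), the gain from the speedups is modest (and sometimes the assembler build is even slower than the C code). This (combined with the lack of maintenance) was one of the reasons for disabling them. (Note that the script's keys are swapped with respect to zlib terminology: "inflate" holds the compression times and "deflate" the decompression ones.)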
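To make the "sometimes it's even slower" remark easier to check, here is a minimal sketch that walks a results dictionary shaped like the script's output and lists the entries where the masm build took longer than the plain one. The sample values are copied (rounded) from the pc032 file.onnx Release run above, with the keys exactly as the script prints them; the function name slower_with_masm is made up for illustration.

```python
def slower_with_masm(data):
    """Return (arch, level, operation) keys where the masm time exceeds the plain time."""
    slower = []
    for arch, levels in data.items():
        for level, operations in levels.items():
            for operation, timings in operations.items():
                if timings["masm"] > timings["plain"]:
                    slower.append((arch, level, operation))
    return slower


# Rounded values from the pc032 / file.onnx Release run; keys as printed by code00.py.
sample = {
    "pc032": {
        "level 1": {
            "inflate": {"plain": 13.749, "masm": 11.810},
            "deflate": {"plain": 1.924, "masm": 2.349},
        },
        "level 9": {
            "inflate": {"plain": 15.134, "masm": 12.890},
            "deflate": {"plain": 1.871, "masm": 2.277},
        },
    },
}

for entry in slower_with_masm(sample):
    print(entry)
```

Running the same function over the full dictionaries printed above is a quick way to see which architecture / level / operation combinations regress.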
Original answer
While playing with assembler speedups, I discovered that the issue is reproducible on the (currently) latest version: v1.2.11 ([GitHub]: madler/zlib - ZLIB DATA COMPRESSION LIBRARY).
This error happens only (obviously, OS: Win, build toolchain: VStudio and assembly speedups enabled) for:
Below is a "callstack" (top -> down is equivalent to outer -> inner) during decompression.
1. Normal case:
   1. inflate (inflate.c)
   2. inflate_fast (inffast.c)
   3. ...
2. Assembler case:
   1. inflate (inflate.c)
   2. inflate_fast (contrib/masmx64/inffas8664.c)
   3. inffas8664fnc (contrib/masmx64/inffasx64.asm)
   4. ...
Problem:
#2.2. is missing ("${ZLIB_SRC_DIR}/CMakeLists.txt" doesn't know anything about inffas8664.c), so the chain is broken, and the library is built from an invalid (incomplete) set of sources.
Solution:
Make CMakeLists.txt aware of that file, by adding:
set(ZLIB_SRCS
    ${ZLIB_SRCS}
    contrib/masmx64/inffas8664.c
)
at line ~#158 (enclosed by the if(MSVC) and elseif (AMD64) conditionals).
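This kind of mismatch (a source file present in the tree but never mentioned in CMakeLists.txt) can be spotted mechanically. A rough sketch, assuming a plain substring check is good enough; unreferenced_sources and the sample CMake text are hypothetical, not part of zlib or CMake:

```python
def unreferenced_sources(cmake_text, candidate_paths):
    """Return the candidate paths that never appear in the CMakeLists.txt text."""
    return [path for path in candidate_paths if path not in cmake_text]


# Simplified stand-in for the AMD64 branch of the unpatched CMakeLists.txt.
cmake_text = """
elseif (AMD64)
    ENABLE_LANGUAGE(ASM_MASM)
    set(ZLIB_ASMS
        contrib/masmx64/gvmat64.asm
        contrib/masmx64/inffasx64.asm
    )
endif()
"""

# Files expected to take part in the x64 assembler build.
candidates = [
    "contrib/masmx64/gvmat64.asm",
    "contrib/masmx64/inffasx64.asm",
    "contrib/masmx64/inffas8664.c",
]

print(unreferenced_sources(cmake_text, candidates))
```

In a real check, cmake_text would be read from the actual CMakeLists.txt and the candidates listed from the contrib/masmx64 directory.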
Posting full changes as well.
zlib-1.2.11-msvc_x64_asm_speedups.diff:
--- CMakeLists.txt.orig 2017-01-15 08:29:40.000000000 +0200
+++ CMakeLists.txt 2018-09-03 13:41:00.314805100 +0300
@@ -79,10 +79,10 @@
endif()
set(ZLIB_PC ${CMAKE_CURRENT_BINARY_DIR}/zlib.pc)
-configure_file( ${CMAKE_CURRENT_SOURCE_DIR}/zlib.pc.cmakein
- ${ZLIB_PC} @ONLY)
-configure_file( ${CMAKE_CURRENT_SOURCE_DIR}/zconf.h.cmakein
- ${CMAKE_CURRENT_BINARY_DIR}/zconf.h @ONLY)
+configure_file(${CMAKE_CURRENT_SOURCE_DIR}/zlib.pc.cmakein
+ ${ZLIB_PC} @ONLY)
+configure_file(${CMAKE_CURRENT_SOURCE_DIR}/zconf.h.cmakein
+ ${CMAKE_CURRENT_BINARY_DIR}/zconf.h @ONLY)
include_directories(${CMAKE_CURRENT_BINARY_DIR} ${CMAKE_SOURCE_DIR})
@@ -136,30 +136,34 @@
set(ZLIB_ASMS contrib/amd64/amd64-match.S)
endif ()
- if(ZLIB_ASMS)
- add_definitions(-DASMV)
- set_source_files_properties(${ZLIB_ASMS} PROPERTIES LANGUAGE C COMPILE_FLAGS -DNO_UNDERLINE)
- endif()
+ if(ZLIB_ASMS)
+ add_definitions(-DASMV)
+ set_source_files_properties(${ZLIB_ASMS} PROPERTIES LANGUAGE C COMPILE_FLAGS -DNO_UNDERLINE)
+ endif()
endif()
if(MSVC)
if(ASM686)
- ENABLE_LANGUAGE(ASM_MASM)
+ ENABLE_LANGUAGE(ASM_MASM)
set(ZLIB_ASMS
- contrib/masmx86/inffas32.asm
- contrib/masmx86/match686.asm
- )
+ contrib/masmx86/inffas32.asm
+ contrib/masmx86/match686.asm
+ )
elseif (AMD64)
- ENABLE_LANGUAGE(ASM_MASM)
+ ENABLE_LANGUAGE(ASM_MASM)
set(ZLIB_ASMS
- contrib/masmx64/gvmat64.asm
- contrib/masmx64/inffasx64.asm
- )
+ contrib/masmx64/gvmat64.asm
+ contrib/masmx64/inffasx64.asm
+ )
+ set(ZLIB_SRCS
+ ${ZLIB_SRCS}
+ contrib/masmx64/inffas8664.c
+ )
endif()
- if(ZLIB_ASMS)
- add_definitions(-DASMV -DASMINF)
- endif()
+ if(ZLIB_ASMS)
+ add_definitions(-DASMV -DASMINF)
+ endif()
endif()
# parse the full version number from zlib.h and include in ZLIB_FULL_VERSION
The above is a diff. See [SO]: Run / Debug a Django application's UnitTests from the mouse right click context menu in PyCharm Community Edition? (@CristiFati's answer) (Patching UTRunner section) for how to apply patches on Win (basically, every line that starts with one "+" sign goes in, and every line that starts with one "-" sign goes out).
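The "+" goes in, "-" goes out rule can be illustrated with a few lines of Python. This is only a sketch of the idea for the body of a single hunk (no header parsing, no offset handling), not a replacement for a real patch tool; split_hunk_body is a name invented here, and the sample hunk is a simplified excerpt of the diff above.

```python
def split_hunk_body(hunk_lines):
    """Return (old_lines, new_lines) described by the body of one unified-diff hunk."""
    old_lines, new_lines = [], []
    for line in hunk_lines:
        marker, text = line[:1], line[1:]
        if marker == "-":      # present only before patching
            old_lines.append(text)
        elif marker == "+":    # present only after patching
            new_lines.append(text)
        else:                  # context line (leading space): unchanged
            old_lines.append(text)
            new_lines.append(text)
    return old_lines, new_lines


hunk = [
    " set(ZLIB_ASMS",
    "     contrib/masmx64/inffasx64.asm",
    " )",
    "+set(ZLIB_SRCS",
    "+    ${ZLIB_SRCS}",
    "+    contrib/masmx64/inffas8664.c",
    "+)",
    " endif()",
]

old, new = split_hunk_body(hunk)
print("\n".join(new))
```

Here old is what the file must contain for the hunk to apply, and new is what that region looks like afterwards.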
I also submitted this patch: [GitHub]: madler/zlib - Ms VisualStudio - Assembler speedups on x64, but then I closed it, as it's contained in the one at the beginning.
Output:
e:\Work\Dev\StackOverflow\q029505121\build\x64>"c:\Install\Google\Android_SDK\cmake\3.6.4111459\bin\cmake.exe" -G "NMake Makefiles" -DAMD64=ON "e:\Work\Dev\StackOverflow\q029505121\src\zlib-1.2.11"
-- The C compiler identification is MSVC 19.0.24215.1
-- Check for working C compiler: C:/Install/x86/Microsoft/Visual Studio Community/2015/VC/bin/amd64/cl.exe
-- Check for working C compiler: C:/Install/x86/Microsoft/Visual Studio Community/2015/VC/bin/amd64/cl.exe -- works
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Looking for sys/types.h
-- Looking for sys/types.h - found
-- Looking for stdint.h
-- Looking for stdint.h - found
-- Looking for stddef.h
-- Looking for stddef.h - found
-- Check size of off64_t
-- Check size of off64_t - failed
-- Looking for fseeko
-- Looking for fseeko - not found
-- Looking for unistd.h
-- Looking for unistd.h - not found
-- Renaming
-- E:/Work/Dev/StackOverflow/q029505121/src/zlib-1.2.11/zconf.h
-- to 'zconf.h.included' because this file is included with zlib
-- but CMake generates it automatically in the build directory.
-- The ASM_MASM compiler identification is MSVC
-- Found assembler: C:/Install/x86/Microsoft/Visual Studio Community/2015/VC/bin/amd64/ml64.exe
-- Configuring done
-- Generating done
-- Build files have been written to: E:/Work/Dev/StackOverflow/q029505121/build/x64
e:\Work\Dev\StackOverflow\q029505121\build\x64>"c:\Install\Google\Android_SDK\cmake\3.6.4111459\bin\cmake.exe" --build . --target zlibstatic
Scanning dependencies of target zlibstatic
[ 5%] Building C object CMakeFiles/zlibstatic.dir/adler32.obj
adler32.c
[ 10%] Building C object CMakeFiles/zlibstatic.dir/compress.obj
compress.c
[ 15%] Building C object CMakeFiles/zlibstatic.dir/crc32.obj
crc32.c
[ 21%] Building C object CMakeFiles/zlibstatic.dir/deflate.obj
deflate.c
Assembler code may have bugs -- use at your own risk
[ 26%] Building C object CMakeFiles/zlibstatic.dir/gzclose.obj
gzclose.c
[ 31%] Building C object CMakeFiles/zlibstatic.dir/gzlib.obj
gzlib.c
[ 36%] Building C object CMakeFiles/zlibstatic.dir/gzread.obj
gzread.c
[ 42%] Building C object CMakeFiles/zlibstatic.dir/gzwrite.obj
gzwrite.c
[ 47%] Building C object CMakeFiles/zlibstatic.dir/inflate.obj
inflate.c
[ 52%] Building C object CMakeFiles/zlibstatic.dir/infback.obj
infback.c
[ 57%] Building C object CMakeFiles/zlibstatic.dir/inftrees.obj
inftrees.c
[ 63%] Building C object CMakeFiles/zlibstatic.dir/inffast.obj
inffast.c
Assembler code may have bugs -- use at your own risk
[ 68%] Building C object CMakeFiles/zlibstatic.dir/trees.obj
trees.c
[ 73%] Building C object CMakeFiles/zlibstatic.dir/uncompr.obj
uncompr.c
[ 78%] Building C object CMakeFiles/zlibstatic.dir/zutil.obj
zutil.c
[ 84%] Building C object CMakeFiles/zlibstatic.dir/contrib/masmx64/inffas8664.obj
inffas8664.c
[ 89%] Building ASM_MASM object CMakeFiles/zlibstatic.dir/contrib/masmx64/gvmat64.obj
Microsoft (R) Macro Assembler (x64) Version 14.00.24210.0
Copyright (C) Microsoft Corporation. All rights reserved.
Assembling: E:\Work\Dev\StackOverflow\q029505121\src\zlib-1.2.11\contrib\masmx64\gvmat64.asm
[ 94%] Building ASM_MASM object CMakeFiles/zlibstatic.dir/contrib/masmx64/inffasx64.obj
Microsoft (R) Macro Assembler (x64) Version 14.00.24210.0
Copyright (C) Microsoft Corporation. All rights reserved.
Assembling: E:\Work\Dev\StackOverflow\q029505121\src\zlib-1.2.11\contrib\masmx64\inffasx64.asm
[100%] Linking C static library zlibstatic.lib
[100%] Built target zlibstatic
Notes:
I am using VStudio 2015
Regarding the above output:
To keep the output as small as possible, I am only building the static version
For the same reason (and also to keep it as just text), I'm building for "NMake Makefiles" (CmdLine build)
inffas8664.c is being built (somewhere near the end)
You could also disable assembler speedups (by unchecking AMD64 in CMake-GUI), but that would be just a workaround
I did some rough tests (I'm by no means claiming these results to be general), and the performance improvement of the assembler implementation over the standard one (Debug versions) was as follows (each percentage below is the ratio between the times taken to perform the same operation with and without the speedups):
Compress: ~86%
Decompress: ~62%
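For clarity, the percentages are computed as in the sketch below. The sample times are taken (rounded) from the pc064 Debug run on file.onnx (level 1) shown earlier in this answer, purely as an illustration; they are not the measurements behind the two figures above, and ratio_percent is a made-up helper name.

```python
def ratio_percent(time_with, time_without):
    """Time with speedups as a percentage of the time without them (lower is better)."""
    return 100.0 * time_with / time_without


# Rounded pc064 Debug timings for file.onnx, level 1 (seconds).
compress = ratio_percent(22.544, 25.337)
decompress = ratio_percent(2.371, 3.915)

print("Compress: ~{:.0f}%".format(compress))
print("Decompress: ~{:.0f}%".format(decompress))
```

A value below 100% means the assembler build was faster; a value above 100% means it was slower.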
Update #0
[GitHub]: madler/zlib - ASM zlib build on Windows gives erroneous results (@madler's comment) states (emphasis is mine):
What assembly code is being used? There are a few in zlib's contrib directory. By the way, the stuff in the contrib directory is not part of zlib. It is just there as a convenience and is supported (or not) by those third-party contributors. What I will do is simply remove the offending code from the next distribution.
So does the compile warning (that everyone must have seen (and most likely ignored)):
Assembler code may have bugs -- use at your own risk
Apparently, assembler speedups and VStudio don't (didn't) get along very well. Moreover, on x86 (pc032), there are some issues:
After fixing them, everything works fine, and the performance improvements are similar to pc064.
In case anyone needs them, I've built and placed binaries at [GitHub]: CristiFati/Prebuilt-Binaries - (master) Prebuilt-Binaries/ZLib (with / without speedups).