
I've recently downloaded all packages from PyPI. One interesting observation was that, of the 15 biggest packages, all except one are deep learning packages.

I looked at mxnet-cu90. It has exactly one huge file: libmxnet.so (936.7MB). What does this file contain? Is there any way to make it smaller?

I'm especially astonished that those libraries are so huge considering that one usually uses them on top of CUDA + cuDNN, which I thought would do the heavy lifting.

As a comparison, I looked at related libraries with which you can also build deep learning libraries:

  • numpy: 6MB
  • sympy: 6MB
  • pycuda: 3.6MB
  • tensorflow-cpu: 116MB (so the GPU version needs 241 MB more or around 3x the size!)
Martin Thoma
  • 3
    Not sure I follow the question. Why do you think that they should be smaller than they are? What do you mean by "huge"? – EJoshuaS - Stand with Ukraine Jan 13 '20 at 17:15
  • 2
    They are by far the biggest ones on PyPI. mxnet-cu90 is 600 MB. Except for a pure-data package, the next biggest package is less than 350 MB (I need to check how much less). Also, numpy is ~6MB. So 1% of mxnet. sympy is 6MB as well. pycuda is 3.6MB. – Martin Thoma Jan 13 '20 at 17:27
  • 1
    @EJoshuaS-ReinstateMonica I've added a couple of numbers for comparison. – Martin Thoma Jan 13 '20 at 18:03
  • 1
    Yeah, those numbers do seem weirdly high now that you include those details. I'll vote to reopen. – EJoshuaS - Stand with Ukraine Jan 13 '20 at 18:12
  • Can you clarify those sizes? For ``mxnet-cu90`` you list both 940 MB and 600 MB, while [PyPI lists 490 MB](https://pypi.org/project/mxnet-cu90/#files). – MisterMiyagi Jan 14 '20 at 08:52
  • 1
    I looked [at this version of mxnet-cu90](https://pypi.org/project/mxnet-cu90/1.6.0b20200113/#files). 600 MB is the compressed size, 940MB is the uncompressed. All other sizes refer to the compressed size – Martin Thoma Jan 14 '20 at 09:24
  • 2
    it is large because they packaged cudnn inside – snowflake Jan 16 '20 at 11:59
  • 1
    @snowflake do you want to post this as an answer? Maybe give some more details about why this is so much and why they package cudnn inside in the first place? – Martin Thoma Jan 18 '20 at 13:27

1 Answer


Deep learning frameworks are large because they bundle NVIDIA's cuDNN in their wheels. This is done for the convenience of downstream users, who then do not have to install cuDNN separately.

cuDNN provides the primitives that the frameworks call to execute highly optimised neural network ops (e.g. LSTM).

The unzipped version of cuDNN for Windows 10 is 435MB.
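To check where the size of a particular wheel comes from, you can list its largest members yourself. A wheel is a zip archive, so the standard-library `zipfile` module is enough. The following is a minimal sketch, not an official tool: the helper name `largest_members` and the wheel file name in the usage comment are made up for illustration.

```python
import zipfile


def largest_members(wheel_path, top=5):
    """Return (name, uncompressed_bytes, compressed_bytes) for the biggest files.

    `largest_members` is a hypothetical helper for illustration; it relies only
    on the fact that a .whl file is a regular zip archive.
    """
    with zipfile.ZipFile(wheel_path) as wheel:
        # Skip directory entries, keep only real files.
        infos = [info for info in wheel.infolist() if not info.is_dir()]
    # Sort by uncompressed size, biggest first.
    infos.sort(key=lambda info: info.file_size, reverse=True)
    return [(i.filename, i.file_size, i.compress_size) for i in infos[:top]]


# Usage (the file name is an assumption; use whatever wheel you downloaded):
# for name, size, packed in largest_members("some_package.whl"):
#     print(f"{size / 1e6:10.1f} MB ({packed / 1e6:.1f} MB compressed)  {name}")
```

This reports both the uncompressed and the compressed size of each file, which is the distinction behind the 940MB and 600MB figures in the comments. It only tells you which files are large, not what is linked into a single shared library such as libmxnet.so.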

snowflake