If I run the script in another Python environment, I need to install all the packages I'm importing. Is it possible to avoid this?
Well, not really, but kinda (TL;DR: no, but it depends on exactly what you mean). It boils down to a limitation of the environment: somewhere, someplace, you need the packages on disk where you can grab them. They have to be available and locatable.
By available, I mean accessible via the filesystem. By locatable, I mean there has to be somewhere you are looking. A system install places packages somewhere accessible that can reliably be used as a place to install, and look for, them. That's part of the responsibility of your virtual environment too; the only difference is that a virtual environment is there to separate you from your system Python's packages.
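For a concrete look at "locatable": when you import something, Python searches the directories listed on `sys.path`, and a virtual environment works by putting its own `site-packages` directory on that list.
import sys

# Where the interpreter actually looks for packages.
for path in sys.path:
    print(path)  # inside a venv, this includes the venv's site-packages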
The advantage of this is straightforward: I can create a virtual environment that uses the package `slamjam==1.2.3`, where `1.2.3` is a specific version of the package `slamjam`, and also run a program that uses `slamjam==1.7.9`, without causing a conflict in my global environment.
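As a rough sketch of that isolation (using the made-up `slamjam` package from above; the `bin/` path is POSIX, Windows uses `Scripts\` instead):
import venv
import subprocess

# Create two isolated environments, each pinning a different version.
for name, version in [("env_a", "1.2.3"), ("env_b", "1.7.9")]:
    venv.create(name, with_pip=True)  # stdlib: makes ./env_a/, ./env_b/
    # Each environment's pip installs into that environment only.
    subprocess.run([f"{name}/bin/pip", "install", f"slamjam=={version}"], check=True)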
So here's why I give the "kinda" vibe: if your user already has a package installed on their system, then they need to install nothing. They don't need a virtual environment for that package if it's already globally installed, and likewise they don't need a new one if it's in another virtual environment, although it is a great idea to separate your project's dependencies with one.
Is it possible to embed the neural net .pb into the code? Keep in mind that it weighs 80 MB, so a hex dump doesn't work (a text file with the dump weighs 700 MB).
So, yeah, actually it's extremely doable. The thing is, it depends on what exactly you mean.
As you're aware, a hex dump of the file takes a lot of space. That's very true. But it seems you are talking about raw hex, which takes a minimum of 2 output bytes for every input byte, and you might be dumping extra information alongside that if you used a tool like `hexdump`, yada, yada, yada.
Moral of the story: you're going to waste a lot of space doing that. So I'll give you a couple of options, of which you can choose one, or more.
- Compress your data even more, if possible.
I haven't worked with TensorFlow data, but after a quick read, it appears it uses ProtoBufs, which are already a compact binary format. Still, go ahead and see if you can squeeze any more juice out of the fruit.
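For instance, a minimal sketch with the standard library (the `model.pb` path is a placeholder, not from your project):
import gzip
import shutil

# Recompress the protobuf and compare sizes; if it barely shrinks,
# the data was already close to incompressible.
with open("model.pb", "rb") as src, gzip.open("model.pb.gz", "wb") as dst:
    shutil.copyfileobj(src, dst)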
- Take your binary data and dump it into a different encoding (hint, hint: base64!)
Watch what happens when we convert something to hex...
>>> binary_data=b'this is a readable string, but really it just boils down to binary information. i can be expressed in a more efficient way than a binary string or hex, however'
>>> hex_data = binary_data.hex()
>>> print(hex_data)
746869732069732061207265616461626c6520737472696e672c20627574207265616c6c79206974206a75737420626f696c7320646f776e20746f2062696e61727920696e666f726d6174696f6e2e20692063616e2062652065787072657373656420696e2061206d6f726520656666696369656e7420776179207468616e20612062696e61727920737472696e67206f72206865782c20686f7765766572
>>> print(len(hex_data))
318
318 characters? We can do better.
>>> import base64
>>> b64_data = base64.b64encode(binary_data)
>>> print(b64_data)
b'dGhpcyBpcyBhIHJlYWRhYmxlIHN0cmluZywgYnV0IHJlYWxseSBpdCBqdXN0IGJvaWxzIGRvd24gdG8gYmluYXJ5IGluZm9ybWF0aW9uLiBpIGNhbiBiZSBleHByZXNzZWQgaW4gYSBtb3JlIGVmZmljaWVudCB3YXkgdGhhbiBhIGJpbmFyeSBzdHJpbmcgb3IgaGV4LCBob3dldmVy'
>>> print(len(b64_data))
212
You've now made your data a third smaller than the hex version: hex costs 2 output bytes per input byte, while base64 costs only 4 per 3. (It's still about 33% bigger than the raw binary, but far better than hex.)
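Tying that back to your `.pb`, here's a hedged sketch (file names are placeholders) that generates a `.py` module with the model embedded as base64, decoded back to bytes at import time:
import base64

with open("model.pb", "rb") as f:
    encoded = base64.b64encode(f.read()).decode("ascii")

# Write a module whose MODEL_BYTES attribute holds the original bytes.
with open("embedded_model.py", "w") as out:
    out.write("import base64\n")
    out.write(f"MODEL_BYTES = base64.b64decode({encoded!r})\n")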
- Package a non-Python file with your `.whl` distribution.
Yeah, totally doable. Have I done it before? Nope, never needed to yet. Will I ever? Yep. Do I have great advice on how to do it? No. But I have a link for you; it's totally doable.
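The usual mechanism, as a sketch (the package and file names here are placeholders, not your project's): setuptools lets you declare non-Python files as package data so they get bundled into the wheel.
from setuptools import setup, find_packages

setup(
    name="yourpackage",          # placeholder name
    version="0.1.0",
    packages=find_packages(),
    # Ship model.pb inside the wheel, next to the package's code.
    package_data={"yourpackage": ["model.pb"]},
)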
- You can download the file from within the application and only provide the URL. Something quick and easy, like:
import wget
# wget.download saves to a local file and returns its path.
downloaded_file_path = wget.download('http://some.site.com/a_file')
Yeah, sure, there are other libraries like `requests` that do similar things, but for the example I chose `wget` because it has a simple interface too, and is always an option.
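If you prefer `requests`, a minimal sketch (the URL and filename are placeholders) that streams to disk so the 80 MB file never has to sit in memory all at once:
import requests

response = requests.get("http://some.site.com/a_file", stream=True)
response.raise_for_status()
with open("a_file", "wb") as f:
    for chunk in response.iter_content(chunk_size=1 << 20):  # 1 MB chunks
        f.write(chunk)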
The idea is to have one .py file with everything necessary within. Is it possible?
Well, yeah, kinda. For what you're asking -- a single `.py` file, with nothing else, that will install your packages? If you really want to copy and paste library after library, plus all the data, into one massive file nobody will download, I'm sure there's a way.
Let's look at a more supported approach to what you're asking: a `.whl` file is a single file, and it can carry an internal list of the packages it needs; installing the `.whl` will handle everything for you (installing dependencies, unpacking, etc.). I'd look in that direction.
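That dependency list is ordinary packaging metadata; as a sketch (names and versions are placeholders), declaring it in `setup.py` means pip installs everything listed whenever someone installs your wheel:
from setuptools import setup

setup(
    name="yourpackage",      # placeholder
    version="0.1.0",
    py_modules=["yourmodule"],
    # pip resolves and installs these alongside the wheel.
    install_requires=["wget", "tensorflow>=2.0"],
)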
Anyway, that's a lot of information, I know, but there's some logic as to why you can or can't do each of these things. Hope it helped, and best of luck to you.