
There are two libraries on PyPI, 'python-magic' and 'filemagic', that are both imported with the same import statement:

https://pypi.org/project/python-magic/

https://pypi.org/project/filemagic/

import magic

How can I import both of these libraries in my Django project?

I want to use the magic.from_buffer() method from the python-magic library and the magic.Magic().id_buffer() method from the filemagic library.

I have pip-installed python-magic, libmagic and filemagic into my project. Also, since I need to deploy this project to a custom server setup, making changes to these packages inside my venv won't help.

I tried importing magic under different aliases using "as" in the import statements:

from filemagic import magic as mm (the code does not recognize filemagic in this statement)

from python-magic import magic as pm (the code does not recognize python-magic in this statement)

EDIT - To state the objective more clearly: I want to identify a file's type based on its content. I have two different kinds of file input: 1. a user-uploaded file, 2. a user-uploaded file in base64-encoded format.

I have already been using magic.Magic().id_buffer to read files of type 1.

I now want to add code to handle base64-encoded files, as solved in this post: "How to know MIME-type of a file from base64 encoded data in python?"

When the code runs on the server (as opposed to my local dev environment), I get the error: 'magic' module has no attribute 'from_buffer'.
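The missing `from_buffer` attribute suggests that the `magic` module on the server comes from filemagic rather than python-magic. As a minimal sketch, assuming only one of the two packages is effectively importable at a time, the code can detect which API is present and call that one. The helper name `identify_mime` and its `magic_module` parameter are hypothetical and exist only for illustration; `from_buffer(..., mime=True)` is the python-magic call and `Magic(flags=MAGIC_MIME_TYPE).id_buffer(...)` is the filemagic call.

```python
def identify_mime(data, magic_module=None):
    """Return the MIME type of `data` (bytes) using whichever `magic` is installed."""
    if magic_module is None:
        # Imported lazily so the function can also be exercised with a stand-in module.
        import magic as magic_module

    if hasattr(magic_module, "from_buffer"):
        # python-magic exposes a module-level from_buffer().
        return magic_module.from_buffer(data, mime=True)

    # Otherwise assume filemagic, whose Magic object is used as a context manager.
    with magic_module.Magic(flags=magic_module.MAGIC_MIME_TYPE) as m:
        return m.id_buffer(data)
```

With this in place the calling code does not need to know which of the two packages ended up providing `magic` in a given environment.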

Frid-j
  • It looks like these two libraries are intended to **do the same thing in the same way** (interface to `libmagic` in order to identify file types), so what benefit do you hope to gain by using both? – Karl Knechtel Apr 16 '23 at 07:38
  • I know the other question didn't get very good answers, but this is as clear of a duplicate as it gets. The other question even asks about the same exact two packages! – Karl Knechtel Apr 16 '23 at 08:12
  • @KarlKnechtel added my requirement in question(EDIT). – Frid-j Apr 16 '23 at 09:54
  • Either of the libraries you are talking about will be able to do that. No matter which one you choose, you will have to base-64-decode the file (or at least a little bit from the start) first. – Karl Knechtel Apr 16 '23 at 09:58
  • @KarlKnechtel from a security perspective, if I decode a faulty file before I verify that the file's content is as expected, it would defeat the whole purpose of validating the file's content. – Frid-j Apr 16 '23 at 10:20
  • You cannot cause any kind of security risk by simply reading the content of a file and decoding it from base64. If it is valid base64, then you just get another sequence of bytes. Otherwise you get an exception. Security problems occur when you try to interpret a file as *meaningful*, i.e., as something that contains *directions* to another process (e.g. a malicious image tries to create a buffer overflow by specifying a bad width or height; there is only a security problem if you actually try to allocate the corresponding memory or write into it). – Karl Knechtel Apr 16 '23 at 10:24
  • Aside from that, the underlying `libmagic` - no matter what Python bindings you use - will **need to examine file contents anyway** in order to decide the file type, because it is the first few bytes of the file **that declare** the type (in many standard cases, which it will try to detect). Please keep in mind that files are **just** a sequence of bytes and "file type" is **not intrinsic to** a file; it is something that has to be inferred from the data itself (by convention, some part of it is metadata), the filename (such as its extension), or other metadata. – Karl Knechtel Apr 16 '23 at 10:26
  • @KarlKnechtel thank you so much for the info. I was under the impression that decoding a file might expose malicious file contents to my system. I solved my issue (by reading the decoded file) using a single library, i.e. filemagic. However, let's keep the question open for solutions that import both libs, in case someone has another use case. – Frid-j Apr 16 '23 at 11:08
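The "decode at least a little bit from the start" idea from the comments can be sketched with the standard library alone. This is an illustration, not the asker's actual code: the helper name `decode_prefix` and the `max_bytes` default are hypothetical, and it assumes the input is plain base64 with no `data:` URL prefix and no embedded line breaks.

```python
import base64


def decode_prefix(b64_text, max_bytes=2048):
    """Decode only enough base64 characters to yield at least max_bytes bytes."""
    # Every 4 base64 characters encode 3 bytes, so cut on a multiple of 4
    # to keep the truncated chunk decodable on its own.
    n_chars = -(-max_bytes // 3) * 4  # ceil(max_bytes / 3) * 4
    return base64.b64decode(b64_text[:n_chars])


# Stand-in payload: a PNG signature followed by filler bytes.
png_like = b"\x89PNG\r\n\x1a\n" + b"\x00" * 5000
encoded = base64.b64encode(png_like).decode("ascii")

header = decode_prefix(encoded, max_bytes=16)
print(header[:8])
```

The resulting `header` bytes are what would then be handed to the identification call of whichever library is in use.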

1 Answer


This is possible only with some big tradeoffs. Importing a package in Python works roughly like this:

  1. Python takes the value of sys.path
  2. It looks through each of these directories in order and tries to find a matching module file or subdirectory (in your case, magic)
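The lookup described above can be inspected without importing anything. This is a small sketch; the helper name `where_is` is made up for illustration, while `importlib.util.find_spec` is the standard-library call that performs the search.

```python
import importlib.util
import sys


def where_is(module_name):
    """Return the file that `import module_name` would load, or None if not found."""
    spec = importlib.util.find_spec(module_name)
    return None if spec is None else spec.origin


# The directories that are searched, in order.
print(sys.path[:3])

# A standard-library package, shown for comparison.
print(where_is("json"))

# Whichever `magic` wins in the current environment (None if nothing is installed).
print(where_is("magic"))
```

Running this both locally and on the server shows whether the two environments resolve `magic` to the same file.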

When a package is installed, all of its files are stored in the virtualenv's site-packages directory, in a subdirectory named after how the package should be imported (in your case, magic).

This creates a big problem when having two packages with the same name under which they store their files. The two packages will both write their files into the same directory.
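To check whether such a clash exists in a given environment, the installed distributions can be asked which files they own. This is a rough sketch using `importlib.metadata`; the function name `distributions_providing` is hypothetical, and it relies on each distribution having recorded its file list at install time, which is usually but not always the case.

```python
from importlib import metadata


def distributions_providing(top_level):
    """List installed distributions that ship files under the given import name."""
    owners = set()
    for dist in metadata.distributions():
        # dist.files is None when the installer recorded no file list.
        for path in dist.files or []:
            if path.parts and path.parts[0] in (top_level, top_level + ".py"):
                owners.add(str(dist.metadata["Name"]))
                break
    return sorted(owners)


# More than one name in the result means several packages share the directory.
print(distributions_providing("magic"))
```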

In your case you are fairly lucky: only files that both packages ship under the same name, such as __init__.py, get overwritten by the package that is installed second. The other modules of both packages are still present on disk, so their contents can still be used.

But now, instead of `import magic` or `from magic import ...`, you need to use `import magic.<modulename>` or `from magic.<modulename> import ...`.
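Which `<modulename>` values are actually available depends on what ended up in the shared directory. As a sketch, the submodules of any importable package can be listed with `pkgutil`; the helper name `list_submodules` is made up for illustration, and `json` is used as a stand-in so the snippet runs even where `magic` is not installed.

```python
import importlib
import pkgutil


def list_submodules(package_name):
    """Return the names of the modules found directly inside a package."""
    package = importlib.import_module(package_name)
    return sorted(info.name for info in pkgutil.iter_modules(package.__path__))


# Replace "json" with "magic" in an environment where both packages are installed.
print(list_submodules("json"))
```

Note that this imports the package's `__init__.py`, so it only works if the surviving `__init__.py` can still be imported without errors.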

Clasherkasten
  • You caught that correctly: my __init__.py got overwritten, but the new Magic class does not have an id_buffer method. Also, I cannot import using your methods (import magic..). Since I need to run this code on a server where changes to the venv do not carry over and only the requirements.txt file is used, any changes made in the venv would not help. – Frid-j Apr 16 '23 at 10:05