
I have the following situation: I am working on several projects which make use of library modules that I have written. The library modules contain several classes and functions. In each project, some subset of the code of the libraries is used.

However, when I publish a project for other users, I only want to give away the code that is used by that project rather than the whole modules. This means I would like, for a given project, to remove unused library functions from the library code (i.e. create a new reduced library). Is there any tool that can do this automatically?

EDIT

Some clarifications/replies:

  1. Regarding the "you should not do this in general" replies: the bottom line is that in practice, before I publish a project, I manually go through the library modules and remove unused code. As we are all programmers, we know that there is no reason to do something manually when you could easily explain to a computer how to do it. So in practice, writing such a program is possible and should not even be too difficult (yes, it may not be completely general). My question was whether anyone knows if such a tool exists, before I start thinking about implementing it myself. Any thoughts about implementing this are also welcome.
  2. I do not want to simply hide all my code. If I had wanted to do that, I probably would not have used Python. In fact, I want to publish the source code, but only the code that is relevant to the project in question.
  3. Regarding the "you are legally protected" comments: in my specific case, legal/license protection does not help me. Also, the problem here is more general than someone stealing the code. For example, it could be for the sake of clarity: if someone needs to use or develop the code, you don't want dozens of irrelevant functions included.
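A minimal sketch of the kind of tool described in point 1, using only the standard-library `ast` module (Python 3.9+ for `ast.unparse`). It assumes the project reaches library code only through plain imports, names, and attribute access (no `getattr`-style dynamic lookups), always keeps top-level statements that are not functions or classes, and treats any matching name as a use, so it errs on the side of keeping code. `prune_library` and `referenced_names` are hypothetical helper names, not an existing tool.

```python
import ast

# Top-level statements the pruner is allowed to drop.
DEF_TYPES = (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)


def referenced_names(node):
    """Collect every bare name, attribute name, and imported name inside node."""
    names = set()
    for sub in ast.walk(node):
        if isinstance(sub, ast.Name):
            names.add(sub.id)
        elif isinstance(sub, ast.Attribute):
            names.add(sub.attr)  # e.g. `mylib.helper` -> "helper"
        elif isinstance(sub, ast.alias):
            names.add(sub.name.split(".")[-1])  # e.g. `from mylib import helper`
    return names


def prune_library(library_source, project_source):
    """Return library_source without the top-level functions/classes that the
    project never reaches, directly or through other library definitions."""
    lib_tree = ast.parse(library_source)
    defs = {node.name: node for node in lib_tree.body if isinstance(node, DEF_TYPES)}

    # Seeds: names used by the project, plus names used by library statements
    # that are always kept (imports, constants, module-level calls).
    seeds = referenced_names(ast.parse(project_source))
    for node in lib_tree.body:
        if not isinstance(node, DEF_TYPES):
            seeds |= referenced_names(node)

    # Follow references transitively with a simple worklist.
    used = set()
    to_visit = [name for name in seeds if name in defs]
    while to_visit:
        name = to_visit.pop()
        if name not in used:
            used.add(name)
            to_visit.extend(n for n in referenced_names(defs[name]) if n in defs)

    lib_tree.body = [
        node for node in lib_tree.body
        if not isinstance(node, DEF_TYPES) or node.name in used
    ]
    return ast.unparse(lib_tree)  # Python 3.9+; comments are not preserved
```

For a library defining `helper`, `used` (which calls `helper`), and `unused`, and a project containing `from mylib import used`, this keeps `helper` and `used` and drops `unused`. Because `ast.unparse` regenerates the source, comments are lost and formatting may change, so the output is better treated as a starting point for review than as a finished release.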
Bitwise
  • I wonder (sincerely) why this is not common practice with other Python (or non-Python) libraries. Perhaps it brings more problems than it solves (or is not THAT necessary). – heltonbiker Jun 13 '13 at 13:06
  • Note that this is virtually impossible to do correctly in general: there's no way you can predict which functions may be accessed through dynamic features like `getattr`. – Jun 13 '13 at 13:08
  • Ultimately, a module/library should be a collection of functions/classes which work together to accomplish a single goal. If some of the functions aren't ever used, then it makes you wonder if they should really be a part of the same library/module. – mgilson Jun 13 '13 at 13:08
  • @delnan -- Yep. You could do `getattr(library,'noitcnuf'[::-1])`. Good luck finding a tool which can figure out that `library.function` is now needed ... – mgilson Jun 13 '13 at 13:09
  • @delnan I realize that in general this is difficult, but in practice I am not accessing dynamic features, so it should be possible. – Bitwise Jun 13 '13 at 13:12
  • And on a side question, why all this fear of "giving away" some code? Python itself, and basically all the tools that run the internet today (mobile clients included), evolved from deliberately _sharing_ code, not hiding it. If that is not your company's practice, your I.P. rights should be enforced by contract, as any technical means would fall short if there were sufficient interest in cracking them. – jsbueno Jun 13 '13 at 13:13
  • @jsbueno In my specific case I cannot protect myself from people using my code without giving me the appropriate credit. Sharing is important and good, but not always possible. – Bitwise Jun 13 '13 at 13:16
  • @mgilson I see your point about design, but it is quite common to be in a situation where you write several different utility functions, and you probably don't want each of them in a separate module. – Bitwise Jun 13 '13 at 13:19
  • I'm surprised/disappointed this has been closed. Just because the answer is "normally you don't do this" doesn't mean there is no valid answer. Also, some obfuscators might attempt this (but given the dynamic nature of Python it's undecidable, and so unsafe in general); see http://stackoverflow.com/questions/576963/python-code-obfuscation for more on obfuscation. – andrew cooke Jun 13 '13 at 13:40
  • See: http://stackoverflow.com/questions/693070/how-can-you-find-unused-functions-in-python-code – cschooley Jun 13 '13 at 14:15
  • @cschooley I'm not sure that this gives what he wants. AFAICT, for coverage or figleaf, he'd need to write a full test suite (though that is a good thing for a project) or run the program through all its possibilities to discover every usage pattern of his exposed library. vulture and pyflakes only look at tokens and guess which imports are unused in the current module; they don't look up what is actually not being used, e.g. an import inside a function that has itself been imported into the current code... – zmo Jun 13 '13 at 14:34
  • OP: The reason you should not remove unused functions from your library is that if someone else wants to use your library (and they are allowed to), they will be missing the functions you have removed! View a library not as something tailored for a project, but as a general code base that can be used with many projects to achieve something. A good library is a library that supports all the methods it should. Don't make your library worse by removing support for some of the good methods you have created! – kqr Jun 13 '13 at 15:03
  • To add to that, a well-documented library is *not* confusing, even if it has a lot of methods. The user will look up the methods they need to know about in the documentation, and will not care for the others. – kqr Jun 13 '13 at 15:04
  • @Bitwise "I cannot protect myself from people using my code without giving me the appropriate credit." And as hard as Microsoft tries, it won't be able to stop the truly dedicated from turning off the Kinects on their Xbox Ones. If you're so worried about people using your code without attribution that inclusion of an open-source license requiring attribution isn't enough for you, why not write your libraries as Python extensions using the C API and then supply documentation for your libraries but no source code? Python does not require modifications or extensions to it to be open-source. – JAB Jun 13 '13 at 20:40
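A small demonstration of the `getattr` caveat raised in the comments above: a static scan of the project source never sees the name `function`, yet the call resolves to it at runtime. The in-memory `library` module is a hypothetical stand-in built only for this illustration.

```python
import ast
import types

# Hypothetical stand-in for the library module, built in memory for the demo.
library = types.ModuleType("library")
exec("def function():\n    return 'called'\n", library.__dict__)

project_source = "result = getattr(library, 'noitcnuf'[::-1])()"

# What a static scan sees: plain names and attribute names only.
static_names = {
    n.id if isinstance(n, ast.Name) else n.attr
    for n in ast.walk(ast.parse(project_source))
    if isinstance(n, (ast.Name, ast.Attribute))
}
print("function" in static_names)  # False: the scan cannot see the call

# What happens at runtime: the reversed string resolves to library.function.
namespace = {"library": library}
exec(project_source, namespace)
print(namespace["result"])  # called
```

Any pruning tool based on static analysis would therefore delete `function` here, which is why such a tool is only safe if the project avoids dynamic lookups, as Bitwise says is the case in practice.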

3 Answers


My first advice to you would be to design your code with stronger modularity, splitting the functions you want to keep across as many Python modules/eggs as needed, so that each project can pull in just what it needs. I think that is the only way to keep your code easily manageable and readable.

That said, if you really want to go the way you describe in your question, to my knowledge there's no tool that does exactly what you say, because it's an uncommon usage pattern.

But I don't think it would be hard to write a tool that does it using rope. It does static and dynamic code analysis (so you can find which imported objects are being used and thus infer what is unused in your imported modules), and it also provides many refactoring tools to move or remove code.

Anyway, I think that to build a tool that accurately finds all the code used by your current code, you need full unit-test coverage, or you must be really methodical in how you import your modules' code (using only `from foo import bar` and avoiding chained imports between modules).
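One way to follow the test-coverage route, sketched with the standard library's `sys.settrace` instead of rope: run the project (or its test suite) inside a tracer and record which library functions are actually entered. `record_calls` and the `mylib` package name are hypothetical. The result is only as complete as the code paths the run exercises, only plain function names (`co_name`) are recorded, and `sys.settrace` traces the current thread only.

```python
import sys
from contextlib import contextmanager


@contextmanager
def record_calls(package):
    """Yield a set that collects (module, function_name) for every call into
    `package` or its submodules while the with-block runs."""
    called = set()

    def tracer(frame, event, arg):
        if event == "call":
            module = frame.f_globals.get("__name__", "")
            if module == package or module.startswith(package + "."):
                called.add((module, frame.f_code.co_name))
        return None  # no per-line tracing needed inside each frame

    previous = sys.gettrace()
    sys.settrace(tracer)
    try:
        yield called
    finally:
        sys.settrace(previous)  # restore any debugger/coverage tracer
```

Usage would look like `with record_calls("mylib") as called: run_project()`, where `run_project` stands for whatever drives the project; any library function missing from `called` afterwards is a candidate for removal, not proof that it is unused.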

zmo

I agree with @zmo - one way to avoid future problems like this is to plan ahead and make your code as modular as possible. I would have suggested putting the classes and functions in much smaller files. This would mean that for every project you make, you would have to hand-select which of these smaller files to include. I'm not sure if that's feasible with the size of your projects right now. But for future projects it's a practice you may consider.

mad-hay

If your purpose is to not give away code, then just distribute the Python-compiled library rather than its source code. There's no need to manually weed out unused code; just distribute the .pyc versions of your files. If you're afraid that people will take your code and not give you credit, don't give them code if there is an alternative.
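A sketch of the "distribute the pyc files" option using the standard-library `compileall` module. `build_sourceless` is a hypothetical helper; run it on a copy of the package, since it deletes the `.py` files. The resulting `.pyc` files only load on the Python minor version that produced them, and, as the comments on this answer point out, they can be decompiled.

```python
import compileall
import pathlib


def build_sourceless(package_dir):
    """Compile every .py under package_dir to a .pyc beside it, then delete
    the .py sources so only bytecode is shipped."""
    package_dir = pathlib.Path(package_dir)
    # legacy=True writes foo.pyc next to foo.py instead of into __pycache__/,
    # which is the layout Python can import when the source file is absent.
    if not compileall.compile_dir(str(package_dir), legacy=True, quiet=1):
        raise RuntimeError("some files failed to compile")
    for source in package_dir.rglob("*.py"):
        source.unlink()
```

The package is then imported exactly as before; Python picks up `__init__.pyc` and the other `.pyc` files directly.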

That said, we have licenses for a reason. You put a minimal header and your attribution at the top of every file, and you distribute a LICENSE file with your software that clearly indicates what people are, and are not, allowed to do with your source code. If they violate that, and you catch them, you have legal recourse. If you don't trust people to uphold the license: that's the whole reason it's there. If your code is so unique that it needs to be licensed for fear of others passing it off as their own, it will be easy to find infractions. If, however, you treat all your code like this, here is a small reality check: you are not that good. Almost nothing you write will be so original that others haven't already written it too; trying to cling to it is not going to benefit you, or anyone else.

Best code protection? Put it online for everyone to see, so that you can point everyone to it and say "see? that's my code. And this jerk is using it in their own product without giving me credit". Worse code protection, but still protection: don't distribute code, distribute the compiled libraries. (Worst code protection: distributing gimped code because you're afraid of the world for the wrong reasons.)

Mike 'Pomax' Kamermans
  • Be aware that `.pyc` files can, by default, be trivially decompiled. – sapi Jun 13 '13 at 14:18
  • Of course, note that this is true for virtually all compiled code. But someone needs to be pretty hell-bent on stealing your code to decompile it, rather than just "screw it, I'll just use this other, properly open source library that does the same thing" =) – Mike 'Pomax' Kamermans Jun 13 '13 at 14:22
  • The situation is actually more complicated. I *do* want to give out the code of the project I publish. I do not want to give out code which is irrelevant to the project. – Bitwise Jun 13 '13 at 14:39
  • Then don't. If you have libraries, release those libraries, and use them yourself as imports only. Unless your projects are for tiny-memory devices, pruning for each project doesn't save you anything (the code you leave can still be "stolen") and doesn't help others who might want to use your library as-is. If you're writing libraries rather than per-project classes, package them up, release them as a module, and move on. – Mike 'Pomax' Kamermans Jun 13 '13 at 14:45