
Following on from this post: how can we get the source code back out of a pickle file?

I tried using `getsource` (after reading this post), but that only works when the class is defined in the same session. Below is the code that I tried:

```python
import dill


class Foo(object):
    def bar(self, x):
        return self.y + x

    def __init__(self, y):
        self.y = y


f = Foo(5)

# Serialize the instance with dill...
with open('foo.pkl', 'wb') as pkl:
    dill.dump(f, pkl)

# ...and load it back.
with open('foo.pkl', 'rb') as pkl:
    b = dill.load(pkl)

print(b)

# Each of these attempts raised an error:
# sFoo = dill.source.getsource('foo.pkl')
# sFoo = dill.source.getsource(b)
# sFoo = dill.source.getsource(b.bar)
```

Error details: with `sFoo = dill.source.getsource(b)` the error is `OSError: could not extract source code`.

With `sFoo = dill.source.getsource(b.bar)` the error is also `OSError: could not extract source code`.
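Some context for the OSError, as I understand it. The standard library's `inspect.getsource` does not read source out of the function object itself; `dill.source` describes itself as an extension of `inspect`, so I'd expect the same limitation there. A code object only records a filename and a starting line number. `getsource` then looks the text up on disk or in the `linecache` module's cache. The following stdlib-only sketch assumes nothing about dill. A function compiled under a made-up pseudo-filename (`<pickled-bar>`, hypothetical) fails in the same way. Once its text is registered in `linecache`, the same call starts working.

```python
import inspect
import linecache

# Hypothetical source text and pseudo-filename; nothing exists on disk under this name.
SRC = "def bar(self, x):\n    return self.y + x\n"
FILENAME = "<pickled-bar>"

namespace = {}
exec(compile(SRC, FILENAME, "exec"), namespace)
bar = namespace["bar"]

# 1) The function's code object only records FILENAME and a line number,
#    so inspect has nowhere to read the text from.
try:
    source_before = inspect.getsource(bar)
except OSError as exc:
    source_before = None
    print("getsource failed:", exc)

# 2) Register the text in linecache under the same pseudo-filename.
#    An mtime of None tells linecache.checkcache() to leave the entry alone.
linecache.cache[FILENAME] = (len(SRC), None, SRC.splitlines(keepends=True), FILENAME)

source_after = inspect.getsource(bar)
print(source_after)
```

This would also explain why results differ between a script, a plain console and Jupyter. Source for code typed interactively is only retrievable if that environment kept the text somewhere `getsource` knows how to look.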

sakeesh
  • It would help to see what errors you are getting. It would seem the first and third of your attempts are due to using the function in an unintended fashion. `getsource` extracts code from objects, so calling it on `foo.pkl` doesn't make sense. Similarly, if you are dumping `f.bar`, then after load, `b` should be a duplicate of `f.bar`, so `b.bar` doesn't make sense, and should be an attribute error or similar. – Mike McKerns Sep 10 '21 at 13:20
  • Hello Mike, thank you for the comment. I have updated the question with the errors that I got. I agree that the first is not a correct use of `getsource`. Instead of dumping `f.bar`, I have now changed it to dump `f`. Is there any other way to retrieve the source code if `getsource` is not the legitimate way? – sakeesh Sep 10 '21 at 15:30
  • `getsource` is totally a legitimate way to get the code. – Mike McKerns Sep 11 '21 at 17:35

1 Answer


Reading further from here, I ran the code from the standard Python console rather than Jupyter. I still didn't get the correct result, but it was better than before: no error this time.

```python
dill.source.getsource(b)
```

The output I get is:

'import dill\ndill.loads(b\'\\x80\\x03cdill._dill\\n_create_type\\nq\\x00(cdill._dill\\n_load_type\\nq\\x01X\\x04\\x00\\x00\\x00typeq\\x02\\x85q\\x03Rq\\x04X\\x03\\x00\\x00\\x00Fooq\\x05h\\x01X\\x06\\x00\\x00\\x00objectq\\x06\\x85q\\x07Rq\\x08\\x85q\\t}q\\n(X\\n\\x00\\x00\\x00__module__q\\x0bX\\x08\\x00\\x00\\x00__main__q\\x0cX\\x03\\x00\\x00\\x00barq\\rcdill._dill\\n_create_function\\nq\\x0e(h\\x01X\\x08\\x00\\x00\\x00CodeTypeq\\x0f\\x85q\\x10Rq\\x11(K\\x02K\\x00K\\x02K\\x02KCC\\n|\\x01|\\x00j\\x00\\x17\\x00S\\x00q\\x12N\\x85q\\x13X\\x01\\x00\\x00\\x00yq\\x14\\x85q\\x15X\\x04\\x00\\x00\\x00selfq\\x16X\\x01\\x00\\x00\\x00xq\\x17\\x86q\\x18X\\x07\\x00\\x00\\x00<stdin>q\\x19h\\rK\\x02C\\x02\\x00\\x01q\\x1a))tq\\x1bRq\\x1cc__builtin__\\n__main__\\nh\\rNN}q\\x1dtq\\x1eRq\\x1fh\\x14K\\x01X\\x07\\x00\\x00\\x00__doc__q NX\\r\\x00\\x00\\x00__slotnames__q!]q"utq#Rq$)\\x81q%.\')\n'

A note there further suggested using `dill.source.getsource(dill.detect.code(b))`, but this raises `TypeError: None is not a module, class, method, function, traceback, frame, or code object`.
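This may explain why `dill.detect.code(b)` came back as `None` while `b.bar` worked. I haven't checked dill's internals, but my understanding is that it looks for a code object on whatever it is given. An instance does not have one; only functions and methods do. Here is a stdlib-only sketch of where the code object actually lives, reusing the question's `Foo`:

```python
import types


class Foo(object):
    def bar(self, x):
        return self.y + x

    def __init__(self, y):
        self.y = y


b = Foo(5)

# An instance has no code object of its own.
print(getattr(b, "__code__", None))  # None

# A bound method wraps a plain function; the function owns the code object.
code = b.bar.__func__.__code__
print(isinstance(code, types.CodeType))  # True
print(code.co_name, code.co_varnames)    # bar ('self', 'x')

# The code object records where the source came from, not the source text.
print(code.co_filename, code.co_firstlineno)
```

That last line matters here. A pickled code object can carry a filename such as the `<stdin>` visible in the dill output above, but not the text that filename referred to.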

Then, when I tried `dill.source.getsource(b.bar)`, I got the correct result:

```python
>>> dill.source.getsource(b.bar)
'  def bar(self, x):\n    return x+self.y       \n'
```

So this is what I came up with:

What we can do is first trace the pickling of the loaded object using the code below. This prints each object dill serializes, including all of the functions in the class. We can then get the source of each function individually with `getsource`.

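```python
dill.detect.trace(True)  # log each object dill visits while pickling
dill.pickles(b)          # test-pickle b; the trace is printed as it goes
```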
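Here is a rough stdlib analogue of "list the class's functions, then call `getsource` on each one". This is not what `dill.detect.trace` prints. It walks the class with `inspect.getmembers` instead, and the helper name `sources_by_method` is hypothetical. Functions whose source can't be located map to `None` rather than raising.

```python
import inspect


class Foo(object):
    def bar(self, x):
        return self.y + x

    def __init__(self, y):
        self.y = y


def sources_by_method(cls):
    """Hypothetical helper: map each function defined on cls to its source, or None."""
    result = {}
    # getmembers returns (name, value) pairs sorted by name; isfunction keeps
    # plain Python functions and skips object's built-in slot wrappers.
    for name, func in inspect.getmembers(cls, inspect.isfunction):
        try:
            result[name] = inspect.getsource(func)
        except (OSError, TypeError):
            # No file or linecache entry holds the text (e.g. typed into a console).
            result[name] = None
    return result


b = Foo(5)
found = sources_by_method(b.__class__)
for name, source in found.items():
    print(name, "->", "not available" if source is None else source.splitlines()[0].strip())
```

Passing `b.__class__` rather than `b` is the same move Mike suggests in the comments below: the methods' code belongs to the class, not to the instance.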
sakeesh
  • You get what you get for `b` because it's an instance of a class, so there's source code plus state. If you instead get the source for `b.__class__` (or `Foo`), you should get just the code for the class. Also, if you use `enclosing=True` on `b.bar`, you should get at least some (if not all) of `Foo`. – Mike McKerns Sep 11 '21 at 17:39
  • Also, Jupyter messes with the namespace, so each cell is in its own namespace rather than the global namespace. So, if you have all of your code in a single cell, it should work as expected. – Mike McKerns Sep 11 '21 at 18:52
  • Thanks Mike, I did get the relevant output when I used `b.__class__`: `>>> dill.source.getsource(b.__class__) 'class Foo(object):\n def bar(self, x):\n return self.y + x\n def __init__(self, y):\n self.y = y\n'` – sakeesh Sep 12 '21 at 06:56
  • Apart from this, I have also noticed that in certain scenarios `getsource` does not give the correct output, especially when there are dicts. Anyway, the workaround we used was to dump the pickle's disassembly to a text file with pickletools and then use regular expressions to extract the required dicts. The command used was `python -m pickletools /usr/data/dat.pkl > /usr/data/dat.txt`. The circumstances are not clear to me now: I have no idea how dat.pkl was created, only that it was created using dill. I will need to check with the creator of dat.pkl to understand this. – sakeesh Sep 12 '21 at 07:04
  • @MikeMcKerns Extending what you shared about using `b.__class__`: when I use `b.__dict__`, I do get the valid dicts from the pickle file. I think my goal is achieved, so all the crazy workarounds are not needed. Thank you for the help. – sakeesh Sep 12 '21 at 07:16
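Here is a sketch of the two routes from the last comments. It uses the standard `pickle` module rather than dill, with a hypothetical `settings` dict standing in for the dicts in `dat.pkl`. `pickletools.dis(..., out=...)` produces the same kind of listing as `python -m pickletools file.pkl > file.txt`. Loading the pickle and reading `__dict__` instead gives the dicts back as real objects. The sketch assumes `Foo` can be found again at load time (here it is defined in the same script).

```python
import io
import pickle
import pickletools


class Foo(object):
    def __init__(self, y):
        self.y = y
        # Hypothetical dict attribute standing in for the dicts inside dat.pkl.
        self.settings = {"lr": 0.1, "epochs": 3}


# Stdlib pickle here, not dill; Foo must be importable again at load time.
data = pickle.dumps(Foo(5))

# Route 1: the programmatic version of `python -m pickletools file > file.txt`.
# dis() only decodes opcodes; it never imports modules or builds objects.
buf = io.StringIO()
pickletools.dis(data, out=buf)
listing = buf.getvalue()
print(listing)

# Route 2: actually load the object and read its state directly.
b = pickle.loads(data)
state = vars(b)  # the same dict object as b.__dict__
print(state["settings"])
```

The trade-off is safety against convenience. The disassembly route never executes anything, which matters because the `pickle` documentation warns that unpickling untrusted data can run arbitrary code. The `__dict__` route avoids brittle regex parsing, but it is only appropriate once you trust the file.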