
I have a huge pickle file, around 6 GB, generated with joblib.dump() for the RandomForestClassifier training samples. Every execution has to load the pickled objects with joblib.load() before processing the input data. The loading time is very high and is hurting the overall performance of the script (a minimal sketch of the workflow is below).

Is there a way that the object, once loaded, can be kept persistent in memory and made available to subsequent Python executions without calling joblib.load()?

Would using a DB like SQLite help load the data faster?
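For context, here is a minimal sketch of the workflow described above, with toy data and a made-up file name (`model.joblib`) standing in for the real 6 GB dump:

```python
import joblib
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Toy stand-in for the real training samples (the actual dump is ~6 GB).
X = np.random.rand(1000, 20)
y = np.random.randint(0, 2, size=1000)

clf = RandomForestClassifier(n_estimators=100)
clf.fit(X, y)
joblib.dump(clf, "model.joblib")  # one-time persistence to disk

# Every subsequent script run pays the full deserialization cost here.
clf = joblib.load("model.joblib")
print(clf.predict(X[:5]))
```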

  • Would loading 3 GB of content from sqlitedict be faster than unpickling a 3 GB pickle file with joblib.load()? – chandu Feb 17 '16 at 07:24
  • You can possibly reduce the pickle load time greatly by wrapping the joblib calls in a caching loader, as mentioned [here](http://stackoverflow.com/a/36699998/2385420) (in your case, replace `pickle` with `joblib`); see the caching sketch after these comments. Let me know if that works for you. – Tejas Shah Jan 19 '17 at 04:57
  • Also, when calling joblib.dump(), specify [HIGHEST_PROTOCOL](https://docs.python.org/2/library/pickle.html#pickle.HIGHEST_PROTOCOL) for better performance; pickle and joblib use the [same protocol](https://pythonhosted.org/joblib/generated/joblib.dump.html) (see the protocol sketch after these comments). – Tejas Shah Jan 19 '17 at 05:01
  • I have tried those, but our data set is really huge, so we started an HTTP server in the background that loads all the pickle files into memory. That way only the first request takes a long time; later executions are much faster (a sketch of this pattern follows the comments). – chandu Jan 19 '17 at 13:34
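A minimal sketch of the caching wrapper suggested in the comments; `cached_load` is a hypothetical helper name, not from the linked answer:

```python
import joblib

_CACHE = {}

def cached_load(path):
    """Deserialize a joblib file only once per process, then reuse it."""
    if path not in _CACHE:
        _CACHE[path] = joblib.load(path)
    return _CACHE[path]

# The first call pays the load cost; later calls return the cached object.
model = cached_load("model.joblib")
```

Note that this only helps within a single long-lived Python process; separate script executions still pay the full load cost each time, which is what motivates the server approach below.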
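A sketch of the protocol suggestion: joblib.dump() accepts a pickle `protocol` argument, and a small stand-in object is used here in place of the real data:

```python
import pickle
import joblib

data = {"weights": list(range(10))}  # stand-in for the large object

# The highest pickle protocol is generally faster and more compact
# than the default when dumping large objects.
joblib.dump(data, "model.joblib", protocol=pickle.HIGHEST_PROTOCOL)
```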
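Finally, a minimal Python 3 sketch of the background HTTP server pattern chandu describes; the port, endpoint, and JSON request format are assumptions, not from the original:

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

import joblib

# Load the large pickle once, when the server process starts.
MODEL = joblib.load("model.joblib")

class PredictHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        # Expect a JSON list of feature values in the request body.
        length = int(self.headers.get("Content-Length", 0))
        features = json.loads(self.rfile.read(length))
        result = MODEL.predict([features]).tolist()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.end_headers()
        self.wfile.write(json.dumps(result).encode())

if __name__ == "__main__":
    # Only server start-up pays the multi-GB load cost; every request
    # afterwards reuses the already-deserialized object.
    HTTPServer(("localhost", 8000), PredictHandler).serve_forever()
```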

0 Answers