
I have a large text file (3.7 GB) containing 300,000,000 lines. I need to read every line, process it, and append the result to another text file. Which is better for this, multiprocessing or multithreading, and why? I need the whole job to finish as fast as possible.

EDIT: I'm using Python 3.
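With Python 3 on the standard CPython interpreter (an assumption), the global interpreter lock (GIL) matters. Only one thread executes Python bytecode at a time, so pure-Python, CPU-bound per-line work such as appending text to each string typically gets little or no speedup from a thread pool. Separate processes can use several cores. Threads mainly help when each line involves waiting on I/O, such as a network call.

Below is a minimal sketch using `multiprocessing.Pool.imap`. It assumes UTF-8 input and a hypothetical `process_line` that appends text to each line. The parent reads the file lazily in batches, workers transform each batch, and the parent alone appends the results to the output file in input order. The batch size of 10,000 lines is an arbitrary starting point; sending lines one at a time would let the cost of pickling data between processes dominate. It is worth timing this against a plain single-process loop, because with light per-line work the job may be limited by disk I/O rather than CPU.

```python
import os
import tempfile
from itertools import islice
from multiprocessing import Pool
from typing import Iterable, Iterator, List, Optional


def process_line(line: str) -> str:
    """Hypothetical per-line work: append some text to the line."""
    return line.rstrip("\n") + " [processed]\n"


def process_batch(batch: List[str]) -> str:
    """Runs in a worker process: transform a batch and join the results."""
    return "".join(process_line(line) for line in batch)


def batched(lines: Iterable[str], size: int) -> Iterator[List[str]]:
    """Group an iterable of lines into lists of at most `size` lines."""
    it = iter(lines)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch


def parallel_transform(src_path: str, dst_path: str,
                       batch_size: int = 10_000,
                       workers: Optional[int] = None) -> None:
    """Read src_path lazily, transform batches in a process pool, and append
    the results to dst_path in the original line order."""
    with open(src_path, "r", encoding="utf-8") as src, \
            open(dst_path, "a", encoding="utf-8") as dst, \
            Pool(processes=workers) as pool:
        # Only the parent touches the files; workers just transform text.
        # imap yields results in input order, so output order matches input.
        for chunk in pool.imap(process_batch, batched(src, batch_size)):
            dst.write(chunk)


if __name__ == "__main__":
    # The guard is required: under the "spawn" and "forkserver" start
    # methods, each worker process re-imports this module.
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, "in.txt")
        dst = os.path.join(tmp, "out.txt")
        with open(src, "w", encoding="utf-8") as f:
            f.writelines(f"line {i}\n" for i in range(5))
        parallel_transform(src, dst, batch_size=2, workers=2)
        with open(dst, encoding="utf-8") as f:
            print(f.read(), end="")
```

`Pool` does not strictly throttle its results. If writing falls behind the workers, finished batches can accumulate in memory.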

  • What does "process it" actually mean here? – roganjosh Dec 26 '19 at 12:49
  • I am assuming that you mean distributed processing for multiprocessing, is that correct? – Remis Haroon - رامز Dec 26 '19 at 12:50
  • Is this a one-time activity, or is it going to be repeated and reused? Can we expect more and bigger files in the future? – Remis Haroon - رامز Dec 26 '19 at 12:51
  • `multiprocessing` doesn't have to mean distributed processing; it's a standard-library [module](https://docs.python.org/3.4/library/multiprocessing.html?highlight=process) – roganjosh Dec 26 '19 at 12:51
  • I mean: take a line, add some text to it (or do something else with it), then append it to another text file, and repeat for every line. – John Garney Dec 26 '19 at 12:52
  • So, is there a better way to do it in less time? – John Garney Dec 26 '19 at 12:53
  • "Spawning processes is a bit slower than spawning threads. Once they are running, there is not much difference." ref: https://stackoverflow.com/questions/3044580/multiprocessing-vs-threading-python – Remis Haroon - رامز Dec 26 '19 at 12:58
  • Does this answer your question? [Multiprocessing vs Threading Python](https://stackoverflow.com/questions/3044580/multiprocessing-vs-threading-python) – stovfl Dec 26 '19 at 13:13
  • What specific part are you trying to achieve using another process/thread? For example, if appending to the other file is just to make sure you processed that line, then you can use a file offset instead. If you create multiple processes or threads, you also have to deal with passing data between them. Threads are lighter to create and put less overhead on the system. But I think you need to edit your post to provide more details. – PraveenB Dec 26 '19 at 21:29
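PraveenB's file-offset idea can be sketched as follows, assuming the point of tracking progress is to resume an interrupted run rather than start all 300,000,000 lines over. Opening the input in binary mode makes `tell()` report an exact byte offset that can later be passed to `seek()`. The function name `transform_from_offset`, the `max_lines` cap, and the bytes-based `process_line` are hypothetical. How the offset is persisted (for example, in a small checkpoint file) is left out.

```python
import os
import tempfile
from typing import Optional


def process_line(line: bytes) -> bytes:
    """Hypothetical per-line work on raw bytes: append some text."""
    return line.rstrip(b"\n") + b" [processed]\n"


def transform_from_offset(src_path: str, dst_path: str, start: int = 0,
                          max_lines: Optional[int] = None) -> int:
    """Process lines of src_path beginning at byte offset `start`, appending
    results to dst_path. Stops after `max_lines` lines (or at end of file)
    and returns the byte offset of the first unprocessed line, which can be
    saved and passed back in as `start` to resume."""
    with open(src_path, "rb") as src, open(dst_path, "ab") as dst:
        src.seek(start)
        done = 0
        while max_lines is None or done < max_lines:
            line = src.readline()
            if not line:  # empty bytes object means end of file
                break
            dst.write(process_line(line))
            done += 1
        return src.tell()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, "in.txt")
        dst = os.path.join(tmp, "out.txt")
        with open(src, "wb") as f:
            f.write(b"a\nb\nc\n")
        offset = transform_from_offset(src, dst, max_lines=2)  # stop early
        print(offset)  # 4: "a\n" and "b\n" are 2 bytes each
        offset = transform_from_offset(src, dst, start=offset)  # resume
        print(offset)  # 6
```

If the program stops after writing output but before the new offset is saved, the last lines will be processed again on resume and may appear twice in the output.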
