
I have a program which parses a 100MB file and then applies some functions to the data. To find the bottleneck, I left those functions out of the run.

That is, I commented out their implementation and just put pass in its place.

Why is Python using so much memory?

It takes 15 minutes to parse the file, and I can see that Python is using 3GB of memory; CPU usage is at 15% and memory usage is at 70%.
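A rough way to see where the memory goes: assuming lines similar to the sample below (about 8 to 12 bytes each), a 100MB file holds on the order of ten million lines, and the parser keeps one Mensch instance plus its int objects alive for every one of them. The sketch below uses the standard `tracemalloc` module to estimate the bytes allocated per instance; the `Mensch` class here is a hypothetical minimal stand-in, since the question does not show its definition.

```python
import tracemalloc


class Mensch:
    """Hypothetical minimal stand-in for the question's Mensch class."""

    def __init__(self, age, salary):
        self.age = age
        self.salary = salary


def bytes_per_object(n=100_000):
    """Estimate the bytes allocated per Mensch (including its ints and list slot)."""
    tracemalloc.start()
    people = [Mensch(i, i * 10) for i in range(n)]
    # 'people' is still alive here, so 'current' counts everything it holds.
    current, _peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return current / len(people)


if __name__ == "__main__":
    per_obj = bytes_per_object()
    print(f"~{per_obj:.0f} bytes per Mensch object")
    print(f"~{per_obj * 10_000_000 / 1e9:.1f} GB for ten million of them")
```

The exact figure depends on the Python version, but multiplying it by the line count usually explains a multi-gigabyte footprint without any leak.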

Does this imply that the program is I/O bound?
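One way to answer the I/O question is to time a pass that only reads the lines against the full parse. The sketch below assumes a minimal hypothetical `Mensch` class and writes a synthetic temporary file in the same "age salary" format; the `read_only` and `timed` helpers are illustrative names, not part of the question. If the read-only pass takes a small fraction of the total, the time is going into parsing and object creation rather than disk I/O.

```python
import os
import tempfile
import time


class Mensch:
    """Hypothetical minimal stand-in for the question's Mensch class."""

    def __init__(self, age, salary):
        self.age = age
        self.salary = salary


def parse(pathToFile):
    """The question's parser, unchanged apart from the inlined split."""
    myList = []
    with open(pathToFile) as f:
        for line in f:
            age, salary = [int(v) for v in line.split()]
            myList.append(Mensch(age, salary))
    return myList


def read_only(pathToFile):
    """Read every line but build nothing: roughly the pure I/O cost."""
    count = 0
    with open(pathToFile) as f:
        for _ in f:
            count += 1
    return count


def timed(func, *args):
    """Return (result, elapsed seconds) for one call of func."""
    start = time.perf_counter()
    result = func(*args)
    return result, time.perf_counter() - start


if __name__ == "__main__":
    fd, path = tempfile.mkstemp(suffix=".txt")
    with os.fdopen(fd, "w") as f:
        for i in range(200_000):
            f.write(f"{i % 100} {i}\n")
    try:
        n, t_io = timed(read_only, path)
        _people, t_parse = timed(parse, path)
        print(f"{n} lines: read only {t_io:.3f}s, full parse {t_parse:.3f}s")
    finally:
        os.remove(path)
```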

How can I speed up the parsing? Or is there nothing to be done about the slow parsing?

File sample (age and salary):

50 1000
40 123
1233 123213

CODE:

def parse(pathToFile):
    myList = []
    with open(pathToFile) as f:
        for line in f:
            s = line.split()
            age, salary = [int(v) for v in s]
            Jemand = Mensch(age, salary)
            myList.append(Jemand)
    return myList
  • Depending on what you are trying to do, you might be better off using something like `numpy`'s `np.loadtxt` to read the file quickly. – VBB Feb 05 '17 at 09:07
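To illustrate the comment's suggestion, a minimal sketch with `np.loadtxt`, assuming NumPy is installed and every line holds exactly two integers; `io.StringIO` stands in for the real file path here. The result is a single 2-D integer array (8 bytes per value with `int64`) instead of one Python object per line. How fast `loadtxt` itself parses varies between NumPy versions, so it is worth timing on the real file.

```python
import io

import numpy as np

# Sample data in the question's "age salary" format; a file path works the same way.
sample = io.StringIO("50 1000\n40 123\n1233 123213\n")

# One compact 2-D array; ndmin=2 keeps it 2-D even for a one-line file.
data = np.loadtxt(sample, dtype=np.int64, ndmin=2)
ages, salaries = data[:, 0], data[:, 1]

print(data.shape)  # (3, 2)
```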



Your code could be made a great deal faster:

with open(pathToFile) as f:
    for line in f:
        s = line.split()
        age, salary = [int(v) for v in s]
        Jemand = Mensch(age, salary)
        myList.append(Jemand)

is slow because of:

  • the explicit loop
  • the repeated append calls
  • the needless list comprehension that converts the values to integers, only for its result to be unpacked into a fixed number of variables
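On the last point: tuple unpacking accepts any iterable, so the temporary list is not needed. A small illustration using the sample line `50 1000`; `map(int, ...)` is a common alternative to the generator expression, not something the answer itself uses.

```python
line = "50 1000\n"

# Builds a temporary list, then unpacks it into two names.
age, salary = [int(v) for v in line.split()]

# Unpacking works on any iterable, so no intermediate list is required.
age2, salary2 = map(int, line.split())
age3, salary3 = (int(v) for v in line.split())

assert (age, salary) == (age2, salary2) == (age3, salary3) == (50, 1000)
```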

It could become an almost one-liner:

with open(pathToFile) as f:
    myList = [Mensch(*(int(x) for x in line.split())) for line in f]

(This uses a list comprehension wrapping a generator expression, and passes the parameters to the class with * unpacking.)
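To check the claim on your own machine, a rough `timeit` comparison of the original loop and the one-liner. It assumes a minimal hypothetical `Mensch` class and uses synthetic in-memory lines so that only the parsing code is timed. Both versions still create one object per line, so the gain depends on the Python version; measure rather than assume.

```python
import timeit


class Mensch:
    """Hypothetical minimal stand-in for the question's Mensch class."""

    def __init__(self, age, salary):
        self.age = age
        self.salary = salary


# Synthetic "age salary" lines held in memory, so no disk I/O is timed.
LINES = [f"{i % 100} {i}\n" for i in range(100_000)]


def loop_append(lines):
    """The question's version: explicit loop, list comp, append."""
    result = []
    for line in lines:
        age, salary = [int(v) for v in line.split()]
        result.append(Mensch(age, salary))
    return result


def list_comp(lines):
    """The answer's version: list comprehension plus * unpacking."""
    return [Mensch(*(int(x) for x in line.split())) for line in lines]


if __name__ == "__main__":
    for func in (loop_append, list_comp):
        seconds = timeit.timeit(lambda: func(LINES), number=5)
        print(f"{func.__name__}: {seconds:.3f}s for 5 runs")
```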

Moinuddin Quadri
Jean-François Fabre
  • Thank you, but now I need to create a new object in the list: `newObj = (Jemand, 2)`. How can I add this to your one-liner? – Hertha BSC fan Feb 05 '17 at 09:54
  • I edited my question; if you could help, I would appreciate it. – Hertha BSC fan Feb 05 '17 at 09:56
  • First you changed your question, then you created a `Rate` variable that you don't store, and lastly you asked another question. – Jean-François Fabre Feb 05 '17 at 11:29
  • Fabre, sorry, I just tried to understand list comprehensions on my own, but I'm new to Python. I have 3 variables which I need to store this way: `L = [obj1T(objT2(var1, var2), var3), obj1T(objT2(var1, var2), var3)]` – Hertha BSC fan Feb 05 '17 at 11:35
  • @HerthaBSCfan The edit-question feature exists on SO so that OPs can improve their questions. You should not change the context of the question (that is fine only if no answers have been posted yet). When you change the context of a question, the existing answers become invalid, and the people who answered get down-votes for helping you (because their answers no longer solve the issue described in the question). – Moinuddin Quadri Feb 05 '17 at 11:40
  • @MoinuddinQuadri, I'm sorry. I accepted his answer to show that it was great, but after posting it I needed a slight change and I can't make it work with a list comprehension. I am trying again now. – Hertha BSC fan Feb 05 '17 at 11:44
  • SO is a wiki for users all around the globe. Simply accepting the answer is not a good idea when you have changed the question. It is better to ask the answerer about further issues you face (but do not pester them), or to create a new question describing the issue, with a link to the previous question. – Moinuddin Quadri Feb 05 '17 at 11:47
  • @MoinuddinQuadri, roger that, I won't. Should I revert the question to the original? – Hertha BSC fan Feb 05 '17 at 11:49
  • @HerthaBSCfan I suggest rolling it back to the version for which this answer was posted, so that Jean could remove *"the question has been edited since, which renders my answer invalid."* from his answer. It doesn't look good as the first line of any answer, does it? :) – Moinuddin Quadri Feb 05 '17 at 11:52
  • @HerthaBSCfan I'm still willing to help. First, does that run fast enough? If so, it's because of the list comprehension, so you cannot "break the chain" by creating 2 objects in such a twisted way. Why not create 1 object with the 3 parameters, and have that object (`Worker`) create the underlying `Mensch` in its constructor? That would work without changing my answer, only the parameters of your `Worker` object's constructor. – Jean-François Fabre Feb 05 '17 at 14:16
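The last comment's suggestion can be sketched as follows. Both classes are hypothetical minimal versions (the thread never shows them), and the sketch assumes each line carries a third integer column for the rate. Because `Worker` builds its own `Mensch`, the list comprehension keeps exactly the shape of the answer's one-liner; only the class name changes.

```python
import io


class Mensch:
    """Hypothetical minimal stand-in for the question's Mensch class."""

    def __init__(self, age, salary):
        self.age = age
        self.salary = salary


class Worker:
    """Hypothetical wrapper: takes all three values and creates its own Mensch."""

    def __init__(self, age, salary, rate):
        self.person = Mensch(age, salary)
        self.rate = rate


# Assumed file format "age salary rate"; io.StringIO stands in for open(pathToFile).
f = io.StringIO("50 1000 2\n40 123 3\n")
workers = [Worker(*(int(x) for x in line.split())) for line in f]
```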

The poor performance you observe may be caused by a bug in Python's garbage collector. To work around it, disable garbage collection while you build the list and re-enable it after you finish. For more details, see this SO article.
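A minimal sketch of that workaround, assuming a hypothetical minimal `Mensch` class and the answer's one-liner for the parsing itself. `gc.disable()` only pauses CPython's cyclic collector; reference counting still frees objects as usual. The `try`/`finally` restores the previous state even if a malformed line raises, and `parse_lines` is split out (an illustrative helper name) so it can be used on any iterable of lines.

```python
import gc


class Mensch:
    """Hypothetical minimal stand-in for the question's Mensch class."""

    def __init__(self, age, salary):
        self.age = age
        self.salary = salary


def parse_lines(lines):
    """Parse "age salary" lines while the cyclic garbage collector is paused."""
    was_enabled = gc.isenabled()
    gc.disable()
    try:
        return [Mensch(*(int(x) for x in line.split())) for line in lines]
    finally:
        # Re-enable only if it was on before, even when parsing raises.
        if was_enabled:
            gc.enable()


def parse(pathToFile):
    """Same signature as the question's parse()."""
    with open(pathToFile) as f:
        return parse_lines(f)
```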

Community
ivan_onys