2

I am new to Python and I have a question. In this script I rename the elements of a string. For a small input z (a text file under 1 MB) the run time is short. With an input larger than 2 MB it takes over an hour.

Is the slowdown caused by the dictionary? Should I use a list or a set instead? I have seen the article "Python: List vs Dict for look up table", which says dicts are better than lists for lookups, so I am a bit confused.

bill
  • 1
    What the heck are you trying to do? I understand what the code does, but I'm curious as to its application. Why are you doing this? – inspectorG4dget Mar 01 '14 at 21:26
  • 2
    Be more explicit, particularly what are `z` and `d`? Also there's a useful module called `timeit` that does the thing you're doing here manually (timing code execution). – Aleksander Lidtke Mar 01 '14 at 21:29
  • The input is an inverted file, so I am trying to rename all the records to decrease the DGaps – bill Mar 01 '14 at 21:29

2 Answers

6

First of all, `if word in d.keys()` is very slow, because in Python 2 it builds a list of all the keys of `d` on every call. You should use `if word in d` instead; it is much faster because it does not create any new objects.
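A minimal sketch of the difference, assuming Python 3, where `d.keys()` returns a view rather than a list; `list(d)` is used here to reproduce the Python 2 behaviour described above. The dictionary contents, the sizes, and the function names are made up for illustration, and the timings will vary by machine.

```python
import timeit

# Hypothetical lookup table: 100,000 words mapped to integer ids.
d = {"word%d" % i: i for i in range(100_000)}
word = "word99999"

def lookup_in_dict():
    # Hash lookup directly on the dict: expected constant time.
    return word in d

def lookup_in_key_list():
    # Builds a list of all keys, then scans it: linear time per lookup.
    return word in list(d)

t_dict = timeit.timeit(lookup_in_dict, number=100)
t_list = timeit.timeit(lookup_in_key_list, number=100)
print("in d:       %.6f s" % t_dict)
print("in list(d): %.6f s" % t_list)
```

Inside a loop over every word of a multi-megabyte file, the per-lookup cost of the second form is paid once per word, which is the kind of slowdown described in the question.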

lejlot
  • 2
    Also, `if word in d.keys()` does a *list* search (linear time); `if word in d` does a *dictionary* search (constant time). – nneonneo Mar 01 '14 at 21:40
  • 1
    @nneonneo well... not really constant: https://wiki.python.org/moin/TimeComplexity – Dunno Mar 01 '14 at 21:52
  • @Dunno: Amortized/expected constant-time. (I was going to gloss over the "amortized" bit because it's not relevant to the comparison) – nneonneo Mar 01 '14 at 23:47
1

You should try

result = [(item, the_list.count(item)) for item in set(the_list)]

as your code is basically counting the number of occurrences of each word in your list.

See this SO question: how to optimally count elements in a python list
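For reference, a minimal sketch of one standard-library option, `collections.Counter`, which makes a single pass over the list instead of calling `count` once per distinct word. This is an alternative to the comprehension above, not necessarily what the linked question recommends, and `the_list` here is made-up sample data.

```python
from collections import Counter

# Made-up sample data standing in for the list of words.
the_list = ["a", "b", "a", "c", "b", "a"]

# One pass over the list builds all the counts.
counts = Counter(the_list)

# Same (item, count) shape as the comprehension, sorted for a stable order.
result = sorted(counts.items())
print(result)  # [('a', 3), ('b', 2), ('c', 1)]
```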

abrunet