python - Reduce time consumption

Question

I was writing code for this problem:

Given an array A having N elements. Find total number of pairs (i,j) such that j < i and Aj = Ai.

This is my code:

raw_input()

l = list(map(int, raw_input().split()))

count = 0

for a in range(len(l)):
    count = count + l[a+1:].count(l[a])

print(count)

But unfortunately the code is taking a lot of time. Do you have any suggestions by which I could reduce the time consumption? I mean, how do I reduce the time consumed in the for loop. I feel that the list.count method takes a lot of time so do you have any ideas by which I could replace it.

Yes, `list.count()` has to iterate over the *whole list* to count the number of occurrences. — Martijn Pieters, Apr 04 '16 at 12:32
Try and think of other ways to only traverse the list *once*. Perhaps you can track if you already have seen a number before? — Martijn Pieters, Apr 04 '16 at 12:34
@MartijnPieters I earlier had `sorted` the list by mistake and when I came to know of it I realized that it didn't have any effect on the answers which it could calculate. — Some Name, Apr 04 '16 at 12:39
I'd create a *count* of all the values (`collections.Counter()` or a `defaultdict`), then calculate the [number of combinations](https://en.wikipedia.org/wiki/Combination) possible with any count over 1; so for any counter > 1, add n over 2; [Statistics: combinations in Python](http://stackoverflow.com/q/3025162) — Martijn Pieters, Apr 04 '16 at 12:49
Are you sure this is `python-3.x`? You're using `raw_input` which is python2 and is called `input` in python3 — Daenyth, Apr 04 '16 at 12:53

score 1 · Answer 1 · answered Apr 04 '16 at 12:50

1

You can speed it up by using a faster way of checking membership than .count(). For instance, a dict lookup is extremely fast:

from collections import defaultdict

raw_input()

l = list(map(int, raw_input().split()))

keys = defaultdict(list)

for i, v in enumerate(l):
    keys[v].append(i)

for value, keys in keys.items():
    print("Value %d appears at indices %s." % (k, ", ".join(keys)))

Then you just need to count the number of pairs.

answered Apr 04 '16 at 12:50

RoadieRich

6,330
3
35
52

... which is just a (len(keys[k]) over 2) calculation for each len(keys[k]) > 1. You may as well just calculate the length, no point in storing all the indices. – Martijn Pieters Apr 04 '16 at 12:55

Some Name · Answer 2 · 2016-04-04T12:57:08.393

1

I finally got the answer. It reduced the time consumption to a very low amount. Sorry for disturbances caused.

raw_input()

l = list(map(int, raw_input().split()))

dictionary = {}

for value in l:
    if value in dictionary:
        dictionary[value] += 1
    else:
        dictionary[value] = 0

def sum_till_n(iterable):
    return [x*(x+1)/2 for x in iterable]

print(sum(sum_till_n(dictionary.values())))

I hope you understand what the code does. I looked at the problem in a mathematical way. The dictionary dictionary stores the number of value's after the first value. The sum_till_n function is a function which finds the sum of numbers till n. Like for n=3, it returns 1+2+3.

edited Apr 04 '16 at 12:57

answered Apr 04 '16 at 12:51

Some Name

307
2
11

Take a look at [`collections.defaultdict`](https://docs.python.org/3.5/library/collections.html#collections.defaultdict),specifically `defaultdict(int)` to tidy up the `if`/`else` code. – RoadieRich Apr 04 '16 at 12:53
If you're on Python 2, `if value in dictionary.keys():` is incredibly inefficient; it makes a `list` copy of the dict's keys, then searches it linearly. Just use `if value in dictionary:` which does a `O(1)` lookup in the `dict` itself, with no copies. In general, `.keys`/`.items`/`.values` are all bad ideas in Python 2; use their view based equivalents if you need them (`viewkeys`/`viewitems`/`viewvalues`) to avoid making copies, but usually you avoid `viewkeys` entirely unless you really need `set`-like operations (the `dict` itself behaves like `viewkeys` otherwise). – ShadowRanger Apr 04 '16 at 12:55
Not sure why this was downvoted; the OP solved this themselves, they are perfectly in their rights to post an answer too. And this answer happens to use the correct approach (although I wonder if you should skip `x < 2` in `sum_till_n()`). – Martijn Pieters Apr 04 '16 at 14:07

score 0 · Answer 3 · edited May 23 '17 at 12:07

0

Micro-optimisations in your python code won't reduce your current O(n^2) complexity by a significant amount. I would suggest having a look at implementations that yield lower complexity. This link will help you find total number of (i,j) pairs in array such that i<j and a[i]>a[j]

edited May 23 '17 at 12:07

Community

1
1

answered Apr 04 '16 at 12:48

thikonom

4,219
3
26
30

score 0 · Answer 4 · answered Apr 04 '16 at 13:04

0

Let's take a minute have a look at the problem and not at your implementation!

The task is:

Given an array A having N elements. Find total number of pairs (i,j) such that j < i and Aj = Ai.

Which is the same as finding all duplicates in your array!

Finding all duplicates is a very common task.

One solution I can quickly think of:

sort array of tuples(value, index) (O(N) = N*log(N))
traverse the array once to find consecutive duplicates

Here's an example how the resulting data would look like:

input = [2.0, 1.0, 3.0, 1.0]
# create and sort tuples (I will leave this task to you)
sorted_tuples = [(1.0, 1), (1.0, 3), (2.0, 0), (3.0, 2)]
# find consecutive duplicates (I will leave this task to you)
solution = [(3, 1)]

answered Apr 04 '16 at 13:04

Markus

592
3
13

It is not just finding all duplicates. It is finding the *number of pairs* you can create from those duplicates (the number of combinations). It can be achieved in O(N) time, no need to sort just to find pairs. – Martijn Pieters Apr 04 '16 at 13:49
I have been a bit imprecise. Ofc it's about the finding the index pairs. As soon as you have found all Objects with the same value, building all index pairs is trivial (and also complete nonsense). The naive solution is O(N^2) like bubble sort. @Martijn Pieters I doubt, that there's an algorithm with O(N), but please prove me wrong. – Markus Apr 04 '16 at 15:24
See SomeName's solution. It iterates over the input elements once (O(N)), building a dictionary of counts. Then another O(K) loop over the counts (where K < N) to calculate the combinations. – Martijn Pieters Apr 04 '16 at 15:27
Me culpa. I wasn't aware that dictionary lookup was O(1) on average. – Markus Apr 04 '16 at 15:36

aluriak · Answer 5 · 2016-04-04T12:44:57.480

-1

This is probably quicker, because use generator and slicing:

l = [1, 2, 1, 3]

count = 0
for idx, value in enumerate(l):
    count += sum(1 for v in l[:idx] if v == value)

print(count)
assert count == 1

edited Apr 04 '16 at 12:44

answered Apr 04 '16 at 12:41

aluriak

5,559
2
26
39

Don't you think you should replace `v < value` by `v == value` in your code? – Some Name Apr 04 '16 at 12:42
I think i misunderstood the challenge. EDIT: fixed. Should work now. – aluriak Apr 04 '16 at 12:43
`v` is the value of the item. It doesn't need to be less than `value`. It needs to be equal to it. – Some Name Apr 04 '16 at 12:44
2

Algorithmically, there is zero difference here, and it's likely to actually be slower because the `count` method operates at the C layer in CPython, while generator expressions are at the Python layer. – ShadowRanger Apr 04 '16 at 12:44

python - Reduce time consumption

5 Answers5