Merging k sorted lists in Python3, problem with trade-off between memory and time

Question

The input is: The first line - a number of arrays (k); Each next line - the first number is the array size, next numbers are elements.

Max k is 1024. Max array size is 10*k. All numbers between 0 and 100. Memory limit - 10MB, time limit - 1s. Recommended complexity is k ⋅ log(k) ⋅ n, where n is an array length.

Example input:

4            
6 2 26 64 88 96 96
4 8 20 65 86
7 1 4 16 42 58 61 69
1 84

Example output:

1 2 4 8 16 20 26 42 58 61 64 65 69 84 86 88 96 96

I have 4 solutions. One uses heapq and reading input lines by blocks, one uses heapq, one uses Counter and one uses nothing.

This one uses heapq (good for time but bad for memory, I think heaps are right way, however maybe it can be optimized if I will read lines by parts, so that I won't need a memory for the whole input):

from heapq import merge


if __name__ == '__main__':
    print(*merge(*[[int(el) for el in input().split(' ')[1:]] for _ in range(int(input()))]), sep=' ')

This one is advanced version of the previous one. It reads lines by blocks, however it is very complex solution, I don't know how to optimize those reading:

from heapq import merge
from functools import reduce


def read_block(n, fd, cursors, offset, has_unused_items):
    MEMORY_LIMIT = 10240000
    block_size = MEMORY_LIMIT / n
    result = []

    for i in range(n):
        if has_unused_items[i]:
            if i == 0:
                fd.seek(cursors[i] + offset)
            else:
                fd.read(cursors[i])

            block = ''
            c = 0
            char = ''

            while c < block_size or char != ' ':
                if cursors[i] == 0:
                    while char != ' ':
                        char = fd.read(1)
                        cursors[i] += 1

                char = fd.read(1)

                if char != '\n':
                    block += char
                    cursors[i] += 1
                    c += 1
                else:
                    has_unused_items[i] = False
                    break

            result.append([int(i) for i in block.split(' ')])

            while char != '\n':
                char = fd.read(1)

    return result


def to_output(fd, iter):
    fd.write(' '.join([str(el) for el in iter]))


if __name__ == '__main__':
    with open('input.txt') as fd_input:
        with open('output.txt', 'w') as fd_output:
            n = int(fd_input.readline())
            offset = fd_input.tell()
            cursors = [0] * n
            has_unused_items = [True] * n
            result = []

            while reduce(lambda x, p: x or p, has_unused_items):
                result = merge(
                    result,
                    *read_block(n, fd_input, cursors, offset, has_unused_items)
                )

            to_output(fd_output, result)

This one is good for memory (using sorting with counter, but I didn't use the information that all arrays are sorted):

from collections import Counter


def solution():
    A = Counter()

    for _ in range(int(input())):
        A.update(input().split(' ')[1:])

    for k in sorted([int(el) for el in A]):
        for _ in range(A[str(k)]):
            yield k

This one is good for time (but maybe not enough good):

def solution():
    A = tuple(tuple(int(el) for el in input().split(' ')[1:]) for _ in range(int(input())) # input data
    c = [0] * len(A) # cursors for each array

    for i in range(101):
        for j, a in enumerate(A):
            for item in a[c[j]:]:
                if item == i:
                    yield i
                    c[j] += 1
                else:
                    break

Perfectly, if I would have arrays by parts in the first example, so that I won't need a memory for the whole input, but I don't know how to read lines by blocks correctly.

Could you please suggest something to solve the problem?

Just to be sure, you want this in pure python. Because you could really speed things up by using Cython. Also, I think you should not use `input` for performance. [here are some alternatives](https://stackoverflow.com/questions/1450393/how-do-you-read-from-stdin). Often the slowest thing in a program is I/O so make sure you optimise the right part of the code. — Benoît P, Feb 09 '19 at 10:34
@BenoîtPilatte, yes I need it in pure Python. Ok, thanks, I'll look into it. — shpindler, Feb 09 '19 at 10:36
One more thing. **NEVER** use `sorted` for performance, it doubles your memory consumption and increase time significantly, use `a.sort()` instead, it will modify a but return nothing, so `a` is now sorted. — Benoît P, Feb 09 '19 at 10:38
@BenoîtPilatte, yeah, it's true, I was a little bit unworried about the memory in the first example, though I have no problems with it there. — shpindler, Feb 09 '19 at 10:42
It takes time to allocate and copy an array too, so it isn't just about memory. — Benoît P, Feb 09 '19 at 10:44
@BenoîtPilatte, hm, agree, I tried to save keys in a variable and sort it, but it didn't give enough benefit in time. The problem in the algorithm, I didn't use the information that all arrays are sorted. — shpindler, Feb 09 '19 at 10:51
`from heapq import merge; merge(iterable1, iterable2, ...)` and probably use a generator to transform between the strings and numeric values. — alkasm, Feb 09 '19 at 11:01

score 1 · Accepted Answer · edited Jun 20 '20 at 09:12

1

O Deep Thought computer, what is the answer to life the universe and everything

Here is the code I used for the tests

"""4
6 2 26 64 88 96 96
4 8 20 65 86
7 1 4 16 42 58 61 69
1 84"""

from heapq import merge
from io import StringIO
from timeit import timeit

def solution():
    pass

times = []
for i in range(5000):
    f = StringIO(__doc__)
    times.append(timeit(solution, number=1))

print(min(times))

And here are the results, I tested solutions proposed in the comments:

6.5e-06 sec

def solution():
    A = []
    A = merge(A, *((int(i)
                    for i in line.split(' ')[1:])
                    for line in f.readlines()))
    return A

7.1e-06 sec

def solution():
    A = []
    for _ in range(int(f.readline())):
        A = merge(A, (int(i) for i in f.readline().split(' ')[1:]))
    return A

7.9e-07 sec

def solution():
    A = Counter()
    for _ in range(int(f.readline())):
        A.update(f.readline().split(' ')[1:])
    for k in sorted([int(el) for el in A]):
        for _ in range(A[str(k)]):
            yield k

8.3e-06 sec

def solution():
    A = []
    for _ in range(int(f.readline())):
        for i in f.readline().split(' ')[1:]:
            insort(A, i)
    return A

6.2e-07 sec

def solution():
    A = Counter()
    for _ in range(int(f.readline())):
        A.update(f.readline().split(' ')[1:])
    l = [int(el) for el in A]
    l.sort()
    for k in l:
        for _ in range(A[str(k)]):
            yield k

Your code is great, don't use sorted (impact becomes more significant with bigger arrays). You should test it with bigger inputs (I used what you gave).

This is with only the winners of the previous one (plus solution 6 which is the second one you gave). It appears that the speed limit is given by the I/O of the program and not the sorting itself.

Note that I generate squares (number of line == numbers per line)

edited Jun 20 '20 at 09:12

Community

1
1

answered Feb 09 '19 at 11:58

Benoît P

3,179
13
31

I gave a general example of input - output. The problem raises when I send the code to testing system. Unfortunately, I don't know their input. – shpindler Feb 09 '19 at 12:01
Do you know how to read lines by parts? For instance in my example input read only two items from each line, merge them and do it again while lines still have unused items? – shpindler Feb 09 '19 at 12:36
@shpindler I have a solution, but it depends on input being a file, not stdin. Where are you running the program? open.kattis.com? My solution maintains positions in the file for each array - however, it appears you cannot seek in stdin as it is a pipe. – Jeppe Feb 09 '19 at 13:36
@Jeppe It's ok, file is suitable too. I'm running it in a special contest. Could you please write it as an answer? – shpindler Feb 09 '19 at 13:38
You could read char by char but [for mysterious reasons](https://stackoverflow.com/questions/41625529/why-is-reading-one-byte-20x-slower-than-reading-2-3-4-bytes-from-a-file) it might be much much slower – Benoît P Feb 09 '19 at 14:01
@BenoîtPilatte yes, but I can define a block size, I consider that it should be MEMORY_LIMIT / k. But then there is a problem if I read only part of number, for example 10 instead of 100. – shpindler Feb 09 '19 at 14:39
Ok, it's really easy. But another problem is do not read after the end of line. In this case maybe the only way is to read by 1 byte, which can be slow. – shpindler Feb 09 '19 at 14:56
And the main problem is that I need to know the line length in order to move cursor to the next one. So that I lose all benefits from reading by blocks... But there are must be a method to do it correctly. – shpindler Feb 09 '19 at 15:03
Loading the full line in RAM is more efficient that making hundreds of memory calls for each chars. – Benoît P Feb 09 '19 at 15:05
Currently the main problem is memory. I added solution with reading by blocks however it is absolutely time complex. – shpindler Feb 09 '19 at 15:49
You were right. The problem was in my I/O method. Thanks. Counter method which uses files but not stdin/stdout passes all the tests. – shpindler Feb 10 '19 at 08:55

score 1 · Answer 2 · answered Feb 11 '19 at 11:46

If the lines of integers are already sorted then you only need to focus on how to stitch the pieces together.

In order to achieve this my solution tracks the state of the problem in a list of tuples.

Each tuple records the offset of the line, num_elements which is the number of elements in the line still to be processed, next_elem is the value of the next element to be processed, last_elem is the value of the last element in the line.

The algorithm loops through the list of state tuples which are sorted based upon the value of next_elem and last_elem appending the next lowest values to the A list. The state is updated and the list sorted, rinse and repeat until the list is empty.

I'd be curious to see how this performs relative to the other solutions.

from operator import itemgetter

def solution():
    state = []
    A = []
    k = int(f.readline())
    for _ in range(k):
        offset = f.tell()
        line = f.readline().split()
        # Store the state data for processing each line in a tuple
        # Append tuple to the state list: (offset, num_elements, next_elem, last_elem)
        state.append((offset, int(line[0]), int(line[1]), int(line[-1])))
    # Sort the list of stat tuples by value of next and last elements
    state.sort(key=itemgetter(2, 3))
    # [
    #    (34, 7, 1, 69),
    #    (2, 6, 2, 96),
    #    (21, 4, 8, 86),
    #    (55, 1, 84, 84)
    # ]
    while (len(state) > 0):
        offset, num_elements, _, last = state[0]
        _ = f.seek(offset)
        line = f.readline().split()
        if ((len(state) == 1) or (last <= state[1][2])):
            # Add the remaining line elements to the `result`
            A += line[-(num_elements):]
            # Delete the line from state
            del state[0]
        else:
            while (int(line[-(num_elements)]) <= state[1][2]):
                # Append the element to the `result`
                A.append(line[-(num_elements)])
                # Decrement the number of elements in the line to be processed
                num_elements -= 1
            if (num_elements > 0):
                # Update the tuple
                state[0] = (offset, (num_elements), int(
                    line[-(num_elements)]), int(line[-1]))
                # Sort the list of tuples
                state.sort(key=itemgetter(2, 3))
            else:
                # Delete the depleted line from state
                del state[0]
    # Return the result
    return A

Interesting, ok. Currently, the contest is unavailable. When it'll become possible, I'll have tested your solution and wrote result here. — shpindler, Feb 12 '19 at 14:45
It's the contest URL: https://contest.yandex.ru/contest/8458/enter/?lang=en , this task is the latest. — shpindler, Feb 13 '19 at 11:32