
RESOLUTION: It seems qiskit does something unexpected when you re-use circuit objects across runs, which makes each subsequent run take longer. The solution was to pull

for idx, image in enumerate(this_data):
    np.random.seed(42)
    quantum_circuits = [generate_random_circuit(depth=10, num_qubits=4, prob_appl_single=0.3, prob_appl_multi=0.7) for _ in range(2)]

into the for loop, for anyone interested.
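The slowdown pattern is consistent with the reused circuit objects accumulating state between runs. Here is a toy, Qiskit-free analogue of that failure mode (the class and method names are illustrative only, not Qiskit's API):

```python
class ReusableCircuit:
    """Toy stand-in for a circuit object that mutates on every run."""
    def __init__(self):
        self.ops = []

    def initialize(self, state):
        # Bug pattern: each call APPENDS instead of resetting, so a
        # reused object grows without bound across iterations.
        self.ops.append(("init", state))

    def run(self):
        # Work is proportional to the accumulated instruction list.
        return len(self.ops)

# Reusing one object: work grows linearly with the number of images
qc = ReusableCircuit()
work_per_image = []
for image in range(4):
    qc.initialize([0.0, 1.0])
    work_per_image.append(qc.run())
print(work_per_image)  # [1, 2, 3, 4]

# Building a fresh object per image (as in the resolution): constant work
fresh_work = []
for image in range(4):
    qc = ReusableCircuit()
    qc.initialize([0.0, 1.0])
    fresh_work.append(qc.run())
print(fresh_work)  # [1, 1, 1, 1]
```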

I've seen "Why simple Python program slows down as time progresses?" and "Python script slowing down as it progresses?" and, as you'll see, I've tried to implement everything I could learn from those, but it hasn't helped.

I'm trying to perform some computation on an array of data, but it gets significantly slower as time progresses. Specifically:

from quantum_pipe import generate_random_circuit, conv
import pandas as pd
import numpy as np
import gc

# `args` (start_idx, num_datapoints) is parsed with argparse in the full script

train_data = pd.read_csv('./fashion-mnist/fashion-mnist_train.csv')
this_data = train_data[args.start_idx:args.start_idx + args.num_datapoints].drop(['label'], axis=1)
this_data = (this_data / 255).round().astype(np.uint8).values
train_data = None  # drop the reference so the garbage collector can reclaim it

quantum_circuits = [generate_random_circuit(depth=10, num_qubits=4, prob_appl_single=0.3, prob_appl_multi=0.7) for _ in range(12)]

img_outputs = []

for idx, image in enumerate(this_data):
    image = image.reshape((28,28))
    outputs = [conv(qc, 2, image) for qc in quantum_circuits]
    print(f"IMAGE COMPLETED: {idx+1} of {args.num_datapoints}")
    img_outputs.append(outputs)
    gc.collect()

np.save(f'./quantum_data/start{args.start_idx}to{args.start_idx+args.num_datapoints}.npy',img_outputs)

and my timing comes from each completed element of the list comprehension that builds `outputs`. My conv function is defined as:

import time

from qiskit import execute
# BACKEND, prepare_img, encoding_function and shannon_entropy are defined elsewhere in the repo

def conv(qc, filter_size, image, mode='threshold'):
    ''' Slide our 'filter' over the image '''
    # the filter weights don't actually matter here; we just use the flattened binary list as our init
    # might as well hard-code 3x3 filters, can happily handle 2^9 = 512 states
    start = time.time()
    prepped_img = prepare_img(filter_size, image)
    print(prepped_img.shape)
    img_height, img_width = prepped_img.shape
    conv_output = np.zeros(image.shape)
    for down_idx in range(img_height - (filter_size - 1)):
        for across_idx in range(img_width - (filter_size - 1)):
            section = prepped_img[down_idx:down_idx + filter_size, across_idx:across_idx + filter_size]
            init_arr = encoding_function(section, mode)
            qc.initialize(init_arr, qc.qubits)  # see RESOLUTION above: re-using qc here is what slows things down
            job = execute(qc, BACKEND, shots=500)
            results = job.result()
            counts = results.get_counts(qc)
            output = np.zeros(len(init_arr))
            for key, value in counts.items():
                keyidx = int(key, 2)
                output[keyidx] = value
            output = output / np.sqrt(np.sum(output**2))
            entropy = shannon_entropy(output)
            conv_output[down_idx, across_idx] = entropy
    print(f"filter completed in {time.time() - start} s")
    return conv_output
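`shannon_entropy` comes from the linked repo and isn't shown in the question. A plausible minimal version, assuming the L2-normalized output vector is treated as amplitudes whose squares form a probability distribution:

```python
import math

def shannon_entropy(amplitudes):
    """Shannon entropy (in bits) of the distribution given by squared amplitudes.

    Assumes `amplitudes` is L2-normalized, as produced in conv() above.
    This is a sketch of what the helper might look like, not the repo's code.
    """
    probs = [a * a for a in amplitudes]
    return -sum(p * math.log2(p) for p in probs if p > 0)

# Uniform superposition over 4 basis states -> 2 bits of entropy
print(shannon_entropy([0.5, 0.5, 0.5, 0.5]))  # 2.0
```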

Looking at my terminal output I see something like

filter completed in 5.358206748962402 s
(15, 15)
filter completed in 5.510828733444214 s
(15, 15)
filter completed in 5.9010186195373535 s
(15, 15)
filter completed in 5.649052143096924 s
(15, 15)
filter completed in 5.400753021240234 s
(15, 15)
filter completed in 6.231165647506714 s
(15, 15)
filter completed in 6.202470779418945 s
(15, 15)
filter completed in 5.346660852432251 s
(15, 15)
filter completed in 5.406870365142822 s
(15, 15)
filter completed in 5.373116731643677 s
(15, 15)
filter completed in 6.068220853805542 s
IMAGE COMPLETED: 1 of 150
(15, 15)
filter completed in 9.925729036331177 s
(15, 15)
filter completed in 10.407076597213745 s
(15, 15)
filter completed in 10.152106523513794 s
(15, 15)
filter completed in 10.058736801147461 s
(15, 15)
filter completed in 10.039350509643555 s
(15, 15)
filter completed in 10.171052694320679 s
(15, 15)
filter completed in 10.037626266479492 s
(15, 15)
filter completed in 9.979500770568848 s
(15, 15)
filter completed in 10.973601579666138 s
(15, 15)
filter completed in 10.002075433731079 s
(15, 15)
filter completed in 9.934457540512085 s
(15, 15)
filter completed in 10.39132308959961 s
IMAGE COMPLETED: 2 of 150
(15, 15)
filter completed in 14.923802375793457 s
(15, 15)
filter completed in 14.426809549331665 s
(15, 15)
filter completed in 14.47310185432434 s
(15, 15)
filter completed in 14.773804187774658 s
(15, 15)
filter completed in 14.724096059799194 s
(15, 15)
filter completed in 14.462066650390625 s
(15, 15)
filter completed in 14.38855791091919 s
(15, 15)
filter completed in 14.864516735076904 s
(15, 15)
filter completed in 16.536264657974243 s
(15, 15)
filter completed in 15.544280052185059 s
(15, 15)
filter completed in 14.404176235198975 s
(15, 15)
filter completed in 14.43254804611206 s
IMAGE COMPLETED: 3 of 150
(15, 15)
filter completed in 18.811672687530518 s
(15, 15)
filter completed in 19.01636004447937 s
(15, 15)
filter completed in 19.908296823501587 s
(15, 15)
filter completed in 21.753503561019897 s
(15, 15)
filter completed in 20.8498797416687 s
(15, 15)
filter completed in 20.544127941131592 s
(15, 15)
filter completed in 20.662254333496094 s
(15, 15)
filter completed in 20.65923762321472 s
(15, 15)
filter completed in 19.335585355758667 s
(15, 15)
filter completed in 19.945393323898315 s
(15, 15)
filter completed in 20.45564317703247 s
(15, 15)
filter completed in 20.289149522781372 s
IMAGE COMPLETED: 4 of 150
(15, 15)
filter completed in 25.40975260734558 s
(15, 15)
filter completed in 24.785842418670654 s
(15, 15)
filter completed in 24.025460720062256 s
(15, 15)
filter completed in 23.852452516555786 s
(15, 15)
filter completed in 24.38658046722412 s
(15, 15)
filter completed in 23.814111948013306 s

This shows me that each filter on the same image takes about the same time, but every subsequent image takes longer, usually by a roughly constant increment. Since I need to do this for many images, this isn't sustainable; on the uni machine it went from 3 s per filter to 70 s per filter.
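Reading the first reported filter time for each image off the log above makes the trend concrete: the per-filter cost grows by a roughly constant amount with every image, rather than staying flat:

```python
# First "filter completed" time per image, read off the log above (seconds)
first_filter_times = [5.358, 9.926, 14.924, 18.812, 25.410]

# Extra seconds each image adds relative to the previous one
increments = [b - a for a, b in zip(first_filter_times, first_filter_times[1:])]
print([round(d, 2) for d in increments])  # each image adds roughly 4-7 s

avg_increment = sum(increments) / len(increments)
print(round(avg_increment, 2))  # on average about 5 s per image
```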

My repo is here https://github.com/milanleonard/QNN-implementation if there's any info I'm missing. I cannot imagine why this is slowing down.

For some extra info, it seems to be the execution step inside the loop that is causing the problem, since reducing the number of quantum_circuits, i.e. filters per image, doesn't help at all. Refactoring to save at each step didn't help either, and my memory isn't growing quickly anywhere.

We thought the append to img_outputs might be the problem, but this refactor shows it wasn't:

for idx, image in enumerate(this_data):
    image = image.reshape((28,28))
    image = prepare_img(2,image)
    outputs = [conv(qc, 2, image) for qc in quantum_circuits]
    print(f"IMAGE COMPLETED: {idx+1} of {args.num_datapoints}")
    np.save(f'quantum_data/img{args.start_idx + idx}.npy',outputs)
    outputs, image = None, None

After some profiling as suggested by Michael, I see this

filter completed in 16.571566104888916 s
filter completed in 16.666788339614868 s
         15378121 function calls (15376161 primitive calls) in 33.240 seconds

   Ordered by: internal time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
      788    4.492    0.006    4.492    0.006 {method 'acquire' of '_thread.lock' objects}
    60368    3.586    0.000    9.891    0.000 dagcircuit.py:292(apply_operation_back)
   241472    1.336    0.000    3.593    0.000 copy.py:66(copy)
  1576240    1.166    0.000    1.188    0.000 {built-in method builtins.isinstance}
   930412    0.674    0.000    0.674    0.000 bit.py:75(__hash__)
    60368    0.672    0.000    1.220    0.000 assemble_circuits.py:71(<listcomp>)
   244216    0.659    0.000    2.253    0.000 {built-in method builtins.all}
    60760    0.611    0.000    1.113    0.000 quantumcircuit.py:581(_update_parameter_tabl

2ND RUN

filter completed in 36.4175808429718 s
filter completed in 36.10355544090271 s
         36660244 function calls (36658284 primitive calls) in 72.523 seconds

   Ordered by: internal time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
   137200    9.124    0.000   25.409    0.000 dagcircuit.py:292(apply_operation_back)
      784    7.039    0.009    7.039    0.009 {method 'acquire' of '_thread.lock' objects}
   548800    2.966    0.000    8.067    0.000 copy.py:66(copy)
  3958024    2.815    0.000    2.836    0.000 {built-in method builtins.isinstance}
  2467052    1.752    0.000    1.752    0.000 bit.py:75(__hash__)
   137592    1.650    0.000    3.057    0.000 quantumcircuit.py:581(_update_parameter_table)
   551544    1.539    0.000    5.519    0.000 {built-in method builtins.all}
   984312    1.509    0.000    2.749    0.000 retworkx_dagcircuit.py:62(_add_multi_graph_edge)
   137200    1.495    0.000    2.727    0.000 assemble_circuits.py:71(<listcomp>)
   489020    1.445    0.000    2.162    0.000 register.py:98(__repr__)

Looks like the number of calls to apply_operation_back grows linearly with iterations of the loop:

   60368    3.497    0.000    9.678    0.000 dagcircuit.py:292(apply_operation_back)
   137200    8.993    0.000   25.163    0.000 dagcircuit.py:292(apply_operation_back)
   214032   15.435    0.000   41.346    0.000 dagcircuit.py:292(apply_operation_back)
   290864   20.282    0.000   54.842    0.000 dagcircuit.py:292(apply_operation_back)
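The counts above grow by exactly the same amount on every run, which points at the reused circuits themselves accumulating operations rather than at anything in the surrounding loop:

```python
# ncalls of dagcircuit.apply_operation_back across four successive profiles,
# copied from the profiler output above
ncalls = [60368, 137200, 214032, 290864]

steps = [b - a for a, b in zip(ncalls, ncalls[1:])]
print(steps)  # [76832, 76832, 76832] -- exactly linear growth
```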

Guess I should go hunting for why this is growing.

Milan
    Consider monitoring the process's memory usage to see if it is expanding as time goes on. – Mark Setchell May 21 '20 at 07:54
  • Thanks @MarkSetchell, memory usage does expand as expected (due to expanding img_outputs), but expands incredibly slowly. I refactored to save at the end of each image so I didn't have to store a growing img_outputs list, but this didn't benefit me – Milan May 21 '20 at 07:59
  • @MichaelVeksler gc.collect() was actually introduced to try and combat this problem based off of suggestions of the previous posts. I don't need it at all, and testing it just then showed it had quite literally zero impact. – Milan May 21 '20 at 08:25
  • Nothing comes to mind in this code. You'd have to profile it so you can find the line(s) that take most of the time. It's hard to say from your code right now. As @MarkSetchell suggested, memory usage could cause slowdown, because excessive RAM usage impacts the cache (more cache misses) and the memory management unit (excessive paging and thrashing). The easiest way to check this assumption is to test performance without appending to `img_outputs` – Michael Veksler May 21 '20 at 08:31
  • To veer a little off track, I have no experience with profiling — any tips/packages you'd suggest? I dropped appending to img_outputs and it didn't add any benefit; I've added the refactor to the original post – Milan May 21 '20 at 08:33
  • https://docs.python.org/3/library/profile.html - you simply `import cProfile`, put your main loop (with `enumerate(this_data)`) into a function, and then run your function `cProfile.run('your_function()')`. – Michael Veksler May 21 '20 at 09:13
  • Thanks @MichaelVeksler , looks like its a qiskit issue perhaps. I've added the profile trace to the main body of the question if your extra experience brings any intuitions to the table. – Milan May 21 '20 at 09:21
  • Your profiler output is difficult to follow. You need to figure out who calls apply_operation_back, and why it does so often. Viewing the call-graph may help. Have a look at https://stackoverflow.com/a/23164271/4955498 – Michael Veksler May 21 '20 at 09:48
