
I am using a nested for loop to read selected data from a DataFrame, and I then need to apply some mathematical formulae to that data. To do this, I wrote logic that collects the row indices and column numbers of the relevant cells into two lists, "idx" and "cols". However, the nested for loop that reads this data executes far more times than expected.

Following is the sample code and its output:

idx =  [1, 2]
cols = [2, 2]
count = 0

def run_imputation():
    global count
    for i in idx:
        for col in cols:
            count += 1
            print(count)
            dfClean.iloc[i, col] = tempOut[i,col]  #Need to do such and more computations
            origVal.append(dfClean_Orig.iloc[i, col])
            impuVal.append(dfClean.iloc[i, col])

%timeit run_imputation()


OUTPUT:
1
2
...... 
32444

So my question is: why does the for loop execute 32444 times when it should execute only 4 times? And is there a better way to do such selective computations on the data, as shown above, than complicated for loops like these in Python?

curiousBrain
  • What do `idx` and `cols` contain at the end of the program? – Anatole Sot Feb 03 '21 at 22:48
  • Please provide us with more context. Also, try printing `idx` and `cols` right after the function is called to see what they are storing. – Talendar Feb 03 '21 at 22:52
  • With `timeit`, your function is run multiple times. Because you are using a global `count`, every run of the function shares the same variable and keeps incrementing it. – Gopal Gautam Feb 03 '21 at 22:53
  • Can you provide more details on what you are trying to do? Always avoid using loops when dealing with pandas dataframe. You may find alternate solutions that can be done without for loops. – Joe Ferndz Feb 03 '21 at 22:54
  • After the function is called idx and cols remain unchanged: idx: [1, 2] cols: [2, 2] – curiousBrain Feb 03 '21 at 22:55
  • @GopalGautam nailed it. Try running your code without `%timeit`, or initialize `count = 0` inside the function, and see the difference. – G. Anderson Feb 03 '21 at 22:57
  • @GopalGautam Absolutely helpful. It does indeed work correctly without `%timeit`. Thanks for pointing out this drawback of timeit. I never would have guessed it! – curiousBrain Feb 03 '21 at 23:10

3 Answers


You have not shown your full code, so I can only answer based on what you posted. I think the run_imputation() function is being run multiple times. You can avoid seeing a wrong iteration count by not using global variables.

For instance, you could restructure your code as follows:

idx =  [1, 2]
cols = [2, 2]

def run_imputation(idx, cols):
    count = 0
    for i in idx:
        for col in cols:
            count += 1
            print(count)
            dfClean.iloc[i, col] = tempOut[i,col]  # Need to do such and more computations
            origVal.append(dfClean_Orig.iloc[i, col])
            impuVal.append(dfClean.iloc[i, col])
abysslover

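For the first part of the question: the count reaches 32444 instead of 4 because you are using the %timeit magic command with your function call run_imputation(). %timeit calls the function many times to get a stable timing, and the global count keeps increasing across those calls.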

You do not need to use %timeit if you do not want to measure the execution time of your function.

You can find more detail on the topic here: What is %timeit in python?

This answers the main question: "Python For Loop executing unexpected number of loops"
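The effect can be reproduced outside a notebook with the standard `timeit` module. The following is a minimal sketch, assuming a stripped-down `run_imputation` that keeps only the counting (the DataFrame work is omitted); `number=1000` and `repeat=3` are arbitrary values chosen to make the result predictable, whereas `%timeit` picks its own call counts automatically.

```python
import timeit

idx = [1, 2]
cols = [2, 2]
count = 0  # module-level counter, shared by every call


def run_imputation():
    """Stand-in for the question's function: only the counting is kept."""
    global count
    for i in idx:
        for col in cols:
            count += 1


# One direct call: the loop body runs len(idx) * len(cols) = 4 times.
run_imputation()
single_call = count

# timeit calls the function `number` times for each of `repeat` rounds,
# and the global counter is never reset between those calls.
count = 0
timeit.repeat(run_imputation, number=1000, repeat=3)
after_timeit = count

print(single_call)   # 4
print(after_timeit)  # 12000 = 4 iterations * 1000 calls * 3 rounds
```

By the same arithmetic, a final count of 32444 corresponds to 8111 calls of four iterations each, so the loop itself behaved as expected on every call.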

The second part of your question is not clear to me. I can help with that as well if you elaborate.
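One possible reading of the second part, offered only as an assumption: if `idx` and `cols` are parallel lists where each pair `(idx[k], cols[k])` names one cell, then a nested loop visits every combination (and repeats cells when `cols` has duplicates), while `zip` visits each pair once. The sketch below uses plain nested lists named `data` and `temp_out` as hypothetical stand-ins for `dfClean` and `tempOut`.

```python
idx = [1, 2]
cols = [2, 2]

# Hypothetical stand-ins for dfClean and tempOut (plain nested lists).
data = [[0.0, 0.0, 0.0] for _ in range(3)]
temp_out = [[10.0, 11.0, 12.0], [20.0, 21.0, 22.0], [30.0, 31.0, 32.0]]

# Nested loops visit every (row, column) combination: 4 visits here.
nested_visits = [(i, col) for i in idx for col in cols]

# zip pairs the k-th row with the k-th column: 2 visits here.
paired_visits = list(zip(idx, cols))

# Copy only the paired cells from temp_out into data.
for i, col in paired_visits:
    data[i][col] = temp_out[i][col]

print(nested_visits)  # [(1, 2), (1, 2), (2, 2), (2, 2)]
print(paired_visits)  # [(1, 2), (2, 2)]
```

If the lists really are meant as a full grid of rows and columns, the nested loop is the right shape and this sketch does not apply.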

Serc
  • Yes, it works if I don't use %timeit. But I also need to measure the time taken by the run_imputation function. Maybe I will have to use the timeit library for that. – curiousBrain Feb 03 '21 at 23:24
  • It is fine to use %timeit if you want to measure the time taken. But then you should expect to see a count of 32444 instead of 4; that is the expected behavior. Please upvote the answer if you found it useful. – Serc Feb 03 '21 at 23:44
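If both the timing and a correct iteration count are needed, one option is to keep the counter local and time a single call by hand. This is a minimal sketch, assuming a simplified `run_imputation` that takes the lists as parameters and returns its count; the DataFrame work from the question is left out.

```python
import time


def run_imputation(idx, cols):
    """Simplified stand-in: counts loop iterations with a local counter."""
    count = 0  # local, so it starts from zero on every call
    for i in idx:
        for col in cols:
            count += 1
    return count


# Time exactly one call with a high-resolution clock.
start = time.perf_counter()
iterations = run_imputation([1, 2], [2, 2])
elapsed = time.perf_counter() - start

print(iterations)  # 4
print(f"{elapsed:.6f} s")
```

In IPython, `%timeit -n 1 -r 1 run_imputation()` should similarly limit the magic to a single call, at the cost of a noisier measurement.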

Instead of: %timeit run_imputation()

I used: run_imputation()

curiousBrain