EDIT:

I'm trying to simulate an online decision-making process. In each iteration, I want to read a new row from a known data frame and make a decision based on it. Additionally, I want to keep the last n rows of the data frame that I have used. Unfortunately, even just iterating through the rows is very slow.

How can I do this better?

import pandas as pd
import numpy as np
import time

t0 = time.time()
s1 = np.random.randn(2000000)
s2 = np.random.randn(2000000)
time_series = pd.DataFrame({'s1': s1, 's2': s2})
n = time_series.shape[0]

for t in range(1, n - 1):
    # read one row at a time; each .iloc[t] call builds a new Series
    curr_data = time_series.iloc[t]

print(time.time() - t0)

OLD VERSION:

I have a loop in which in every iteration I need to delete the first row of a dataframe, and append another row to the end. What would be the fastest method to use?

Roy
  • 2
    There are many. Slicing, `shift`, and so on. Can you please provide a [mcve] and explain what you've tried and why it hasn't worked? – cs95 Jan 03 '18 at 07:34
  • 1
    Also, what you've mentioned so far screams of a bad idea. If you could do a better job describing what you are doing, you are likely to get a suggestion that could improve your overall situation substantially. – piRSquared Jan 03 '18 at 07:35
  • @piRSquared, Thanks, I've edited the question – Roy Jan 03 '18 at 08:01
  • @Roy - `make a decision according to it. Additionally, I want to save the last n rows of the dataframe that I used` - do you want to apply some function to each row and save the df to files? Can you explain more? – jezrael Jan 03 '18 at 08:08
  • @jezrael: Saving means that there would be an extra data frame (or another object) holding the last n rows that I've seen. The decision process would be: given the new row, I would calculate its difference from the last row and use some regression model on this difference. – Roy Jan 03 '18 at 08:14
  • @Roy - I'm not sure whether a fast solution exists for this. – jezrael Jan 03 '18 at 08:33
  • @Roy - unfortunately, loops are hard to avoid here. In a solution, each iteration needs the previous output saved in the last row, plus the `regression model`, plus saving to a new `df`, and all of that consumes a lot of time... – jezrael Jan 03 '18 at 08:55
  • I still find it difficult to believe you want to loop through one at a time and print the row. That doesn't match up with your description. If iterating through the dataframe is what you want, I'd suggest [this](https://stackoverflow.com/q/16476924/2336654) – piRSquared Jan 03 '18 at 08:59

1 Answer

If you really need it, you can use:

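import pandas as pd

# small example frame with a default integer index (values are arbitrary)
df = pd.DataFrame({'s1': [1.0, 2.0, 3.0, 4.0], 's2': [5.0, 6.0, 7.0, 8.0]})

for i in range(3):
    # remove the first row (copy so the later assignment is not made on a slice)
    df = df.iloc[1:].copy()
    # e.g. take the second remaining row as the new row
    row = df.iloc[1]
    # append it under a new label; len(df.index) could collide with an existing label
    df.loc[df.index[-1] + 1] = row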

But according to this post, it is the slowest solution:

6) updating an empty frame (e.g. using loc one-row-at-a-time)
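One way to sidestep that cost, as a minimal sketch rather than a tested recipe: convert the frame to a NumPy array once and keep the last rows in a `collections.deque` with `maxlen`, so nothing is deleted from or appended to a DataFrame inside the loop. The small frame, `window_size` and the omitted decision step are assumptions made only for illustration.

```python
from collections import deque

import numpy as np
import pandas as pd

# Example frame; column names follow the question, the size is arbitrary.
rng = np.random.default_rng(0)
time_series = pd.DataFrame({'s1': rng.standard_normal(1000),
                            's2': rng.standard_normal(1000)})

window_size = 5                      # hypothetical "last n rows" to keep
window = deque(maxlen=window_size)   # the oldest row is dropped automatically

values = time_series.to_numpy()      # one conversion up front
for row in values:
    if window:
        diff = row - window[-1]      # difference from the previously seen row
        # ... a decision step would use diff here (left out) ...
    window.append(row)

# Rebuild a DataFrame from the window only when one is actually needed.
last_rows = pd.DataFrame(list(window), columns=time_series.columns)
```

The loop is still a Python loop, but it works on plain array rows instead of building a Series per iteration, and `last_rows` ends up holding the same values as `time_series.tail(window_size)`.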

So I guess there should be better/faster solutions. The first step is to avoid loops if possible.
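For the part of the task described in the comments (the difference between each row and the previous one), a loop may not be needed at all. A rough sketch using `DataFrame.diff`, assuming the decision for a row does not depend on earlier decisions; the threshold rule is made up purely for illustration:

```python
import numpy as np
import pandas as pd

# Example frame; column names follow the question, the size is arbitrary.
rng = np.random.default_rng(0)
time_series = pd.DataFrame({'s1': rng.standard_normal(1000),
                            's2': rng.standard_normal(1000)})

# Row-to-row differences for the whole frame in one call.
# The first row has no predecessor, so its differences are NaN.
diffs = time_series.diff()

# Hypothetical decision rule: flag rows where s1 rose by more than 1.
decision = diffs['s1'] > 1.0
```

If each decision does feed into the next one, this does not apply directly and some form of loop is probably still required.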

jezrael