I have an issue with a simple for-loop. I'm trying to compute the maximum value of each shifting window over a series, collect those max values in a list, and later add that list to the data frame as a new column.
My data frame has two columns of float values and a datetime index. The data file has around 15 million rows (i.e. the series I want to iterate over is 15 million elements long, about 700 MB on disk).
When I run my simple loop, after some time my computer runs out of memory and crashes. I have 12 GB of RAM.
My code:
import pandas as pd
import numpy as np
# sample data
speed = np.random.uniform(0,25,15000000)
data_dict = {'speed': speed}
df = pd.DataFrame(data_dict)
# create a list of 'windows', i.e. subseries of the list
def GetShiftingWindows(thelist, size):
    return [thelist[x:x + size] for x in range(len(thelist) - size + 1)]
window_size = 10
list_of_win_speeds = GetShiftingWindows(df.speed, window_size)
list_of_max_speeds = []
# loop over the windows and collect the max of each one
for x in list_of_win_speeds:
    max_value = max(x)
    list_of_max_speeds.append(max_value)
I'm not a CS major, but this looks to me like a space-complexity issue: I'm guessing the list of 15 million window slices is what eats the memory, though I'm not sure. What am I missing here to make this feasible to compute?
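
For reference, I've been wondering whether pandas' built-in rolling window would sidestep building the list of windows entirely. Below is a sketch of what I have in mind (assuming rolling(...).max() really does compute per-window maxima lazily instead of materializing each window), but I'd still like to understand why my original loop blows up:

# sketch, assuming df and window_size are defined as above;
# rolling(...).max() should give one max per window without storing the windows
rolling_max = df['speed'].rolling(window=window_size).max()
# the first window_size - 1 entries are NaN because those windows are incomplete
list_of_max_speeds = rolling_max.dropna().tolist()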