I am pretty new to coding and am currently struggling to optimize this code for larger lists.
import pandas as pd
import random
from time import time
rows = []
list1 = [random.randint(1, 100) for i in range(1_000_000)]
list2 = [random.randint(1, 100) for i in range(1_000_000)]
list3 = [random.randint(1, 100) for i in range(1_000_000)]
list4 = [random.randint(1, 100) for i in range(1_000_000)]
start = time()
for i in range(len(list1) - 1):
    if list1[i] < list2[i] and list1[i + 1] > list2[i + 1]:
        dict1 = {1: list1[i], 2: '+'}
        rows.append(dict1)
    elif list1[i] > list2[i] and list1[i + 1] < list2[i + 1]:
        dict1 = {1: list1[i], 2: '-'}
        rows.append(dict1)
    if list3[i] < list4[i] and list3[i + 1] > list4[i + 1]:
        dict1 = {1: list3[i], 2: '+'}
        rows.append(dict1)
    elif list3[i] > list4[i] and list3[i + 1] < list4[i + 1]:
        dict1 = {1: list3[i], 2: '-'}
        rows.append(dict1)
    else:
        dict1 = {1: list3[i], 2: '#'}
        rows.append(dict1)
end = time()
print(end - start)
df = pd.DataFrame(rows)
With 10_000_000 entries it takes about 30 seconds, and the runtime grows linearly with the list length. Is there a way to optimize it for larger inputs?
I feel like the for-loop and the if-else statements are the biggest time consumers, but I can't figure out a way to optimize them.
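One direction that might help, assuming NumPy is acceptable (pandas already depends on it), is to replace the per-element loop and the if/elif branches with whole-array comparisons. The sketch below is only an illustration: the function name `crossings_vectorized` and its parameter names are hypothetical. It assumes the second `if` block sits at the same level as the first inside the loop, and it tries to keep the rows in the same order as the loop (optional row from the first pair, then the row from the second pair).

```python
import numpy as np
import pandas as pd


def crossings_vectorized(list1, list2, list3, list4):
    """Hypothetical helper: vectorized sketch of the crossing loop."""
    a, b, c, d = (np.asarray(x) for x in (list1, list2, list3, list4))

    # Compare element i with element i + 1 by pairing [:-1] with [1:].
    up1 = (a[:-1] < b[:-1]) & (a[1:] > b[1:])
    down1 = (a[:-1] > b[:-1]) & (a[1:] < b[1:])
    up2 = (c[:-1] < d[:-1]) & (c[1:] > d[1:])
    down2 = (c[:-1] > d[:-1]) & (c[1:] < d[1:])

    n = max(len(a) - 1, 0)

    # Two slots per index i: even slots hold the list1/list2 row,
    # odd slots hold the list3/list4 row, which preserves loop order.
    values = np.empty(2 * n, dtype=np.result_type(a.dtype, c.dtype))
    signs = np.empty(2 * n, dtype="<U1")
    keep = np.empty(2 * n, dtype=bool)

    values[0::2] = a[:-1]
    signs[0::2] = np.where(up1, "+", "-")
    keep[0::2] = up1 | down1  # the first pair has no else branch

    values[1::2] = c[:-1]
    signs[1::2] = np.where(up2, "+", np.where(down2, "-", "#"))
    keep[1::2] = True  # the second pair always appends a row

    return pd.DataFrame({1: values[keep], 2: signs[keep]})
```

Calling `df = crossings_vectorized(list1, list2, list3, list4)` would stand in for both the loop and the final `pd.DataFrame(rows)`. Generating the inputs with `np.random.randint(1, 101, size=1_000_000)` instead of list comprehensions would also avoid the list-to-array conversion, though how much any of this helps on a given machine is something to measure rather than assume.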