0

I want to perform complex operation over 3 excels. Firstly, I have filtered data in which user status is Enabled and then I will loop through other 2 excels to find out data corresponding to the Enable Users. I started looping one by one and saving in a list of tuples. I have tried using Normal Looping and itertuples() both but still its too slow. Is there any faster way to perform this operation-

from pandas import ExcelWriter
from pandas import ExcelFile

df = pd.read_excel('Enable-DisableUsersList.xlsx')
df2 = df[df.Status == '*ENABLED']
df1 = pd.read_excel('Excel1.xlsx')
df3 = pd.read_excel('Excel.xls2')

final_list = []
for ind1 in df1.index:
    for ind2 in df2.index:
        if(df1['USER PROFILE'][ind1] == df2['User ID'][ind2]):
            for ind3 in df3.index:
                if(df3['user profile'][ind3] == df2['User ID'][ind2]):
                    userprofile = df1['USER PROFILE'][ind1]
                    nameprofile = df1['NAME PROFILE'][ind1]
                    mainmenuno = df3['sysmenu option'][ind3]
                    mainmenu = df1['SUBMENU'][ind1]
                    mainmenudesc = df1['NAME SUBMENU'][ind1]
                    submenu = df1['OPTION'][ind1]
                    submenudesc = df1['NAME OF OPTION'][ind1]
                    status = df2['Status'][ind2]

                    list = (userprofile, nameprofile, mainmenuno, mainmenu,
                            mainmenudesc, submenu, submenudesc, status)
                    final_list.append(list)
                    # print(final_list)


print(final_list)````

I have used df.iterrows() instead of df.index also but still thats slow.
Sajan
  • 1,247
  • 1
  • 5
  • 13
Ashish Kumar
  • 173
  • 2
  • 13
  • 1
    kindly share samples of each of the datasets, with expected output. data,not pics. [guide](https://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples) – sammywemmy Apr 13 '20 at 10:04
  • `itertuples()` is faster compared to using `iterrows()`; check this [link](https://medium.com/swlh/why-pandas-itertuples-is-faster-than-iterrows-and-how-to-make-it-even-faster-bc50c0edd30d) – tidakdiinginkan Apr 13 '20 at 10:08
  • I have used that too...Is there any other way? – Ashish Kumar Apr 13 '20 at 10:21

1 Answers1

0

use dictionaries

import pandas as pd
import random
import time

df1 = pd.DataFrame()
df2 = pd.DataFrame()
df3 = pd.DataFrame()

id_list = list(range(500))
random.shuffle(id_list)
df1["id"] = id_list
random.shuffle(id_list)
df1["value"] = id_list
random.shuffle(id_list)
df2["id"] = id_list
random.shuffle(id_list)
df2["value"] = id_list
random.shuffle(id_list)
df3["id"] = id_list
random.shuffle(id_list)
df3["value"] = id_list
#till here we created 3 random dataframes with same values in different order
out = []
start = time.time()
for ind1 in df1.index:
    for ind2 in df2.index:
        if df2["value"][ind2] == df1["value"][ind1]:
            for ind3 in df3.index:
                if df3["value"][ind3]==df2["value"][ind2]:
                    out.append([df3["id"][ind3], df2["id"][ind2]])
print(time.time()-start)

start1 = time.time()
df1_dict = df1.set_index("value").to_dict() #use the value you want to compare as the index, it will make it the dict key
df2_dict = df2.set_index("value").to_dict()
df3_dict = df3.set_index("value").to_dict()

out1 = []
for k in df1_dict["id"]: 
    id2 = df2_dict["id"][k]
    id3 = df3_dict["id"][k]
    out1.append([id2,id3])

print(time.time()-start1)

for the iteration time is 8.5 seconds for the dictionaries its 0.0035

trigonom
  • 528
  • 4
  • 9