
I am trying to run some code on a large dataset, so any optimization could help a lot.

The following is a simplified (dummy) version of what I am doing:

```python
output = []
for i in my_list:
    for index, row in df.iterrows():

        # required in output
        c1 = []
        c2 = []
        output_row1 = []
        output_row2 = []

        # data from dataframe df
        var1 = row.Var1
        var2 = row.Var2

        # data from dictionaries
        for j in my_dict1[i].col1:
            output_row1.append(data_dict[j+":"+i+":"+var1+":"+var2])
            c1.append(-1)
        for j in my_dict2[i].col2:
            output_row2.append(data_dict[i+":"+j+":"+var1+":"+var2])
            c2.append(1)

        # Final output
        output.append([output_row1 + output_row2, c1 + c2])
```

For each element in `my_list` and for each row of the DataFrame `df`, I want to append an element to `output`; the data for that element comes from three separate dictionaries: `my_dict1`, `my_dict2`, and `data_dict`.

Could anyone suggest a better way to store the data, or a recent Python library that might make this faster? Thanks in advance.
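One possible change to how the data is stored, offered as an untested suggestion: every lookup currently builds a new string such as `j+":"+i+":"+var1+":"+var2` before indexing `data_dict`. Converting the composite keys to 4-tuples once lets each lookup use a tuple instead of repeated string concatenation. This sketch assumes node names never contain `:`; the two sample entries are copied from the dummy `data_dict` in the edited code, and `tuple_dict` is a hypothetical name. Whether it is actually faster on real data is worth measuring with `timeit`.

```python
# Sample entries copied from the question's dummy data_dict ("a:b:c:d" keys).
data_dict = {"Node1:Node2:Node1:Node3": 5,
             "Node3:Node2:Node1:Node3": 4}

# Convert once: split each composite string key into a 4-tuple.
# Assumption: node names never contain ":".
tuple_dict = {tuple(key.split(":")): value for key, value in data_dict.items()}

# Lookups then build a small tuple instead of concatenating a new string.
i, j, var1, var2 = "Node2", "Node1", "Node1", "Node3"
value = tuple_dict[(j, i, var1, var2)]
print(value)  # 5
```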

Edited code (runnable, with dummy data):

```python
import pandas as pd

my_list = ["Node1","Node2","Node3","Node4"]

df = pd.DataFrame({"Shipments":[1,2], 
                   "Origin":["Node1","Node2"], 
                   "Destination":["Node3","Node4"]})

my_dict1 = {"Node1":[], 
            "Node2":["Node1","Node3"], 
            "Node3":[], 
            "Node4":["Node2", "Node3"]}

my_dict2 = {"Node1":["Node2"],
            "Node2":["Node4"], 
            "Node3":["Node2", "Node4"],
            "Node4":[]}

data_dict = {"Node1:Node2:Node1:Node3":5,
             "Node1:Node2:Node2:Node4":5,
             "Node3:Node2:Node1:Node3":4,
             "Node3:Node2:Node2:Node4":4,
             "Node2:Node4:Node1:Node3":3,
             "Node2:Node4:Node2:Node4":3,
             "Node3:Node4:Node1:Node3":8,
             "Node3:Node4:Node2:Node4":8}

output = []
for i in my_list:
    for index, row in df.iterrows():

        # required in output
        c1 = []
        c2 = []
        output_row1 = []
        output_row2 = []

        # data from dataframe df
        var1 = row.Origin
        var2 = row.Destination

        # data from dictionaries
        for j in my_dict1[i]:
            output_row1.append(data_dict[j+":"+i+":"+var1+":"+var2])
            c1.append(-1)
        for j in my_dict2[i]:
            output_row2.append(data_dict[i+":"+j+":"+var1+":"+var2])
            c2.append(1)

        # Final output
        output.append([output_row1 + output_row2, c1 + c2])
```
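A possible restructuring of the edited code above, sketched as a suggestion rather than a benchmarked answer: `df.iterrows()` builds a Series per row and is re-run for every element of `my_list`, so this version extracts the `(Origin, Destination)` pairs once with `zip`, builds the sign list once per `i` (it depends only on `i`), and uses list comprehensions for the lookups. It should produce the same `output` as the loop above on the dummy data; `pairs`, `dict1_nodes`, `dict2_nodes`, and `signs` are hypothetical names introduced here.

```python
import pandas as pd

# Same dummy data as the edited code in the question.
my_list = ["Node1", "Node2", "Node3", "Node4"]
df = pd.DataFrame({"Shipments": [1, 2],
                   "Origin": ["Node1", "Node2"],
                   "Destination": ["Node3", "Node4"]})
my_dict1 = {"Node1": [], "Node2": ["Node1", "Node3"],
            "Node3": [], "Node4": ["Node2", "Node3"]}
my_dict2 = {"Node1": ["Node2"], "Node2": ["Node4"],
            "Node3": ["Node2", "Node4"], "Node4": []}
data_dict = {"Node1:Node2:Node1:Node3": 5, "Node1:Node2:Node2:Node4": 5,
             "Node3:Node2:Node1:Node3": 4, "Node3:Node2:Node2:Node4": 4,
             "Node2:Node4:Node1:Node3": 3, "Node2:Node4:Node2:Node4": 3,
             "Node3:Node4:Node1:Node3": 8, "Node3:Node4:Node2:Node4": 8}

# Extract the (Origin, Destination) pairs once instead of calling
# df.iterrows() again for every element of my_list.
pairs = list(zip(df["Origin"], df["Destination"]))

output = []
for i in my_list:
    dict1_nodes = my_dict1[i]  # looked up with keys of the form "j:i:var1:var2"
    dict2_nodes = my_dict2[i]  # looked up with keys of the form "i:j:var1:var2"
    # The sign list depends only on i, so build it once per i.
    signs = [-1] * len(dict1_nodes) + [1] * len(dict2_nodes)
    for var1, var2 in pairs:
        suffix = f":{var1}:{var2}"
        values = ([data_dict[f"{j}:{i}{suffix}"] for j in dict1_nodes]
                  + [data_dict[f"{i}:{j}{suffix}"] for j in dict2_nodes])
        # Copy signs so each output entry owns its own list, as in the original.
        output.append([values, list(signs)])

print(output[2])  # [[5, 4, 3], [-1, -1, 1]]
```

The overall work is still proportional to `len(my_list) * len(df)` times the neighbour count, so this mainly trims per-iteration overhead; it does not change the algorithm.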
Rohith
  • Can you edit your question to include some example input and output? – Mr. Llama Jul 30 '18 at 16:38
  • Possible duplicate of [How to replace dataframe column values with dictionary keys?](https://stackoverflow.com/questions/45787481/how-to-replace-dataframe-column-values-with-dictionary-keys) – Rushabh Mehta Jul 30 '18 at 16:39
  • @Mr.Llama I could do that. I need some time to create the dummy data though. – Rohith Jul 30 '18 at 16:40
  • Please read and follow the posting guidelines in the help documentation, as suggested when you created this account. [Minimal, complete, verifiable example](http://stackoverflow.com/help/mcve) applies here. We cannot effectively help you until you post your MCVE code and accurately describe the problem. We should be able to paste your posted code into a text file and reproduce the problem you described. SO citizens rarely engage in desk-checking code. – Prune Jul 30 '18 at 16:41
  • Are lists `c1`, `c2`, `output_row1`, and `output_row2` necessary? These are going to end up being large lists if `my_list` and `df` are large. If yes, using numpy for some of these should help if they are the same data types. – busybear Jul 30 '18 at 16:45
  • The lists are themselves not necessary, but I couldn't find a better way to append that to `output`. I will try with numpy and check though. Thanks – Rohith Jul 30 '18 at 16:47
  • I just realized what you are doing with `output`. But as already mentioned, MCVE code would be useful. – busybear Jul 30 '18 at 18:04
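busybear's comment suggests NumPy for the intermediate lists. One way to act on that, sketched here as an assumption rather than a tested recommendation, is to store the result as three flat arrays instead of a list of small nested lists: all values concatenated, all signs concatenated, and an `offsets` array where entry `k` of `output` spans `values[offsets[k]:offsets[k + 1]]` (a CSR-style layout). The sketch assumes every value in `data_dict` is an integer and that each entry's values and signs lists have equal length; `pack_output` is a hypothetical helper. For illustration it converts an already-built `output`; on a large dataset you would more likely append straight into flat lists inside the loop and convert once at the end.

```python
import numpy as np

def pack_output(output):
    """Pack a list of [values, signs] pairs into flat NumPy arrays.

    Entry k's values are values[offsets[k]:offsets[k + 1]] (CSR-style layout).
    Assumes integer values and len(values) == len(signs) for every entry.
    """
    lengths = np.fromiter((len(vals) for vals, _ in output),
                          dtype=np.int64, count=len(output))
    # offsets[0] = 0, offsets[k + 1] = total length of entries 0..k
    offsets = np.zeros(len(output) + 1, dtype=np.int64)
    np.cumsum(lengths, out=offsets[1:])
    total = int(offsets[-1])
    values = np.fromiter((v for vals, _ in output for v in vals),
                         dtype=np.int64, count=total)
    signs = np.fromiter((s for _, sg in output for s in sg),
                        dtype=np.int8, count=total)
    return values, signs, offsets

# Usage with a few entries shaped like the question's output.
output = [[[5], [1]], [[5, 4, 3], [-1, -1, 1]], [[], []]]
values, signs, offsets = pack_output(output)
k = 1
print(values[offsets[k]:offsets[k + 1]])  # [5 4 3]
```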
