0

I would like to optimize a code that is using two for().

I have the following dataframes:

      import pandas as pd
      import numpy as np

      df_Original = pd.DataFrame({'System Model': ['System 100', 'System 101', 'System 108', 'System 
                                                  200'], 
                                  'ID Sensor': [54, 55, 75, 100],
                                  'Sensor Type': ['Analog', 'Digital', 'Analog', 'Digital']})


      df_Second = pd.DataFrame({'ID Sensor': [54, 2, 55, 100], 
                                'Sensor_Max': [1024, 1, 1,1],
                                'Sensor_Min': [0, 0, 0, 0]})

I would need to create a new column in df_Second with the indication of which 'System Model' the 'ID Sensor' belongs to. So, I implemented the following code:

      # Boot
      df_Second['new_columns_System_Model'] = np.NaN
      # Iterative
      for i in range(0, len(df_Original)):
          for j in range(0, len(df_Second)):
              # Condition
              if(df_Original['ID Sensor'].iloc[i] == df_Second['ID Sensor'].iloc[j]):
                  # New column
                  df_Second['new_columns_System_Model'].iloc[j] = df_Original['System Model'].iloc[i]

The code is working perfectly. The output is being as desired:

        ID Sensor   Sensor_Max  Sensor_Min  new_columns_System_Model
           54           1024        0             System 100
           2             1          0                 NaN
           55            1          0             System 101
          100            1          0             System 200

However, if the dataframes are larger, this code will take a long time iterating between the two for (). I ask for help to make the code more efficient. Thank you.

Jane Borges
  • 552
  • 5
  • 14
  • 1
    Sounds like you want the functionality of [join](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.join.html) – Zionsof May 18 '20 at 12:55
  • 1
    you need `pd.merge(df_Second,df_Original[['ID Sensor','System Model']],left_on=['ID Sensor'],right_on=['ID Sensor'],how='left')` – Umar.H May 18 '20 at 13:05
  • @Zionsof : Worked perfectly. I applied it as follows: df_Second.join(df_Original.set_index('ID Sensor'), on='ID Sensor') – Jane Borges May 18 '20 at 13:53
  • @Datanovice : The merge also works. I found the most suitable merge for the problem in question. Tks – Jane Borges May 18 '20 at 13:54
  • A note for anyone who has a similar problem is: Check that the 'ID Sensor' columns are of the same type, before applying the merge (). – Jane Borges May 18 '20 at 13:56
  • no worries @JaneBorges you should mark the question as a duplicate as the marked answer has a lot of info – Umar.H May 18 '20 at 14:00
  • @Datanovive I noticed a problem in the merge (): in the original dataframe (it has 86,000 rows) when I do the merge I would just like to fill in the new column. However, the dataframe has 560,800 lines. Do you know how to solve this? – Jane Borges May 19 '20 at 12:11

0 Answers0