I'm trying to merge the columns of two dataframes based on some conditions

Question

I'm sorry, I know this question has already been asked, but I haven't managed to optimize/correct my results using the existing help.

I have two pandas dataframe :

df_1 : Contains the whole information bank about switches from 'a' to 'z'

Index localHost  localInt vlans
 0    switch_a   Te1/0/1  100,200,300
 1    switch_a   Te1/0/2  252,478,963
 2    switch_b   Te1/0/1  154,586,354
...

df_2 : Miss vlans information for some specific switches (all located in df_1 but not in the same order)

Index localHost  localInt vlans
 0    switch_a   Te1/0/1  
 1    switch_a   Te1/0/2  
 2    switch_z   Te1/0/1  
...

I would like to update df_2 or create a new dataframe (doesn't matter much) that will merge the df_1 and df_2 vlans data.

The only condition for the merged is that the localHost and localInt value in the same row are equal in both dataframes when reporting the vlans.

I first tried using a mask to isolate the only relevant rows :

for index, row in df_1.iterrows() :
  switch_name = row['localHost']
  switch_interface = row['localInt']
  switch_vlans = row['vlans']
    
  mask = (df_2['localHost'] == switch_name) & (df_2['localInt'] == switch_interface)
  df_2.loc[mask, 'vlans'] = switch_vlans

It seems to be working, but I don't understand why only every other line is processed:

Index localHost  localInt vlans
 0    switch_a   Te1/0/1  100,200,300
 1    switch_a   Te1/0/2  
 2    switch_z   Te1/0/1  265,477,111
...

Then I tried using a nested loop but the results was the same. do you have any idea what I'm doing wrong? Thanks.

I am not sure what you want. Could you please include the desired output so I (or someone else) can try to generate it? — luanpo1234, Jul 03 '23 at 16:40

score 1 · Accepted Answer · answered Jul 03 '23 at 16:46

If I understand you correctly you should use .merge:

out = df2.drop(columns="vlans").merge(
    df1[["localHost", "localInt", "vlans"]], on=["localHost", "localInt"], how="left"
)
print(out)

Prints:

   Index localHost localInt        vlans
0      0  switch_a  Te1/0/1  100,200,300
1      1  switch_a  Te1/0/2  252,478,963
2      2  switch_z  Te1/0/1  265,477,111

Initial dataframe df1:

   Index localHost localInt        vlans
0      0  switch_a  Te1/0/1  100,200,300
1      1  switch_a  Te1/0/2  252,478,963
2      2  switch_b  Te1/0/1  154,586,354
3      3  switch_z  Te1/0/1  265,477,111

Initial dataframe df2:

   Index localHost localInt  vlans
0      0  switch_a  Te1/0/1    NaN
1      1  switch_a  Te1/0/2    NaN
2      2  switch_z  Te1/0/1    NaN

score -2 · Answer 2 · answered Jul 03 '23 at 16:50

The issue you're facing is due to the fact that you're iterating over "df_1" and updating "df_2" within the loop. This can cause unexpected behavior because the indices of "df_2" are not necessarily sequential, and updating values within the loop can lead to misalignment.

To resolve this, you can use the merge function provided by pandas to merge the two dataframes based on the conditions you mentioned. Here's an example:

import pandas as pd

# Sample dataframes
df_1 = pd.DataFrame({
    'localHost': ['switch_a', 'switch_a', 'switch_b'],
    'localInt': ['Te1/0/1', 'Te1/0/2', 'Te1/0/1'],
    'vlans': ['100,200,300', '252,478,963', '154,586,354']
})

df_2 = pd.DataFrame({
    'localHost': ['switch_a', 'switch_a', 'switch_z'],
    'localInt': ['Te1/0/1', 'Te1/0/2', 'Te1/0/1'],
    'vlans': ['', '', '']
})

# Merge df_1 and df_2 based on localHost and localInt
merged_df = pd.merge(df_2, df_1[['localHost', 'localInt', 'vlans']], on=['localHost', 'localInt'], how='left')

# Update vlans in df_2 with the values from merged_df
df_2['vlans'] = merged_df['vlans_y'].fillna(df_2['vlans'])

# Alternatively, create a new dataframe with the updated vlans
new_df = df_2.copy()
new_df['vlans'] = merged_df['vlans_y'].fillna(df_2['vlans'])

# Print the updated df_2
print("Updated df_2:")
print(df_2)

# Print the new dataframe
print("\nNew dataframe:")
print(new_df)

This answer looks like it was generated by an AI (like ChatGPT), not by an actual human being. You should be aware that [posting AI-generated output is officially **BANNED** on Stack Overflow](https://meta.stackoverflow.com/q/421831). If this answer was indeed generated by an AI, then I strongly suggest you delete it before you get yourself into even bigger trouble: **WE TAKE PLAGIARISM SERIOUSLY HERE.** Please read: [Why posting GPT and ChatGPT generated answers is not currently acceptable](https://stackoverflow.com/help/gpt-policy). — tchrist, Jul 04 '23 at 19:31
I did the code and validated the output in colab. Output was fine. You can validate the code — Python-Turtle, Jul 05 '23 at 05:10

I'm trying to merge the columns of two dataframes based on some conditions

2 Answers2