1

I have two dataframes:

Row No.    Subject    
1      Apple
2      Banana
3      Orange
4      Lemon
5      Strawberry


row_number Subjects Special?
1    Banana      Yes
2    Lemon       No
3    Apple       No
4    Orange      No
5    Strawberry  Yes
6    Cranberry   Yes
7    Watermelon  No

I want to change the Row No. of the first dataframe to match the second. It should be like this:

Row No.    Subject   
3      Apple
1      Banana
4      Orange
2      Lemon
5      Strawberry

I have tried this code:

for index, row in df1.iterrows():
    if df1['Subject'] == df2['Subjects']:
        df1['Row No.'] = df2['row_number']

But I get the error:

ValueError: Can only compare identically-labeled Series objects

Does that mean the dataframes have to have the same amount of rows and columns? Do they have to be labelled the same too? Is there a way to bypass this limitation?

Edit: I have found a promising alternative formula:

for x in df1['Subject']:
    if x in df2['Subjects'].values:
        df2.loc[df2['Subjects'] == x]['row_number'] = df1.loc[df1['Subject'] == x]['Row No.']

But it appears it doesn't modify the first dataframe like I want it to. Any tips why? Furthermore, I get this warning:

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead
ACan
  • 105
  • 1
  • 1
  • 7
  • You have to use the row variable you assigned in the for loop. What you're doing is comparing the dataframe columns to each other and not the individual cell values. There's more about that here https://stackoverflow.com/questions/16476924/how-to-iterate-over-rows-in-a-dataframe-in-pandas – Kel Varnsen Feb 09 '21 at 23:14
  • Could you please read my edit? I figured out that was what was wrong with the first formula – ACan Feb 09 '21 at 23:16
  • see my answer below – Kel Varnsen Feb 09 '21 at 23:45
  • You dont need to use any loops. You can simply use a complete vectorized way by first creating the dictionary `d`, with mapping between `Subjects` and `row_number` and then using `df1.Subject.replace(d)` to change all the values at once. Check my answer for details. – Akshay Sehgal Feb 09 '21 at 23:49

2 Answers2

2

I would avoid using for loops especially when pandas has such great methods to handle these types of problems already.

Using pd.Series.replace

Here is a vectorized way of doing this -

  1. d is the dictionary that maps the fruit to the number in second dataframe
  2. You can use df.Subject.replace(d) to now simply replace the keys in the dict d to their values.
  3. Overwrite the Row No. column with this now.
d = dict(zip(df2['Subjects'], df2['row_number']))
df1['Row No.'] = df1.Subject.replace(d)
print(df1)
      Subject  Row No.
0       Apple        3
1      Banana        1
2      Orange        4
3       Lemon        2
4  Strawberry        5

Using pd.merge

Let's try simply merging the 2 dataframe and replace the column completely.

ddf = pd.merge(df1['Subject'], 
               df2[['row_number','Subjects']], 
               left_on='Subject', 
               right_on='Subjects', 
               how='left').drop('Subjects',1)

ddf.columns = df1.columns[::-1]
print(ddf)
      Subject  Row No.
0       Apple       3
1      Banana       1
2      Orange       4
3       Lemon       2
4  Strawberry       5
Akshay Sehgal
  • 18,741
  • 3
  • 21
  • 51
1

Assuming the first is df1 and the second is df2, this should do what you want it to:

import pandas as pd

d1 = {'Row No.': [1, 2, 3, 4, 5], 'Subject': ['Apple', 'Banana', 'Orange', 
     'Lemon', 'Strawberry']}
df1 = pd.DataFrame(data=d1)

d2 = {'row_number': [1, 2, 3, 4, 5, 6, 7], 'Subjects': ['Banana', 'Lemon', 'Apple', 
'Orange', 'Strawberry', 'Cranberry', 'Watermelon'], 'Special?': ['Yes', 'No', 
     'No', 'No', 
     'Yes', 'Yes', 'No']}
df2 = pd.DataFrame(data=d2)

for x in df1['Subject']:
    if x in df2['Subjects'].values:
        df1.loc[df1['Subject'] == x, 'Row No.'] = (df2.loc[df2['Subjects'] == x]['row_number']).item()

#print(df1)
#print(df2)

In your edited answer it looks like you had the dataframes swapped and you were missing the item() to get the actual row_number value and not the Series object.

Kel Varnsen
  • 314
  • 2
  • 8