1

I need to compare items in a list to a column in df1 and create new column in df2 with value from a different column on the same row in df1, if there is a match between a list item and the compared df1 column

My code and results

import pandas as pd
  
# initialize list elements
new_data = [[10,'a'], [20, 'b'], [30, 'c'], [40, 'd']]
old_data = [['10a','A', 10.05], [22, 'B', 12.10], [40, 'C', 1.10], 
[20, 'D', 8.05], [15, 'E', 9.00]]
lst = [20, 40, 50, 70, '20a']
  
# Create the pandas DataFrame with column name is provided explicitly
df1 = pd.DataFrame(new_data, columns=['Numbers', 'Letters'])
print(df)
df2 = pd.DataFrame(old_data, columns=['level', 'cap', 'time'])
df2['Exit'] = df1.apply(lambda x: x['Letters'] if \
str(x['Numbers']) in lst else 0, axis=1)
print(df2)

Result

level cap   time  Exit
0   10a   A  10.05   0.0
1    22   B  12.10   0.0
2    40   C   1.10   0.0
3    20   D   8.05   0.0
4    15   E   9.00   NaN

But Expected

level cap   time Exit
0   10a   A  10.05   0
1    22   B  12.10   0
2    40   C   1.10   d
3    20   D   8.05   b
4    15   E   9.00   0

I am missing something?

Henry Ecker
  • 34,399
  • 18
  • 41
  • 57
Starter
  • 417
  • 4
  • 12
  • I don't understand where the list comes into play. If you have df1 and df2 why not just merge [Pandas Merging 101](/q/53645882/15497888) `df2.merge(df1, how='left', left_on='level', right_on='Numbers')` – Henry Ecker Jul 13 '22 at 23:46
  • `df2.merge(df1.rename(columns={'Numbers': 'level', 'Letters': 'Exit'}), how='left', on='level').fillna(0)` with cleanup – Henry Ecker Jul 13 '22 at 23:47
  • The list can contain more values than found in any of the dfs – Starter Jul 13 '22 at 23:47
  • But how does that come into play if the Exit column can only get values from `df1`? Or you want the value in the Letters column if it matches any value in the list regardless of if it matches the corresponding value in `df1` Numbers column? – Henry Ecker Jul 13 '22 at 23:48
  • want to make sure that the values used for matching can be found in the list – Starter Jul 13 '22 at 23:49

1 Answers1

1

We can make the Numbers column the index of df1 with set_index then reindex the DataFrame with lst so that only values that appear in the list remain in the DataFrame. Then reindex again with the level column of df2 and to_numpy to assign the column without index alignment:

df2['Exit'] = (
    df1.set_index('Numbers')
        .reindex(index=lst)  # Ensure only values from list are present
        .reindex(index=df2['level'], fill_value=0)  # Re-align to df2['level'] column
        .to_numpy()  # Ignore index alignment
)

df2:

  level cap   time Exit
0   10a   A  10.05    0
1    22   B  12.10    0
2    40   C   1.10    d
3    20   D   8.05    b
4    15   E   9.00    0

Alternatively, we can then join a restructured df1 to df2.

To restructure df1 set_index and reindex with lst to create the subset only using values from lst

(
    df1.set_index('Numbers')
        .reindex(index=lst)
        .rename(columns={'Letters': 'Exit'})
)

        Exit
Numbers     
20         b
40         d
50       NaN
70       NaN
20a      NaN

The rename ensures that the new column appears with the correct name and then fillna to fix the NaN values.

All together this looks like:

df2 = df2.join(
    df1.set_index('Numbers')
        .reindex(index=lst)
        .rename(columns={'Letters': 'Exit'}),
    how='left',
    on='level'
).fillna({'Exit': 0})

df2:

  level cap   time Exit
0   10a   A  10.05    0
1    22   B  12.10    0
2    40   C   1.10    d
3    20   D   8.05    b
4    15   E   9.00    0

Yet another option, creating a mapper from the modified df1 above using Series.to_dict and then mapping the values back to df2 level column:

df2['Exit'] = df2['level'].map(
    df1.set_index('Numbers')['Letters']
        .reindex(index=lst)
        .dropna()
        .to_dict()
).fillna(0)

df2:

  level cap   time Exit
0   10a   A  10.05    0
1    22   B  12.10    0
2    40   C   1.10    d
3    20   D   8.05    b
4    15   E   9.00    0

Setup and version:

import pandas as pd  # version 1.4.3

lst = [20, 40, 50, 70, '20a']

df1 = pd.DataFrame(
    [[10, 'a'], [20, 'b'], [30, 'c'], [40, 'd']],
    columns=['Numbers', 'Letters']
)
df2 = pd.DataFrame(
    [['10a', 'A', 10.05], [22, 'B', 12.10], [40, 'C', 1.10],
     [20, 'D', 8.05], [15, 'E', 9.00]],
    columns=['level', 'cap', 'time']
)
Henry Ecker
  • 34,399
  • 18
  • 41
  • 57
  • I appreciate the variety of your answers and the explanations provided. This is really helpful. Seems there are lots of ways to accomplish the task – Starter Jul 14 '22 at 00:11
  • NP. It just depends what order you want to do the reshaping and the joining. These should all be functionally equivalent but some may be faster than others depending on scale and shape of actual data. – Henry Ecker Jul 14 '22 at 00:13
  • I will check to see by how much each one of them scales because of the large size of my data – Starter Jul 14 '22 at 00:16