Look up data from other table with apply

Question

Let's say I have a table that looks like this:

table = pd.DataFrame(
          columns=["Name", "Size", "Color"], 
          data=[['A', 1, 'Red'], ['B', 2, 'Green'], ['C', 3, 'Blue']]
        )

And a lookup table that looks like this:

lookup = pd.DataFrame(
           columns=["Color", "Source", "Lead Days"], 
           data=[["Red", "Europe", 2], 
                 ["Green", "Europe", 3], 
                 ["Blue", "US", 1], 
                 ["Yellow", "Europe", 2]]
         )

How might I add columns "Source" and "Lead Days" to table by looking up "Color" from lookup?

A story sometimes helps.

table has all the items I need to order.

lookup has where I order them from and how long it takes.

I want to transform table so that it can show the "Source" and "Lead Days" for each item I need to order.

Here's what the final table should look like:

Note: while I'm sure there's a way to do this with merge, or top level table operations. In the spirit of Minimally Sufficient Pandas, and to avoid the huge kluge that is pandas' over-provisioning of operations, I'd prefer to do it with apply. Apply is nice because it's easy to consistently reach for apply in all situations.

Here's my current approach, but it results in the error ValueError: Columns must be same length as key

To me, this makes little sense, since I'm returning a list of length 2 and putting it into two columns. But I'm sure pandas has its reasons for being anti-intuitive here.

lookup_columns = ["Source", "Lead Days"]
table[lookup_columns] = table.apply(
    lambda row: 
          lookup.query('`Color` == "{color}"'.format(color=row["Color"])).loc[:, lookup_columns].values[0]
      , axis = 1)

According to the article you referenced the merge operation is an 'acceptable' operation see 'Best of the API' section. As others have pointed out apply is unnecessarily computationally expensive and complicated / hard for others to understand. — quizzical_panini, Oct 13 '21 at 23:51
@quizzical_panini thanks for pointing that out. The article isn't my sacred text, it's just pointing in the same direction as this question; i.e. simplification. Is there a way to merge based on multiple criteria: e.g. `x > 3 and Color = "Red" (that's another criteria of my current lookup)? Also, what happens if multiple columns match? — Connor, Oct 13 '21 at 23:53

Corralien · Answer 1 · 2021-10-13T23:08:55.683

2

With apply, you can do:

>>> pd.concat([table, table['Color'].apply(lambda x: lookup.loc[lookup['Color'] == x, ['Source', 'Lead Days']].squeeze())], axis=1)

  Name  Size  Color  Source  Lead Days
0    A     1    Red  Europe          2
1    B     2  Green  Europe          3
2    C     3   Blue      US          1

Old answer

Use pd.merge:

>>> pd.merge(table, lookup, how='left', on='Color')

  Name  Size  Color  Source  Lead Days
0    A     1    Red  Europe          2
1    B     2  Green  Europe          3
2    C     3   Blue      US          1

edited Oct 13 '21 at 23:08

answered Oct 13 '21 at 23:00

Corralien

109,409
8
28
52

see note on using merge? – Connor Oct 13 '21 at 23:01
You're right. Sorry :) – Corralien Oct 13 '21 at 23:01
Do you know `apply` is a slow function (like a loop) – Corralien Oct 13 '21 at 23:02
It's not as slow as a for loop. Plus, I'm interested in doing speedy (not meaning performant) data analysis. poitroae – Connor Oct 13 '21 at 23:04
I.e. fast feedback without looking at api docs is most important – Connor Oct 13 '21 at 23:07
There's another reason not to use `merge` (I'm feeling defensive :) in my actual usecase I need to select based on multiple criteria, for which I think `query` is likely best suited? Is there a way to merge based on criteria like `x > 3 and `Color` = "Red"`? – Connor Oct 13 '21 at 23:20
@wwnde did you see this :point-up:? – Connor Oct 13 '21 at 23:30
What do you mean by `x > 3`. `x` is a color not the whole row. It's possible to pass row to `apply`. Give a concrete example, I will update my answer. – Corralien Oct 14 '21 at 04:57
obviously, yes the apply route will work for selecting based on multiple criteria. I’m wondering if merge has any solutions for selecting rows based on multiple criteria – Connor Oct 14 '21 at 12:42
You have to create on-the-fly columns based on criteria before merging. – Corralien Oct 14 '21 at 12:47
huh could you give an example of what that might look like? – Connor Oct 14 '21 at 12:50
1

https://stackoverflow.com/questions/45869886/python-pandas-how-to-merge-based-on-an-or-condition/45870198 – Corralien Oct 14 '21 at 12:53

score 2 · Answer 2 · answered Oct 13 '21 at 23:25

I wouldnt' use apply. It is computationally expensive. For your case, join will suffice.

table.set_index('Color').join(lookup.set_index('Color')).reset_index()



    Color Name  Size  Source  Lead Days
0    Red    A     1  Europe          2
1  Green    B     2  Europe          3
2   Blue    C     3      US          1

score 1 · Accepted Answer · answered Oct 13 '21 at 23:08

1

Use result_type="expand":

lookup_columns = ["Source", "Lead Days"]
table[lookup_columns] = table.apply(
    lambda row:
    lookup.query('`Color` == "{color}"'.format(color=row["Color"])).loc[:, lookup_columns].values[0]
    , axis=1, result_type="expand")

print(table)

  Name  Size  Color  Source  Lead Days
0    A     1    Red  Europe          2
1    B     2  Green  Europe          3
2    C     3   Blue      US          1

From the documentation (emphasis mine):

result_type {‘expand’, ‘reduce’, ‘broadcast’, None}, default None
These only act when axis=1 (columns):
‘expand’ : list-like results will be turned into columns.

answered Oct 13 '21 at 23:08

Dani Mesejo

61,499
6
49
76

damn can't believe that was the issue. Ok, not to be too picky, is there any way I can modify the return value so that I don't have to use the `result_type` parameter? I know I sound like I don't want to learn the tool... well yeah I guess I don't want to learn it. I can barely stand writing `axis=1` each time haha – Connor Oct 13 '21 at 23:12
I know I sound ridiculous. I guess pandas violates my sensibilities. I've been using it for 3 years on and off and I'm so annoyed that whenever I return to it it's such a damn heavy relearning process to get it to do the extremely simple things I want it to do – Connor Oct 13 '21 at 23:17
1

@Connor I'm not sure if there is way to achieve what you want without setting the result_type argument. – Dani Mesejo Oct 13 '21 at 23:20

Look up data from other table with apply

3 Answers3