-1

Let's say I have a table that looks like this:

table = pd.DataFrame(
          columns=["Name", "Size", "Color"], 
          data=[['A', 1, 'Red'], ['B', 2, 'Green'], ['C', 3, 'Blue']]
        )

And a lookup table that looks like this:

lookup = pd.DataFrame(
           columns=["Color", "Source", "Lead Days"], 
           data=[["Red", "Europe", 2], 
                 ["Green", "Europe", 3], 
                 ["Blue", "US", 1], 
                 ["Yellow", "Europe", 2]]
         )

How might I add columns "Source" and "Lead Days" to table by looking up "Color" from lookup?

A story sometimes helps.

table has all the items I need to order.

lookup has where I order them from and how long it takes.

I want to transform table so that it can show the "Source" and "Lead Days" for each item I need to order.

Here's what the final table should look like:

Note: while I'm sure there's a way to do this with merge, or top level table operations. In the spirit of Minimally Sufficient Pandas, and to avoid the huge kluge that is pandas' over-provisioning of operations, I'd prefer to do it with apply. Apply is nice because it's easy to consistently reach for apply in all situations.

Here's my current approach, but it results in the error ValueError: Columns must be same length as key

To me, this makes little sense, since I'm returning a list of length 2 and putting it into two columns. But I'm sure pandas has its reasons for being anti-intuitive here.

lookup_columns = ["Source", "Lead Days"]
table[lookup_columns] = table.apply(
    lambda row: 
          lookup.query('`Color` == "{color}"'.format(color=row["Color"])).loc[:, lookup_columns].values[0]
      , axis = 1)
Connor
  • 4,216
  • 2
  • 29
  • 40
  • According to the article you referenced the merge operation is an 'acceptable' operation see 'Best of the API' section. As others have pointed out apply is unnecessarily computationally expensive and complicated / hard for others to understand. – quizzical_panini Oct 13 '21 at 23:51
  • @quizzical_panini thanks for pointing that out. The article isn't my sacred text, it's just pointing in the same direction as this question; i.e. simplification. Is there a way to merge based on multiple criteria: e.g. `x > 3 and Color = "Red" (that's another criteria of my current lookup)? Also, what happens if multiple columns match? – Connor Oct 13 '21 at 23:53

3 Answers3

2

With apply, you can do:

>>> pd.concat([table, table['Color'].apply(lambda x: lookup.loc[lookup['Color'] == x, ['Source', 'Lead Days']].squeeze())], axis=1)

  Name  Size  Color  Source  Lead Days
0    A     1    Red  Europe          2
1    B     2  Green  Europe          3
2    C     3   Blue      US          1

Old answer

Use pd.merge:

>>> pd.merge(table, lookup, how='left', on='Color')

  Name  Size  Color  Source  Lead Days
0    A     1    Red  Europe          2
1    B     2  Green  Europe          3
2    C     3   Blue      US          1
Corralien
  • 109,409
  • 8
  • 28
  • 52
2

I wouldnt' use apply. It is computationally expensive. For your case, join will suffice.

table.set_index('Color').join(lookup.set_index('Color')).reset_index()



    Color Name  Size  Source  Lead Days
0    Red    A     1  Europe          2
1  Green    B     2  Europe          3
2   Blue    C     3      US          1
wwnde
  • 26,119
  • 6
  • 18
  • 32
1

Use result_type="expand":

lookup_columns = ["Source", "Lead Days"]
table[lookup_columns] = table.apply(
    lambda row:
    lookup.query('`Color` == "{color}"'.format(color=row["Color"])).loc[:, lookup_columns].values[0]
    , axis=1, result_type="expand")

print(table)

  Name  Size  Color  Source  Lead Days
0    A     1    Red  Europe          2
1    B     2  Green  Europe          3
2    C     3   Blue      US          1

From the documentation (emphasis mine):

result_type {‘expand’, ‘reduce’, ‘broadcast’, None}, default None
These only act when axis=1 (columns):
‘expand’ : list-like results will be turned into columns.

Dani Mesejo
  • 61,499
  • 6
  • 49
  • 76
  • damn can't believe that was the issue. Ok, not to be too picky, is there any way I can modify the return value so that I don't have to use the `result_type` parameter? I know I sound like I don't want to learn the tool... well yeah I guess I don't want to learn it. I can barely stand writing `axis=1` each time haha – Connor Oct 13 '21 at 23:12
  • I know I sound ridiculous. I guess pandas violates my sensibilities. I've been using it for 3 years on and off and I'm so annoyed that whenever I return to it it's such a damn heavy relearning process to get it to do the extremely simple things I want it to do – Connor Oct 13 '21 at 23:17
  • 1
    @Connor I'm not sure if there is way to achieve what you want without setting the result_type argument. – Dani Mesejo Oct 13 '21 at 23:20