
Suppose we have this DataFrame:

import pandas

d = {'col1': [[0,1], [0,2], [1,2], [2,3]], 'col2': ["a", "b", "c", "d"]}
df = pandas.DataFrame(data=d)

     col1 col2
0  [0, 1]    a
1  [0, 2]    b
2  [1, 2]    c
3  [2, 3]    d

Now I need to find a particular list in col1 and return the value from col2 in that row.

For example, I want to look up [0,2] and get "b" in return.

I have read this thread about how to do it: extract column value based on another column pandas dataframe

But when I try to apply the answers there, I don't get the result I need:

df.loc[df['col1'] == [0,2], 'col2']
ValueError: Arrays were different lengths: 4 vs 2

df.query('col1==[0,2]')
SystemError: <built-in method view of numpy.ndarray object at 0x000000000D67FA80> returned a result with an error set
fleetingbytes

2 Answers


One possible solution is to compare tuples or sets (note that sets ignore element order and duplicates):

mask = df['col1'].apply(tuple) == tuple([0,2])

mask = df['col1'].apply(set) == set([0,2])

Or compare as arrays, which works only if every list in the Series has the same length and the list being compared has that length too:

import numpy as np
mask = (np.array(df['col1'].values.tolist()) == [0,2]).all(axis=1)

s = df.loc[mask, 'col2']
print(s)
1    b
Name: col2, dtype: object
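
As a self-contained sketch of the tuple comparison above, the snippet below rebuilds the question's DataFrame and looks up [0,2]. The per-row lambda is an assumption added here so the result does not depend on how pandas treats a tuple on the right-hand side of ==, and `target` is just an illustrative name.

```python
import pandas as pd

d = {'col1': [[0, 1], [0, 2], [1, 2], [2, 3]], 'col2': ["a", "b", "c", "d"]}
df = pd.DataFrame(data=d)

target = [0, 2]  # illustrative name for the list to look up

# Compare each row's list with the target, one row at a time
mask = df['col1'].apply(lambda lst: tuple(lst) == tuple(target))

# Select col2 for the rows where the mask is True
result = df.loc[mask, 'col2']
print(result.tolist())  # ['b']
```

Because the comparison happens row by row, this version should also cope with lists of different lengths, unlike the array-based mask.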
jezrael

I'm not sure whether pandas supports boolean indexing on columns whose values are neither numeric nor strings. Here's a simple one-line workaround that compares strings instead of lists.

df.loc[df['col1'].apply(str) == str([0,1]), 'col2'].iloc[0]

Essentially, this converts all the lists in column 1 to strings and then compares them to the string str([0,1]).

Note the .iloc[0] at the end of the line. More than one row might contain the list [0,1], so I select the first value that shows up. It is positional; a plain [0] would look up the index label 0 and fail whenever the first matching row has a different label.
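
Taking the first element raises an error when no row matches. A minimal sketch of one way to guard against that is shown below; `first_match` and its `default` parameter are hypothetical names introduced here, and the sketch compares the lists directly rather than their string forms.

```python
import pandas as pd

def first_match(df, target, default=None):
    """Return col2 of the first row whose col1 list equals target (hypothetical helper)."""
    # Row-by-row list equality yields a boolean mask
    matches = df.loc[df['col1'].apply(lambda lst: lst == target), 'col2']
    # Fall back to the default when nothing matched
    return matches.iloc[0] if not matches.empty else default

d = {'col1': [[0, 1], [0, 2], [1, 2], [2, 3]], 'col2': ["a", "b", "c", "d"]}
df = pd.DataFrame(data=d)

found = first_match(df, [0, 2])
missing = first_match(df, [9, 9])
print(found, missing)  # b None
```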

Luke Polson