Finding smallest possible difference between lists of unequal length

Question

I have a dataframe with two columns A and B that contains lists:

import pandas as pd

df = pd.DataFrame({"A" : [[1,5,10],  [], [2], [1,2]],
                   "B" : [[15, 2],   [], [6], []]})

I want to construct a third column C that is defined such that it is equal to the smallest possible difference between list-elements in A and B if they are non-empty, and 0 if one or both of them are empty.

For the first row the smallest difference is 1 (we take absolute value..), for the second row it is 0 due to lists being empty, third row is 4 and fourth row is 0 again due to one empty list, so we ultimately end up with:

df["C"] = [1, 0, 4, 0]

jpp · Accepted Answer · 2019-01-09T16:36:17.263

3

This isn't easily vectorisable, since you have object dtype series of lists. You can use a list comprehension with itertools.product:

from itertools import product

zipper = zip(df['A'], df['B'])
df['C'] = [min((abs(x - y) for x, y in product(*vals)), default=0) for vals in zipper]

# alternative:
# df['C'] = [min((abs(x - y) for x, y in product(*vals)), default=0) \
#            for vals in df[['A', 'B']].values]

print(df)
#             A        B  C
# 0  [1, 5, 10]  [15, 2]  1
# 1          []       []  0
# 2         [2]      [6]  4
# 3      [1, 2]       []  0

edited Jan 09 '19 at 16:36

answered Jan 09 '19 at 16:31

jpp

159,742
34
281
339

Is it possible to convert the dtype-series to a difference format so that it can be vectorized? – N08 Jan 09 '19 at 16:33
1

@N08, Since you have variable length lists, not easily, no. NumPy / Pandas works best when each dimension has the same number of values. You may end up with lots of empty values, which is also inefficient. – jpp Jan 09 '19 at 16:34
@jpp, nice one, what is zipper here. – Karn Kumar Jan 09 '19 at 16:44
1

Just realized that was precisely what you had in your solution :-), left it as it was with the ternary if. Thanks for the tips anyways – yatu Jan 09 '19 at 16:45
@yatu, No probs, always like to help :) – jpp Jan 09 '19 at 16:47

yatu · Answer 2 · 2019-01-09T16:43:54.160

2

You can use the following list comprehension, checking for the min difference of the cartesian product (itertools.product) from both columns

[min(abs(i-j) for i,j in product(*a)) if all(a) else 0 for a in df.values]
[1, 0, 4, 0]

edited Jan 09 '19 at 16:43

answered Jan 09 '19 at 16:34

yatu

86,083
12
84
139

1

`min` can take an iterable (so you can use a generator expression, no extra list creation required). In addition, `tolist()` isn't required when iterating `df.values`. Finally, `min` has a default value which removes the need for a ternary statement. – jpp Jan 09 '19 at 16:37

score 2 · Answer 3 · answered Jan 09 '19 at 16:44

2

df['C'] = df.apply(lambda row: min([abs(x - y) for x in row['A'] for y in row['B']], default=0), axis=1)

answered Jan 09 '19 at 16:44

Filipe Aleixo

3,924
3
41
74

1

Awesome @Filipe +1. – Karn Kumar Jan 09 '19 at 16:45
1

@pygo notice this will slow than jpp and yatu 's solution . :-) – BENY Jan 09 '19 at 16:50

score 1 · Answer 4 · answered Jan 09 '19 at 16:49

I just want to introduce the unnesting again

df['Diff']=unnesting(df[['B']],['B']).join(unnesting(df[['A']],['A'])).eval('C=B-A').C.abs().min(level=0)
df.Diff=df.Diff.fillna(0).astype(int)
df
Out[60]: 
            A        B  Diff
0  [1, 5, 10]  [15, 2]     1
1          []       []     0
2         [2]      [6]     4
3      [1, 2]       []     0

FYI

def unnesting(df, explode):
    idx=df.index.repeat(df[explode[0]].str.len())
    df1=pd.concat([pd.DataFrame({x:np.concatenate(df[x].values)} )for x in explode],axis=1)
    df1.index=idx
    return df1.join(df.drop(explode,1),how='left')

Ananay Mital · Answer 5 · 2019-01-09T17:45:31.450

0

I think this works

def diff(a,b):    
    if len(a) > 0 and len(b) > 0:
        return min([abs(i-j) for i in a for j in b])
    return 0

df['C'] = df.apply(lambda x: diff(x.A, x.B), axis=1)
df

           A        B   C
0   [1, 5, 10]  [15, 2] 1
1           []       [] 0
2          [2]      [6] 4
3       [1, 2]       [] 0

edited Jan 09 '19 at 17:45

answered Jan 09 '19 at 16:30

Ananay Mital

1,395
1
11
16

Finding smallest possible difference between lists of unequal length

5 Answers5