Set-up

I have two lists containing unique IDs: tl and yl.

An ID looks like '97678410'.

len(tl) = 185 and len(yl) = 182.


Problem

I need the difference between the two lists. That is, I want to obtain a list containing all IDs that are in tl but not in yl.

I've considered the answers to related questions, but my code,

final = [id for id in tl if id not in yl]

yields a final which is a copy of tl.

I have checked, and there are IDs that appear in both tl and yl.

See the two lists here.

tl = ['97682631', '97682633', '97684139', '97680023', '97680021', '97678410', '97675792', '97673883', '97673946', '97671417', '97664888', '97666272', '97667377', '97670240', '97664192', '97655130', '97655288', '97655289', '97655293', '97645033', '97645101', '97632355', '97633891', '97616084', '97608150', '97606844', '97602650', '97584169', '97595622', '97583025', '97577940', '97569579', '97570582', '97565137', '97540897', '97539337', '97518058', '97518763', '97512249', '97512252', '97493818', '97489271', '97489272', '97461011', '97459708', '97456572', '97456626', '97456632', '97453137', '97454483', '97454693', '97454892', '97450659', '97446162', '97446168', '97434415', '97436210', '97427637', '97427635', '97427632', '97425356', '97417309', '97404715', '97404716', '97404718', '97392000', '97392111', '97386620', '97383091', '97384274', '97384337', '97383090', '97383089', '97366670', '97361744', '97361698', '97361701', '97361702', '97361722', '97276849', '97272708', '97236813', '97223049', '97213666', '97174233', '97164586', '97141165', '97141170', '97145252', '97136643', '97130696', '97042669', '97036543', '97007304', '96915042', '96887981', '96883537', '96865568', '96818145', '96818147', '96776783', '96767917', '96747326', '96747763', '96735976', '96739100', '96712470', '96714575', '96695120', '96667787', '96561238', '96583133', '96558744', '96553823', '96528225', '96389164', '96377527', '96251693', '96269789', '96246548', '96171491', '96161689', '96153980', '96131831', '96131903', '96131956', '96131749', '96126777', '96101829', '95958417', '95914030', '95735741', '95656954', '95503961', '95501454', '95502625', '95474681', '95468325', '95472748', '95474209', '95440076', '95443633', '95408190', '95276367', '95222901', '95218637', '95218977', '95187313', '94948878', '94806104', '94306925', '94097967', '94018001', '94090391', '93860264', '93423007', '93206111', '92251067', '92480603', '91550754', '89421778', '89658137', '89123891', '88860045', '88903715', '88169920', 
'85395060', '83483635', '82819637', '81788095', '80286689', '76733816', '74983036', '75270743', '72079817', '69163539', '66080651', '62508733', '58272006', '58927395', '59487908', '59764478', '57909458', '51546448', '41278948']
yl = [97655130, 97645101, 97642334, 97642352, 97633891, 97636938, 97632355, 97616084, 97606844, 97608150, 97605869, 97602650, 97584169, 97595622, 97583025, 97577940, 97574727, 97569579, 97570582, 97566414, 97565137, 97540897, 97539337, 97518058, 97518763, 97512252, 97512249, 97493818, 97489271, 97489272, 97461011, 97459708, 97456572, 97456626, 97456632, 97453137, 97454483, 97454693, 97454892, 97446162, 97446168, 97450659, 97436210, 97434415, 97427632, 97427635, 97427637, 97425356, 97417309, 97404718, 97404715, 97404716, 97392000, 97392111, 97392693, 97386620, 97383091, 97384274, 97384337, 97383090, 97383089, 97366670, 97361698, 97361701, 97361702, 97361722, 97361744, 97362574, 97276849, 97272708, 97236813, 97223049, 97213666, 97205682, 97174233, 97164586, 97141165, 97141170, 97145252, 97136643, 97130696, 97129530, 97042669, 97036543, 97007304, 96915042, 96890889, 96890891, 96883537, 96865568, 96818147, 96818145, 96801213, 96776783, 96767917, 96747326, 96747763, 96735976, 96739100, 96712470, 96714575, 96704316, 96695120, 96583133, 96558744, 96561238, 96553823, 96528225, 96466854, 96389164, 96377527, 96306028, 96251693, 96269789, 96246548, 96171491, 96153980, 96161689, 96131956, 96126777, 96131749, 96131831, 96131903, 96101829, 95958417, 95914030, 95735741, 95656954, 95503961, 95501454, 95502625, 95474681, 95468325, 95472748, 95474209, 95440076, 95443633, 95408190, 95276367, 95222901, 95218637, 95218977, 95187313, 95192968, 94948878, 94806104, 94306925, 94170831, 94097967, 94018001, 94090391, 93423007, 93206111, 92480603, 92251067, 91550754, 89865094, 89421778, 89658137, 89123891, 88860045, 88903715, 88169920, 85395060, 83483635, 82819637, 81788095, 80260371, 80286689, 76733816, 74983036, 75270743, 72079817, 69163539, 66080651, 62508733, 59487908, 59764478, 58272006, 58927395, 51546448, 41278948]

I'm pretty sure it's a stupid thing, as the code works for a simple example.

How do I obtain a final containing all IDs in tl which are not in yl?

Jean-François Fabre
LucSpan
  • The lists have different types. Convert them both to `int` or `str`, then your code should work. Try this: `final = [id for id in map(str, tl) if id not in map(str, yl)]` – Netwave Jul 28 '17 at 09:52
  • pffff. Thanks a bunch. – LucSpan Jul 28 '17 at 09:54
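The cause pointed out in the comment above can be checked directly. The following is a minimal sketch using a few IDs copied from the two lists in the question; the names tl_sample, yl_sample, unfiltered and yl_as_str are only for illustration.

```python
# A few IDs copied from the question: tl holds strings, yl holds integers.
tl_sample = ['97682631', '97655130', '97645101']
yl_sample = [97655130, 97645101, 97642334]

# A string never compares equal to an integer, even if the digits match.
print('97655130' == 97655130)  # False

# So the original comprehension filters nothing out.
unfiltered = [i for i in tl_sample if i not in yl_sample]
print(unfiltered)  # ['97682631', '97655130', '97645101']

# Converting yl to strings once (into a set) makes the membership test meaningful.
yl_as_str = {str(i) for i in yl_sample}
final = [i for i in tl_sample if i not in yl_as_str]
print(final)  # ['97682631']
```

Building the converted set once, outside the comprehension, also avoids re-converting yl for every element of tl.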

3 Answers

How about using the set data structure:

>>> t1 = [2, 3, 4, 6]
>>> y1 = [2, 5, 7, 3]
>>> t1 = set(t1)
>>> y1 = set(y1)
>>> t1.difference(y1)
{4, 6}
  • Yes, but the OP has a string/int mix in his dataset – Jean-François Fabre Jul 28 '17 at 10:05
  • Python's set data structure can contain multiple data types, so it is not an issue. If, however, the strings represent numeric IDs and you want "123" to be identified as 123, then you need to convert them to int first using: t1 = list(map(int, t1)) –  Jul 28 '17 at 10:08
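To illustrate the point made in these comments: a set accepts mixed types, but '2' and 2 remain distinct elements, so the difference only shrinks once both sides share a type. A small sketch with made-up values (the names mixed and converted are illustrative):

```python
t1 = ['2', '3', '4', '6']   # strings
y1 = [2, 5, 7, 3]           # integers

# Without conversion nothing matches, so every element of t1 survives.
mixed = set(t1).difference(y1)
print(sorted(mixed))  # ['2', '3', '4', '6']

# After converting the strings to int, the shared IDs drop out.
converted = set(map(int, t1)).difference(y1)
print(sorted(converted))  # [4, 6]
```

The results are sorted before printing only because sets have no guaranteed order.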
tl = ['97682631', '97682633', '97684139', '97680023',........., '51546448', '41278948']
yl = [97655130, 97645101, 97642334, 97642352,.........., 51546448, 41278948]

tl_unique = set(tl)  # deduplicate the values of tl
yl_unique = set(yl)  # deduplicate the values of yl

The two lines above are not necessary if you are sure there are no duplicate values; otherwise, duplicate values would also be compared.

The elements of yl are integers while the elements of tl are strings, so int(i) converts each element of tl_unique before it is compared:

diff = [int(i) for i in tl_unique if int(i) not in yl_unique]

print(diff)
Gahan
  • I checked your answer on a phone and it was just horrible. Maybe reduce your input data set a little, and turn your code comments into real text, that would make your answer look nicer. – Jean-François Fabre Jul 28 '17 at 11:10
  • Oh, I never thought about mobile users; on the site itself no problem appears. I had also included the lists from the original source, which helps in case the external link is not available. Anyway, check the edit you suggested. – Gahan Jul 28 '17 at 11:11
  • it's always better to have a concise answer. Now looks a lot better. – Jean-François Fabre Jul 28 '17 at 11:22

Using two slightly simplified lists for clarity (one with strings and one with integers):

tl = ['1', '2', '3']
yl = [1, 2, 4, 5]

Convert all the strings in tl to integers so that they can be matched against the integers in yl:

tl = [int(num) for num in tl]

Convert both lists into sets using the set() factory function.

NOTE: this assumes, as you have mentioned, that the values in each list are unique. set() will deduplicate values without warning.

tl = set(tl)
yl = set(yl)

Use the built-in set method .difference() to identify the values in the first set that are not in the second set.

final = tl.difference(yl)
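One thing to keep in mind is that .difference() returns a set, so the original order of tl is lost. If the order matters, a sketch along these lines keeps it. The helper name ordered_difference is hypothetical, and it assumes every ID is either an int or a string of digits.

```python
def ordered_difference(left, right):
    """Return the items of left whose integer value is not in right, in order."""
    # Build the lookup set once; membership tests on a set are fast.
    right_ids = {int(item) for item in right}
    return [item for item in left if int(item) not in right_ids]


tl = ['1', '2', '3']
yl = [1, 2, 4, 5]

final = ordered_difference(tl, yl)
print(final)  # ['3']
```

This version returns the surviving items in their original form (strings here), which may or may not be what you want; wrap the result in int() calls if integers are needed.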
E. Ducateme