Sorting lists with indexing for easy item removal

Question

I am facing the following problem. I have multiple lists in Python that I want to sort with a kind of indexing, so that I can remove the corresponding items from the other lists. Let me explain further.

listA_ID = [1,2,3,5,6,7] # integer from 0-250
listA_Var1 = [3.9, 4.7, 2.1, 1.2, 0.15, 0.99]

listB_ID = [2,5,6,7,8,10] # integer from 0-250
listB_Var1 = [0.54, 0.35, 1.19, 2.45, 3.1, 1.75]

>> After comparing listA_ID and listB_ID, I should end up with the common IDs:
listA_ID = listB_ID = sorted(list(set(listA_ID) & set(listB_ID)))
listA_ID = [2,5,6,7]
listB_ID = [2,5,6,7]

Therefore, I want to delete the elements [1, 3] from listA_ID, which are at positions [0, 2] of that list, and likewise delete [3.9, 2.1] from listA_Var1, which are at the same positions [0, 2].

Similarly, I want to remove the elements [8, 10] from listB_ID, which are at positions [4, 5] of that list, and likewise remove [3.1, 1.75] from listB_Var1, which are at the same positions [4, 5].

>> and then listA_Var1 & listB_Var1 will become
listA_Var1 = [4.7, 1.2, 0.15, 0.99]
listB_Var1 = [0.54, 0.35, 1.19, 2.45]

Any ideas on an efficient way to implement that? In Matlab, which I have used a lot, I would compare the two lists, get the indexes that are not needed, and apply those indexes to the lists to obtain the final listA_Var1 & listB_Var1.

Any ideas please? Thanks in advance!

I'll be honest, I'm not very familiar with pandas; however, I thought numpy might offer a Matlab-style solution. – V. Nikolaidis, Jun 17 '21 at 16:12

Georgy Kopshteyn · Accepted Answer · score 2 · 2021-06-18T06:22:24.657

1. Getting the Intersection

There are many ways to do this. For a detailed discussion, see here. As suggested there, if duplicates do not matter (i.e. your lists either contain no duplicates, or they do but you do not care about them), you can, for example, use set() to get the shared values:

intersection_A_B = sorted(list(set(listA_ID) & set(listB_ID)))

Alternatively, you can turn just one of the lists into a set and use its intersection() method, which accepts any iterable as its argument, for example:

intersection_A_B = list(set(listA_ID).intersection(listB_ID))

In contrast, if duplicates matter or could pose an issue (say, both listA_ID and listB_ID feature a value twice and you want your intersection to preserve both occurrences of the value), you could use a list comprehension instead of set() or intersection():

intersection_A_B = [x for x in listA_ID if x in listB_ID]
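A quick sketch comparing the three approaches on the question's data. The lists `dupA` and `dupB` are made-up examples with repeated IDs, added only to show where the results differ.

```python
listA_ID = [1, 2, 3, 5, 6, 7]
listB_ID = [2, 5, 6, 7, 8, 10]

# 1) Set intersection with the & operator, sorted for a predictable order
via_operator = sorted(set(listA_ID) & set(listB_ID))

# 2) intersection() method: only the first list has to be a set
via_method = sorted(set(listA_ID).intersection(listB_ID))

# 3) List comprehension: keeps listA_ID's order and its duplicates
via_comprehension = [x for x in listA_ID if x in listB_ID]

print(via_operator)       # [2, 5, 6, 7]
print(via_method)         # [2, 5, 6, 7]
print(via_comprehension)  # [2, 5, 6, 7]

# Hypothetical lists with repeated IDs, to show where the approaches differ
dupA = [2, 2, 5, 9]
dupB = [2, 5, 5, 8]
print(sorted(set(dupA) & set(dupB)))   # [2, 5]
print([x for x in dupA if x in dupB])  # [2, 2, 5]
```

Note that the comprehension keeps duplicates as they occur in the first list. Because `x in listB_ID` scans the whole list, converting `listB_ID` to a set first is usually worthwhile for long lists.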

2. Removing Values

Edit: Now that I understand what you were really after, note that the first step stores its result in intersection_A_B instead of overwriting listA_ID and listB_ID, because their original states are needed for the next operation. After getting the intersection, this should do the trick:

del_indices_A = [i for i, value in enumerate(listA_ID) if value not in intersection_A_B]
listA_Var1 = [listA_Var1[x] for x in range(len(listA_Var1)) if x not in del_indices_A]

del_indices_B = [i for i, value in enumerate(listB_ID) if value not in intersection_A_B]
listB_Var1 = [listB_Var1[x] for x in range(len(listB_Var1)) if x not in del_indices_B]

This first checks which indices in listA_ID and listB_ID correspond to values not included in intersection_A_B, and then excludes the values at those indices from listA_Var1 and listB_Var1.
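As an alternative sketch (not the answer's original code), the same filtering can be done in one pass by pairing each ID with its value via `zip()` and testing membership against a set, which avoids scanning the `del_indices` lists for every element. It assumes each ID list has the same length as its value list.

```python
listA_ID = [1, 2, 3, 5, 6, 7]
listA_Var1 = [3.9, 4.7, 2.1, 1.2, 0.15, 0.99]
listB_ID = [2, 5, 6, 7, 8, 10]
listB_Var1 = [0.54, 0.35, 1.19, 2.45, 3.1, 1.75]

# IDs present in both lists; set membership tests are O(1) on average
common = set(listA_ID) & set(listB_ID)

# Keep a value only when the ID at the same position is a common ID
listA_Var1 = [v for i, v in zip(listA_ID, listA_Var1) if i in common]
listB_Var1 = [v for i, v in zip(listB_ID, listB_Var1) if i in common]

print(listA_Var1)  # [4.7, 1.2, 0.15, 0.99]
print(listB_Var1)  # [0.54, 0.35, 1.19, 2.45]
```

The result follows the original order of each list rather than sorted ID order; in this example the IDs happen to be sorted already.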

edited Jun 18 '21 at 06:22

answered Jun 17 '21 at 16:00

Georgy Kopshteyn


This is an elegant solution using set +1. – Karn Kumar Jun 17 '21 at 16:09
Thank you very much @Georgy Kopshteyn. I did not make myself perfectly clear though. A1, A2... or B1, B2,... will not be strings, but numbers, floats to be exact. I just want to be able to extract the corresponding elements from listA_Var1 & listB_Var1, after the operation "listA_ID = listB_ID = list(set(listA_ID) & set(listB_ID))" – V. Nikolaidis Jun 17 '21 at 16:23
I would also recommend using sorted elements, e.g. `sorted(set(listA_ID).intersection(listB_ID))` – Karn Kumar Jun 17 '21 at 16:24
and I might have duplicate numbers, but the operation "listA_ID = listB_ID = list(set(listA_ID) & set(listB_ID))" seems to keep only the first duplicate it finds, and this suits my needs! – V. Nikolaidis Jun 17 '21 at 16:25
@V.Nikolaidis If you need to convert them to floats, simply use `float()` instead of the string formatting. E.g.: `listA_float = [float(x) for x in listA_ID]` would give you `[2.0, 5.0, 6.0, 7.0, 8.0, 9.0, 14.0]`. Is this what you had in mind? – Georgy Kopshteyn Jun 17 '21 at 16:28
@V.Nikolaidis, if it fits your requirement, you should accept it as an answer. – Karn Kumar Jun 17 '21 at 16:31
Basically, as my final product, I want the numbers from listA_Var1 that correspond to listA_ID, e.g. listA_ID[1,3,4,5], which is [2,5,6,7]; therefore I want the values listA_Var1[1,3,4,5], which are [4.7, 1.2, 0.15, 0.99]. (Does that clarify things a bit more, or am I thinking too much in Matlab style?) – V. Nikolaidis Jun 17 '21 at 16:51
@V.Nikolaidis So what you want is to remove items from `listA_Var1` and `listB_Var1` whose indices are not included in the intersection of `listA_ID` and `listB_ID`? If this is the case, then I do not understand your example. 1. indices in python usually start counting from 0. So directly using `[2, 5, 6, 7]` would result in `listA_Var1 = [2.1, 0.99]` (with indices 6 and 7 falling out of range). 2. Even when accounting for this by doing -1 for all indices, the result would be `listA_Var1 = [4.7, 0.15, 0.99]` (with index 7 being out of range). So how do you arrive at the result in your post? – Georgy Kopshteyn Jun 17 '21 at 18:42
Dear @GeorgyKopshteyn, thank you for supporting and keeping up with me. [2, 5, 6, 7] are not indices. These are the common ID values in both lists (listA_ID & listB_ID) that I want to keep. And therefore remove the values [1,3] from listA_ID and [8,10] from listB_ID at the positions [0,2] & [4,5] respectively. Then, I want to remove the values of the other 2 lists, listA_Var1 & listB_Var1, that correspond to the same indices [0,2] & [4,5] respectively. – V. Nikolaidis Jun 17 '21 at 22:01
@V.Nikolaidis Thank you for the update, I now understand what you wanted to do. I have updated my answer accordingly, let me know whether this works for you. Note that, instead of updating `listA_ID` and `listB_ID` in the first part, I now assign `sorted(list(set(listA_ID) & set(listB_ID)))` to `intersection_A_B` to preserve the original states of the lists for the second part of the operation. – Georgy Kopshteyn Jun 17 '21 at 22:28
@GeorgyKopshteyn Thank you for your time and your fully working answer. I also managed to solve this problem with numpy. I will post my version shortly, and if you have a few minutes, I would appreciate it if you could tell me which approach is faster. – V. Nikolaidis Jun 18 '21 at 10:30

Carmoreno · Answer 2 · 2021-06-18T00:31:03.457

First, I'll explain this approach step by step:

- Step 1: We look for the elements that appear in both listA_ID and listB_ID.

intersection_AB = set(listA_ID) & set(listB_ID)

- Step 2: Next, we take the set difference. It's important to put set(listA_ID) first, because set difference is not commutative.

# You can use difference() method alternatively:
# A_elements = list(set(listA_ID).difference(intersection_AB)) but personally I like the minus operator.

A_elements = list(set(listA_ID) - intersection_AB)

- Step 3: Then, we look up the indexes of the elements found in the previous step.

index_to_remove_list_A = [listA_ID.index(i) for i in A_elements]

Or, in a single line (although less legible):

index_to_remove_list_A = [listA_ID.index(i) for i in list(set(listA_ID) - intersection_AB)]

- Step 4: Delete the corresponding elements from listA_Var1, starting from the highest index.

for i in sorted(index_to_remove_list_A, reverse=True):
  del listA_Var1[i]

print(listA_Var1)
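Sorting the indexes in reverse matters because each `del` shifts every later element one position to the left. A small comparison on the question's `listA_Var1`, deleting positions [0, 2] (IDs 1 and 3) in both orders:

```python
values = [3.9, 4.7, 2.1, 1.2, 0.15, 0.99]
positions = [0, 2]  # positions of IDs 1 and 3 in listA_ID

# Ascending order: after del forward[0], the old index 2 has moved to 1,
# so the second del removes the wrong element (1.2 instead of 2.1)
forward = values.copy()
for i in sorted(positions):
    del forward[i]

# Descending order: deleting from the end leaves lower positions untouched
backward = values.copy()
for i in sorted(positions, reverse=True):
    del backward[i]

print(forward)   # [4.7, 2.1, 0.15, 0.99]
print(backward)  # [4.7, 1.2, 0.15, 0.99]
```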

Edit: Full code with both lists ...

intersection_AB = set(listA_ID) & set(listB_ID)

A_elements = list(set(listA_ID) - intersection_AB)
B_elements = list(set(listB_ID) - intersection_AB)

index_to_remove_list_A = [listA_ID.index(i) for i in A_elements]
index_to_remove_list_B = [listB_ID.index(i) for i in B_elements]

for i in sorted(index_to_remove_list_A, reverse=True):
  del listA_Var1[i]

for i in sorted(index_to_remove_list_B, reverse=True):
  del listB_Var1[i]


print(listA_Var1) # [4.7, 1.2, 0.15, 0.99]
print(listB_Var1) # [0.54, 0.35, 1.19, 2.45]
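One caveat: `list.index()` returns only the position of the first occurrence, so an unwanted ID that appears twice would be removed only once. Below is a sketch that instead collects every position to keep with `enumerate()`, computed once per ID list and then applied to any number of value lists (the question mentions multiple lists). The helper name `keep_common` and the extra list `listA_Var2` are hypothetical, for illustration only.

```python
def keep_common(ids, common, *value_lists):
    """Hypothetical helper: keep only the positions whose ID is in `common`.

    Returns (filtered_ids, [filtered_value_list, ...]).
    """
    keep = [pos for pos, i in enumerate(ids) if i in common]
    filtered_ids = [ids[pos] for pos in keep]
    filtered_values = [[values[pos] for pos in keep] for values in value_lists]
    return filtered_ids, filtered_values


listA_ID = [1, 2, 3, 5, 6, 7]
listA_Var1 = [3.9, 4.7, 2.1, 1.2, 0.15, 0.99]
listA_Var2 = [10, 20, 30, 40, 50, 60]  # hypothetical second value list
listB_ID = [2, 5, 6, 7, 8, 10]
listB_Var1 = [0.54, 0.35, 1.19, 2.45, 3.1, 1.75]

common = set(listA_ID) & set(listB_ID)
listA_ID, (listA_Var1, listA_Var2) = keep_common(listA_ID, common, listA_Var1, listA_Var2)
listB_ID, (listB_Var1,) = keep_common(listB_ID, common, listB_Var1)

print(listA_ID, listA_Var1, listA_Var2)  # [2, 5, 6, 7] [4.7, 1.2, 0.15, 0.99] [20, 40, 50, 60]
print(listB_ID, listB_Var1)              # [2, 5, 6, 7] [0.54, 0.35, 1.19, 2.45]

# list.index() finds only the first match; enumerate() visits every position
dup_ID = [1, 2, 1, 5]  # hypothetical: unwanted ID 1 appears twice
print(dup_ID.index(1))                                    # 0
print(keep_common(dup_ID, {2, 5}, [0.1, 0.2, 0.3, 0.4]))  # ([2, 5], [[0.2, 0.4]])
```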

Your solution, @CarMoreno, works fine as well, with the difference() method but not with the (-) operator. I can't understand why, because I have seen the operator used elsewhere. Many thanks for your time! – V. Nikolaidis, Jun 18 '21 at 11:07
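One possible explanation, offered as a guess since the failing code isn't shown: the `-` operator requires both operands to be sets, whereas the `difference()` method accepts any iterable. If `intersection_AB` was stored as a list (for example with `sorted(list(...))` as in the question), the operator form raises a `TypeError` while the method form still works:

```python
listA_ID = [1, 2, 3, 5, 6, 7]
listB_ID = [2, 5, 6, 7, 8, 10]

# Hypothetical: the intersection stored as a sorted list, as in the question
intersection_AB = sorted(set(listA_ID) & set(listB_ID))

# difference() accepts any iterable, so a list argument is fine
via_method = sorted(set(listA_ID).difference(intersection_AB))
print(via_method)  # [1, 3]

# The - operator needs sets on both sides, so a list operand raises TypeError
operator_failed = False
try:
    set(listA_ID) - intersection_AB
except TypeError as exc:
    operator_failed = True
    print("TypeError:", exc)

# Wrapping the list in set() makes the operator form work again
via_operator = sorted(set(listA_ID) - set(intersection_AB))
print(via_operator)  # [1, 3]
```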

score 1 · Answer 3 · edited Jun 18 '21 at 11:35

Well, I will post a working solution myself too, using numpy.

import numpy as np

# Convert lists to arrays
np_listA_ID = np.asarray(listA_ID)
np_listB_ID = np.asarray(listB_ID)
np_listA_Var1 = np.asarray(listA_Var1)
np_listB_Var1 = np.asarray(listB_Var1)

# Comparison of the two arrays: returns the sorted common IDs plus, for each
# common ID, its index in np_listA_ID and in np_listB_ID
np_list_ID, listA_ind, listB_ind = np.intersect1d(np_listA_ID, np_listB_ID, assume_unique=False, return_indices=True)

# Keep only the items needed and convert the arrays back to lists
listA_ID = listB_ID = np_list_ID.tolist()
listA_Var1 = np_listA_Var1[listA_ind].tolist()
listB_Var1 = np_listB_Var1[listB_ind].tolist()
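A variant closer to Matlab's `ismember`-plus-logical-indexing style is sketched below (an alternative, not the poster's code). `np.isin` builds a boolean mask that is True wherever an ID also occurs in the other array, and that mask is then applied to both the ID and value arrays.

```python
import numpy as np

np_listA_ID = np.asarray([1, 2, 3, 5, 6, 7])
np_listA_Var1 = np.asarray([3.9, 4.7, 2.1, 1.2, 0.15, 0.99])
np_listB_ID = np.asarray([2, 5, 6, 7, 8, 10])
np_listB_Var1 = np.asarray([0.54, 0.35, 1.19, 2.45, 3.1, 1.75])

# Boolean masks: True where an ID also occurs in the other array
maskA = np.isin(np_listA_ID, np_listB_ID)
maskB = np.isin(np_listB_ID, np_listA_ID)

# Boolean indexing keeps the masked positions in their original order
listA_ID = np_listA_ID[maskA].tolist()
listA_Var1 = np_listA_Var1[maskA].tolist()
listB_ID = np_listB_ID[maskB].tolist()
listB_Var1 = np_listB_Var1[maskB].tolist()

print(listA_ID, listA_Var1)  # [2, 5, 6, 7] [4.7, 1.2, 0.15, 0.99]
print(listB_ID, listB_Var1)  # [2, 5, 6, 7] [0.54, 0.35, 1.19, 2.45]
```

The two approaches differ when IDs repeat. `np.intersect1d` returns sorted unique values and the index of each value's first occurrence, whereas a `np.isin` mask keeps every matching position, duplicates included, in the original order. Which one is faster likely depends on the data, so timing both (for example with `timeit`) on realistic inputs would settle it.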
