
During community detection I am trying to remove duplicate nodes from a list of lists (in order to calculate the adjusted Rand index, ARI). What I have is a few dozen lists of different lengths inside one list:

lst_of_lts = [[5192, 32896, 34357, 34976, 36683, 43315], …, [19, 92585, 94137, 98381, 99041, 100395, 101100, 109759]]

What I am running:

import itertools

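# Sort so that identical inner lists end up next to each other
lst_of_lts.sort()

# Drop inner lists that are exact copies of their neighbor
lst_of_lts_2 = list(k for k, _ in itertools.groupby(lst_of_lts))

# Drop inner lists that hold the same nodes in a different order
lst_of_lts_nodops = [list(i) for i in {tuple(sorted(i)) for i in lst_of_lts_2}]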

For some reason, it doesn't remove the duplicates: the dimensions remain the same. Any suggestions?

I have also tried the approaches from other questions, such as "Remove duplicate items from lists in Python lists" and "Remove duplicated lists in list of lists in Python".

Christoph Rackwitz
  • What do you mean by "duplicates nodes"? And can you give an example of input and the corresponding output? – xonturis Jul 23 '22 at 12:20
  • Hey, I edited it (nodes are values of a network in big-data mining). Actually, the input above remains the same for all 3 options mentioned, meaning no duplicates are removed – user532567 Jul 23 '22 at 12:47
  • Nice option: a 1-dimensional array. Better: a list of lists as input and a list of lists as output after all duplicates are removed – user532567 Jul 23 '22 at 14:39
  • The list of lists will eventually be equal length or unequal after removing duplicates? –  Jul 23 '22 at 15:14
  • unequal - due to the duplicates removal – user532567 Jul 23 '22 at 16:14

1 Answer


If you are removing duplicates only within each inner list, you can use set. Note that a set does not preserve the original order of the elements.

import numpy as np

a = np.random.randint(0, 5, (10, 10)).tolist()

a
Out[128]: 
[[0, 3, 0, 2, 4, 4, 0, 0, 3, 3],
 [2, 4, 0, 2, 4, 2, 2, 4, 3, 1],
 [3, 2, 0, 1, 2, 0, 2, 0, 2, 1],
 [3, 1, 4, 1, 0, 1, 4, 4, 3, 4],
 [2, 0, 1, 1, 0, 4, 1, 4, 2, 3],
 [0, 0, 1, 3, 4, 3, 1, 3, 0, 1],
 [1, 2, 0, 2, 1, 3, 4, 2, 2, 0],
 [3, 3, 2, 2, 0, 4, 1, 1, 0, 0],
 [0, 1, 3, 0, 4, 4, 2, 1, 1, 4],
 [0, 1, 4, 4, 0, 1, 3, 2, 1, 1]]

[list(set(i)) for i in a]
Out[129]: 
[[0, 2, 3, 4],
 [0, 1, 2, 3, 4],
 [0, 1, 2, 3],
 [0, 1, 3, 4],
 [0, 1, 2, 3, 4],
 [0, 1, 3, 4],
 [0, 1, 2, 3, 4],
 [0, 1, 2, 3, 4],
 [0, 1, 2, 3, 4],
 [0, 1, 2, 3, 4]]

Or, if you want to preserve the order of the elements, you can use dict.fromkeys (dicts keep insertion order in Python 3.7+):

[list(dict.fromkeys(i)) for i in a]
Out[133]: 
[[0, 3, 2, 4],
 [2, 4, 0, 3, 1],
 [3, 2, 0, 1],
 [3, 1, 4, 0],
 [2, 0, 1, 4, 3],
 [0, 1, 3, 4],
 [1, 2, 0, 3, 4],
 [3, 2, 0, 4, 1],
 [0, 1, 3, 4, 2],
 [0, 1, 4, 3, 2]]
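
Both snippets above only remove repeats inside each inner list. If the duplicates are instead nodes that appear in more than one inner list, and only the first occurrence of each node should be kept (this is an assumption about the intent, which the question does not spell out), a minimal sketch could look like this. The function name `drop_repeated_nodes` and the sample data are made up for illustration:

```python
def drop_repeated_nodes(communities):
    """Keep only the first occurrence of each node across all inner lists."""
    seen = set()          # nodes already emitted in an earlier position
    result = []
    for community in communities:
        kept = []
        for node in community:
            if node not in seen:
                seen.add(node)
                kept.append(node)
        result.append(kept)
    return result


# Hypothetical sample data: node 3 and node 1 appear in two lists,
# node 5 is repeated inside one list.
example = [[1, 2, 3], [3, 4, 1], [5, 5, 6]]
deduped = drop_repeated_nodes(example)
print(deduped)  # [[1, 2, 3], [4], [5, 6]]
```

With this approach the inner lists generally end up with unequal lengths, and an inner list can become empty if all of its nodes were seen earlier.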
  • Hey, I have tried a few times, and both [list(set(i)) for i in a] and [list(dict.fromkeys(i)) for i in a] aren't working; the dimensions remain the same :( – user532567 Jul 24 '22 at 14:42
  • That's interesting, may I ask if the values in the lists are integers or strings? –  Jul 24 '22 at 15:15
  • From my code, both print(all([isinstance(item, int) for item in lst_of_lists])) and print(all([isinstance(item, int) for item in lst_of_lists2])) output False... so not integers – user532567 Jul 24 '22 at 15:38
  • Try converting all of them to integers first before you proceed with the methods above. `[[int(i) for i in row] for row in a]` –  Jul 24 '22 at 15:43
  • Hey, and thanks for your help. Unfortunately, it doesn't change anything. Could it matter that list_oflists1 is read from a txt file as a community network, while list_oflists2 comes from the command list(nxcom.asyn_lpa_communities(G)), during the detection of 2 networks? – user532567 Jul 25 '22 at 17:51
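
A note on the type check quoted in the comments: it iterates over the outer list, so every item is an inner collection and isinstance(item, int) is False no matter what the nodes are. The check has to go one level deeper. In addition, asyn_lpa_communities yields sets of nodes rather than lists, and values read from a text file are typically strings. A hedged sketch for checking and normalizing both sources before comparing them follows; the helper names and the sample data are hypothetical, and it assumes every node label can be converted with int():

```python
def all_nodes_are_ints(communities):
    """Check the nodes themselves, not the inner collections."""
    return all(isinstance(node, int)
               for community in communities
               for node in community)


def normalize_communities(communities):
    """Turn each community (list, set, or tuple) into a sorted list of ints."""
    return [sorted(int(node) for node in community)
            for community in communities]


# Hypothetical stand-ins for the two sources mentioned in the comments
from_file = [["3", "1", "2"], ["5", "4"]]   # strings, as read from a txt file
from_networkx = [{1, 2, 3}, {4, 5}]         # sets, as yielded by asyn_lpa_communities

a_norm = normalize_communities(from_file)
b_norm = normalize_communities(from_networkx)

print(all_nodes_are_ints(from_file))  # False
print(all_nodes_are_ints(a_norm))     # True
print(a_norm == b_norm)               # True
```

Once both partitions are lists of int lists, the set and dict.fromkeys approaches from the answer operate on comparable values.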