type(train_x)
numpy.ndarray

train_samples = train_x.tolist()

When I print the index of each sample, some indices show up again out of order (the lines marked `# here` below). Why might this be happening?

It is messing up my pipeline downstream... but sometimes it runs fine, when the indices happen to come out in order.

for tr in train_samples:
    print(train_samples.index(tr))

...
11
12
13
14 # here
15
...
39
40
41
42
14 # here
...

Proof that the answer is right about duplicate entries:

[Screenshot of the duplicate entries; image not available]
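A minimal sketch for confirming duplicates in code rather than from a screenshot, assuming `train_x` is a 2-D array with one sample per row (the array below is made-up stand-in data, not the asker's). `np.unique` with `axis=0` treats whole rows as items, and the `tolist()`/`index` loop reproduces the repeated index from the question:

```python
import numpy as np

# Hypothetical stand-in for train_x: rows 1 and 3 are identical.
train_x = np.array([[0.1, 0.2],
                    [0.5, 0.5],
                    [0.9, 0.1],
                    [0.5, 0.5]])

# axis=0 treats each row as one item; return_counts reports how often each unique row appears.
unique_rows, counts = np.unique(train_x, axis=0, return_counts=True)
duplicate_rows = unique_rows[counts > 1]
print(duplicate_rows)  # [[0.5 0.5]]

# list.index returns the first match, so the duplicate in row 3 reports position 1.
samples = train_x.tolist()
reported = [samples.index(s) for s in samples]
print(reported)  # [0, 1, 2, 1]
```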

Kermit
  • @AndrasDeak now that I understand why it's happening, it is definitely a duplicate... but not even close to that question – Kermit Apr 10 '20 at 21:42
  • If the linked question applies to your problem, you should be using `enumerate`. `list.index` has to search your list from the start each time. – Andras Deak -- Слава Україні Apr 10 '20 at 21:45
  • @AndrasDeak thanks man. – Kermit Apr 10 '20 at 21:47
  • Can you clarify your question? _When I print the index of my samples, you can see that there are duplicates that are out of order._ ..... _sometimes it runs fine when the index decides to preserve itself._ What `index`? `index()` is the list method, right? – AMC Apr 10 '20 at 21:48
  • @AMC SO's automatic title check wouldn't let me submit the question title as "why it was duplicated" – Kermit Apr 10 '20 at 21:49
  • If you later want to shuffle your items (judging from the variable name) you can generate an index array `np.arange(train_x.size)`, and use a random shuffling index array to shuffle the data and these indices simultaneously. – Andras Deak -- Слава Україні Apr 10 '20 at 21:50
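A minimal sketch of the shuffling idea from the last comment, assuming samples lie along axis 0 of `train_x` (the data and the seed are hypothetical). It uses `len(train_x)`, the number of rows, for the index array; `train_x.size` counts every element, so it matches the row count only for a 1-D array:

```python
import numpy as np

# Hypothetical data: 5 samples with 2 features each.
train_x = np.arange(10).reshape(5, 2)

# One original position per sample (row), kept alongside the data.
original_idx = np.arange(len(train_x))

# A single random permutation applied to both arrays keeps them aligned.
rng = np.random.default_rng(seed=0)
perm = rng.permutation(len(train_x))
shuffled_x = train_x[perm]
shuffled_idx = original_idx[perm]

# Each shuffled row can be traced back to its original position.
for pos, row in zip(shuffled_idx, shuffled_x):
    print(pos, row)
```

Because positions travel with the rows, duplicate rows no longer matter: nothing has to be looked up with `index`.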

1 Answer

The `index` method searches from the front of the list, so if your data contains duplicate values, `index` will only ever find the first one.

>>> values = ['a', 'b', 'c', 'a']
>>> for v in values:
...  print("value", v, "occurs at index", values.index(v))
... 
value a occurs at index 0
value b occurs at index 1
value c occurs at index 2
value a occurs at index 0

From the docs for list.index (emphasis added):

Return the index in the list of the _first_ item whose value is x. It is an error if there is no such item.
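Following the `enumerate` suggestion in the comments, a sketch on the same toy list: `enumerate` reports each item's real position in a single pass, whereas calling `index` inside the loop rescans the list from the start every time. The optional `start` argument of `list.index` is also shown for locating a later match:

```python
values = ['a', 'b', 'c', 'a']

# enumerate yields each item's actual position, even for repeated values.
for i, v in enumerate(values):
    print("value", v, "occurs at index", i)

# All positions of a given value, not just the first.
positions_of_a = [i for i, v in enumerate(values) if v == 'a']
print(positions_of_a)  # [0, 3]

# list.index accepts a start position, so the search can skip past the first match.
print(values.index('a', 1))  # 3
```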

ApproachingDarknessFish