
Given a range of numbers from 0 to 2407, I need to find out which ones are already in use.

array [0, 1, 2, ..., 2407]

The numbers already in use are stored in a file that I load with pandas.

Example:

...| Index |...
...|   100 |...
...|  1572 |...
...|  2046 |...
...|  2045 |...

I need to remove the numbers found in the file from my initial list.

I'm trying to do this in a clean and fast way, since the files can be quite big.

  • Some simple operation *steps* showing how it is supposed to work will help you get precise answers quickly. – Daniel Hao Jul 07 '22 at 16:59
  • @VRComp No attempt yet. I just can't figure out how to do it. – Alexandre Torres Jul 07 '22 at 17:08
  • You could use two `sets` and find the difference. This won't work if you want to preserve order, and it won't preserve duplicates (also, please update the question to be clearer about duplicates), but it should be the most concise option while also running relatively fast: https://stackoverflow.com/questions/48044353/what-is-the-run-time-of-the-set-difference-function-in-python – Ryan Fu Jul 07 '22 at 17:21
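
A minimal sketch of the set-difference idea from the last comment, assuming the used numbers have already been read into a plain Python list (the name `used_values` and its contents are illustrative, taken from the example rows in the question). Sorting the result restores ascending order, which a set alone does not guarantee:

```python
# Full range of candidate numbers, 0..2407 inclusive.
all_numbers = set(range(2408))

# Hypothetical values read from the file; a duplicate is included on purpose.
used_values = [100, 1572, 2046, 2045, 100]

# Set difference drops every used number; duplicates in the input are harmless.
unused = sorted(all_numbers - set(used_values))

print(len(unused))
print(unused[:3])
```

Building the sets and taking the difference are both roughly linear in the number of elements, so this should scale well even when the file is large.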

2 Answers


Create a list of flags of size 2408, initially setting all flags to false:

is_used = [False for i in range(2408)]

Iterate through your column and change the corresponding flag to True:

for entry in column:
    is_used[entry] = True

Iterate through your list and append to a new list the elements that are not used:

new_list = []

for entry in l:
    if not is_used[entry]:
        new_list.append(entry)

Summarizing all in a single method:

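def remove_used(l, column):
    # One flag per possible number in 0..2407
    is_used = [False for i in range(2408)]

    # Mark every number that appears in the column
    for entry in column:
        is_used[entry] = True

    new_list = []

    # Keep only the numbers whose flag was never set
    for entry in l:
        if not is_used[entry]:
            new_list.append(entry)

    return new_list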
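
The same flag-array idea can also be written with a NumPy boolean mask, which moves the loops into vectorised operations. This is only a sketch under stated assumptions: the function name `remove_used_mask` and the `size` parameter are hypothetical, and it assumes every used value lies in `[0, size)`.

```python
import numpy as np

def remove_used_mask(size, used):
    """Return the numbers in [0, size) that do not appear in `used`."""
    # One boolean flag per candidate number, all False initially.
    is_used = np.zeros(size, dtype=bool)
    # Set the flag for every used number; repeated values are harmless.
    is_used[np.asarray(used, dtype=np.intp)] = True
    # Indices whose flag is still False are the unused numbers.
    return np.flatnonzero(~is_used)

unused = remove_used_mask(2408, [100, 1572, 2046, 2045])
print(len(unused))
```

A pandas column can be passed directly as `used`, for example `remove_used_mask(2408, df['index'].to_numpy())`, where `df['index']` stands for whatever column holds the used numbers.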

Also, it is worth mentioning that you can speed this up by splitting the loops into blocks and having separate threads/processes work on each block.

joaopfg

Try this:

import pandas as pd
import random

## for demo purposes the max number is changed from 2407 to 27
## (named max_n to avoid shadowing the built-in max)
max_n = 27

## list containing the full range of numbers
unused = list(range(max_n + 1))
print(f'all_n  : {unused}')

## define an example DataFrame with 5 random used numbers
df = pd.DataFrame(random.sample(range(max_n + 1), 5), columns=['index'])

#    index
# 0      6
# 1     14
# 2     20
# 3      4
# 4     25

## convert used numbers to a list
used = df['index'].tolist()
print(f'used   : {sorted(used)}')

## remove each used number from the full list
for n in used:
    unused.remove(n)
print(f'unused : {unused}')

Result:

all_n  : [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27]
used   : [4, 6, 14, 20, 25]
unused : [0, 1, 2, 3, 5, 7, 8, 9, 10, 11, 12, 13, 15, 16, 17, 18, 19, 21, 22, 23, 24, 26, 27]
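
Since `list.remove` scans the list on every call, the loop above can get slow when the file is large. A possible alternative, sketched here with the same five example values hard-coded rather than sampled at random, is to stay in pandas and filter with `Series.isin`. The variable names are illustrative:

```python
import pandas as pd

# Example DataFrame with the five used numbers shown in the demo above.
df = pd.DataFrame({'index': [6, 14, 20, 4, 25]})

# Full range 0..27 as a Series, so it can be filtered with a boolean mask.
all_numbers = pd.Series(range(28))

# Keep only the numbers that do NOT appear in the used column.
unused = all_numbers[~all_numbers.isin(df['index'])].tolist()
print(f'unused : {unused}')
```

Duplicated values in the column are handled without any extra work, and the original ascending order of the range is preserved.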
ytung-dev