Hello everyone, I am fairly new to this and I am stuck. My problem is the following: I have a dataset with 2 columns and 325729 rows. I generate 10,000 random numbers, all different. I then build a bit vector of size 325729 that holds 1 at the index of each of those 10,000 numbers and 0 everywhere else. Now I need to loop over the dataset, take each row, and check its two values. If both values of a row are among the 10,000 random numbers, I keep the row; otherwise I drop it.
The problem is that it takes forever: the last time, it ran for 3 hours and still did not finish. I don't know what else to try at this point.
Here is the code that I am running.
# Import the data:
import numpy as np
import pandas as pd
df22 = pd.read_table('web-NotreDame.txt', header=None)
# Create the bitvector and the random variables
data123 = np.random.randint(0,10000,size=10000)
data123.sort()
print(len(data123))
uniques = np.unique(data123)
print(len(uniques))
data1234 = [0] * 325729
for val in uniques:
    data1234[val] = 1
# Dropping the rows
for ind in df22.index:
    if data1234[df22[0][ind]] != 1 and data1234[df22[1][ind]] != 1:
        df22.drop(ind)
If someone could give me a hand and tell me how to make the program at least finish, I would really appreciate it. And yes, I checked other solutions on Stack Overflow, but they did not work for me, so asking here is my last resort. Thank you in advance for your help.
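For reference, here is a minimal sketch of the same idea written without a Python-level loop. It is only an illustration under some assumptions: every value in both columns is an integer in the range 0 to 325728 (so it can be used directly as an index into the bit vector), the columns are labelled 0 and 1 as they are when reading with header=None, and the helper names sample_mask and keep_rows_in_mask are made up for this example. It draws the distinct random numbers with replace=False, stores the bit vector as a boolean NumPy array, and keeps a row only when both of its values are marked.

```python
import numpy as np
import pandas as pd


def sample_mask(n_ids, n_keep, seed=None):
    """Return a boolean array of length n_ids with exactly n_keep True entries.

    Hypothetical helper: replace=False makes the sampled ids all different.
    """
    rng = np.random.default_rng(seed)
    chosen = rng.choice(n_ids, size=n_keep, replace=False)
    mask = np.zeros(n_ids, dtype=bool)
    mask[chosen] = True
    return mask


def keep_rows_in_mask(df, mask):
    """Keep the rows of df whose values in columns 0 and 1 are both marked in mask.

    Assumes both columns hold integers in range(len(mask)).
    """
    # Indexing the mask with a whole column looks up every row at once.
    in_first = mask[df[0].to_numpy()]
    in_second = mask[df[1].to_numpy()]
    return df[in_first & in_second]


# Toy data standing in for the real file (assumption: same two-column layout).
toy = pd.DataFrame({0: [1, 1, 0, 3], 1: [3, 2, 4, 1]})
toy_mask = np.zeros(5, dtype=bool)
toy_mask[[1, 3]] = True  # pretend ids 1 and 3 were the sampled ones
filtered = keep_rows_in_mask(toy, toy_mask)
print(filtered)
```

On the real data this would presumably be called as keep_rows_in_mask(df22, sample_mask(325729, 10000)). Because the filtering is a single boolean selection rather than one drop call per row, the work is done inside NumPy and pandas instead of in the Python loop.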