I have binarized my dataset: every numeric value that was 0 stays 0, and every value above 0 becomes 1. This creates duplicate rows. I am using the NSL-KDD data set. There were more than 25000 instances, and after binarizing and removing duplicates only 1729 unique instances are left. How can I binarize the data without creating duplicates? I am also feeding this to a genetic algorithm, where it causes duplicate offspring as well.
1 Answer
I am not familiar with the dataset, but suppose you have a dataframe df with several columns:
df
columnA columnB columnC ....
....
This question gives an overview of removing duplicates:
# drop duplicates (the complete row is the same), keeping the first occurrence:
df.drop_duplicates(keep='first', inplace=True)
# drop duplicates only when the value in columnA is the same:
df.drop_duplicates(subset=['columnA'], keep='first', inplace=True)
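To make this concrete, here is a minimal sketch on a toy dataframe. The values and column names are made up for illustration and are not taken from NSL-KDD. It binarizes the way the question describes (0 stays 0, anything above 0 becomes 1), counts how many rows become duplicates, and keeps one copy of each distinct row. The per-pattern counts are an optional extra, in case you want to remember how often each binary row occurred rather than discard that information.

```python
import pandas as pd

# Toy stand-in for the real data; names and values are hypothetical.
df = pd.DataFrame({
    "columnA": [0, 3, 7, 0, 12],
    "columnB": [5, 0, 0, 2, 0],
    "columnC": [0, 1, 9, 0, 4],
})

# Binarize: 0 stays 0, every value above 0 becomes 1.
binary = (df > 0).astype(int)

# Number of rows that repeat an earlier row after binarization.
n_duplicates = int(binary.duplicated(keep="first").sum())

# Keep one copy of each distinct binary row.
unique_rows = binary.drop_duplicates(keep="first").reset_index(drop=True)

# Optional: how many original rows collapsed onto each binary pattern.
counts = binary.groupby(list(binary.columns)).size()

print(n_duplicates)
print(unique_rows)
print(counts)
```

Different numeric rows can map to the same binary pattern, so some duplication after binarizing is expected. drop_duplicates only removes the repeats; it does not prevent them from arising.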

PV8