How to label encode 2 pandas dataframes?

Question

I have 2 pandas dataframes. I need them to have the same label encoding because I want to use them for machine learning.

dftrain.label.unique()

array(['normal.', 'buffer_overflow.', 'loadmodule.', 'perl.', 'neptune.',
       'smurf.', 'guess_passwd.', 'pod.', 'teardrop.', 'portsweep.',
       'ipsweep.', 'land.', 'ftp_write.', 'back.', 'imap.', 'satan.',
       'phf.', 'nmap.', 'multihop.', 'warezmaster.', 'warezclient.',
       'spy.', 'rootkit.'], dtype=object)

dftest.label.unique()

array(['normal.', 'snmpgetattack.', 'named.', 'xlock.', 'smurf.',
       'ipsweep.', 'multihop.', 'xsnoop.', 'sendmail.', 'guess_passwd.',
       'saint.', 'buffer_overflow.', 'portsweep.', 'pod.', 'apache2.',
       'phf.', 'udpstorm.', 'warezmaster.', 'perl.', 'satan.', 'xterm.',
       'mscan.', 'processtable.', 'ps.', 'nmap.', 'rootkit.', 'neptune.',
       'loadmodule.', 'imap.', 'back.', 'httptunnel.', 'worm.',
       'mailbomb.', 'ftp_write.', 'teardrop.', 'land.', 'sqlattack.',
       'snmpguess.'], dtype=object)

As you can see, there are labels in the test set that are not present in the train set.

How can I encode these labels so that, for example, the value normal is equal to 1 in both dataframes?
What should I do with the labels from the test set that are not present in the train set? If I have to remove them, how do I do it?

Combine both label lists, fit a LabelEncoder using sklearn and apply the fitted encoder to each list separately: https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.LabelEncoder.html — Mohamed Ali JAMAOUI, Oct 04 '19 at 08:55
Possible duplicate of [Sklearn Label Encoding multiple columns pandas dataframe](https://stackoverflow.com/questions/44474570/sklearn-label-encoding-multiple-columns-pandas-dataframe) — Karan Sethi, Oct 04 '19 at 08:56
@KaranSethi that's for multiple columns. I want to encode 2 dataframes, not columns. — j doe, Oct 04 '19 at 09:00
@MohamedAliJAMAOUI good idea, can you give me the code for that? — j doe, Oct 04 '19 at 09:01
@jdoe you have to try first, and people can help you when you are blocked. Note that in pandas you can concatenate two dataframes into a single one using pd.concat (check the examples in the documentation: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.concat.html), then fit your LabelEncoder on the result, then apply it to the original two separate dataframes. — Mohamed Ali JAMAOUI, Oct 04 '19 at 09:23
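
The approach suggested in the comments could be sketched roughly as follows. This is a minimal sketch, not a tested answer for the asker's data: the two small frames are hypothetical stand-ins for dftrain and dftest, and the column names label_id and the variable dftest_seen are made up for illustration. It shows both options raised in the question: one encoder fitted on the labels of both frames, and filtering out test rows whose label never occurs in train.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Hypothetical stand-ins for dftrain / dftest (not the asker's real data).
dftrain = pd.DataFrame({"label": ["normal.", "smurf.", "neptune.", "normal."]})
dftest = pd.DataFrame({"label": ["normal.", "saint.", "smurf.", "mscan."]})

# Option 1: fit ONE encoder on the labels of both frames, so the same
# string always maps to the same integer in train and test.
encoder = LabelEncoder()
encoder.fit(pd.concat([dftrain["label"], dftest["label"]], ignore_index=True))

# "label_id" is an illustrative column name, not part of the original frames.
dftrain["label_id"] = encoder.transform(dftrain["label"])
dftest["label_id"] = encoder.transform(dftest["label"])

# Option 2: keep only the test rows whose label also occurs in train.
seen_in_train = dftest["label"].isin(dftrain["label"].unique())
dftest_seen = dftest[seen_in_train].copy()

print(dict(zip(encoder.classes_, range(len(encoder.classes_)))))
print(dftest_seen)
```

Note that LabelEncoder assigns integers by sorted label order, so normal. will not necessarily be 1; what matters is that it gets the same integer in both frames. Also, a classifier trained only on dftrain cannot predict classes it never saw, so whether to keep or drop the unseen test labels depends on how you want to evaluate the model.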

