According to a blog post by Yoshoku, the author of the Rumale machine learning library, you can turn the Sex column into a binary feature like this:

```ruby
# 1 where Sex is 'female', 0 otherwise
train_df['IsFemale'] = train_df['Sex'].map { |v| v == 'female' ? 1 : 0 }
```
Rumale's LabelEncoder is also useful for categorical variables. It maps each distinct value to an integer code:

```ruby
require 'rumale'

encoder = Rumale::Preprocessing::LabelEncoder.new
labels = Numo::Int32[1, 8, 8, 15, 0]
encoded_labels = encoder.fit_transform(labels)
# > pp encoded_labels
# Numo::Int32#shape=[5]
# [1, 2, 2, 3, 0]
```
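The output above is consistent with each value being replaced by its position in the sorted list of distinct values (0, 1, 8, 15 become 0, 1, 2, 3). A minimal pure-Ruby sketch of that idea, with a hypothetical `label_encode` helper that is not part of Rumale and is not its actual implementation:

```ruby
# Hypothetical helper (not Rumale's API): label encoding with plain Arrays.
# Each distinct value gets the index it has in the sorted list of distinct values.
def label_encode(values)
  classes = values.uniq.sort               # distinct values in sorted order
  index_of = classes.each_with_index.to_h  # value => integer code
  [values.map { |v| index_of[v] }, classes]
end

encoded, classes = label_encode([1, 8, 8, 15, 0])
p encoded  # => [1, 2, 2, 3, 0]
p classes  # => [0, 1, 8, 15]
```

Keeping `classes` around is what makes decoding possible later: `classes[code]` returns the original value.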
For one-hot encoding, use Rumale::Preprocessing::OneHotEncoder:

```ruby
require 'rumale'

encoder = Rumale::Preprocessing::OneHotEncoder.new
labels = Numo::Int32[0, 0, 2, 3, 2, 1]
one_hot_vectors = encoder.fit_transform(labels)
# > pp one_hot_vectors
# Numo::DFloat#shape=[6,4]
# [[1, 0, 0, 0],
#  [1, 0, 0, 0],
#  [0, 0, 1, 0],
#  [0, 0, 0, 1],
#  [0, 0, 1, 0],
#  [0, 1, 0, 0]]
```
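To make the layout of that matrix concrete, here is a pure-Ruby sketch with a hypothetical `one_hot` helper (not Rumale's API or implementation). It assumes the labels are small non-negative integers, so label k goes to column k, which reproduces the 6×4 result above:

```ruby
# Hypothetical helper (not Rumale's API): one-hot encoding with plain Arrays.
# Assumes labels are non-negative integers; label k sets column k to 1.
def one_hot(labels)
  n_columns = labels.max + 1
  labels.map do |k|
    row = Array.new(n_columns, 0)  # all zeros ...
    row[k] = 1                     # ... except the column for this label
    row
  end
end

vectors = one_hot([0, 0, 2, 3, 2, 1])
pp vectors
```

Sizing the rows by `labels.max + 1` keeps label k in column k; a label value that never occurs in the data simply leaves an all-zero column.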
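However, converting between Daru::Vector and Numo::NArray requires to_a, so pass the column to the encoder as an Array and convert the result back:

```ruby
encoder = Rumale::Preprocessing::LabelEncoder.new
# Daru::Vector -> Array for the encoder, then Numo::NArray -> Array for the DataFrame
train_df['Embarked'] = encoder.fit_transform(train_df['Embarked'].to_a).to_a
```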
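Because the column travels through the encoder as a plain Array of strings, it can help to see what string categories turn into. The pure-Ruby sketch below uses made-up sample values standing in for an Embarked-like column and does not call Rumale or Daru. In this sketch the codes follow sorted (here alphabetical) order, and the sorted list of classes is enough to decode them back:

```ruby
# Pure-Ruby sketch (no Rumale/Daru): string categories to integer codes and back.
# The sample values are hypothetical, for illustration only.
embarked = ['S', 'C', 'S', 'Q', 'C']

classes = embarked.uniq.sort                     # ["C", "Q", "S"]
codes   = embarked.map { |v| classes.index(v) }  # [2, 0, 2, 1, 0]
decoded = codes.map { |c| classes[c] }           # original strings again

p codes
p decoded == embarked  # => true
```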