I have this dataframe:
import pandas as pd

df = pd.DataFrame([['137', 'earn'], ['158', 'earn'], ['144', 'ship'], ['111', 'trade'], ['132', 'trade']], columns=['value', 'topic'])
print(df)
  value  topic
0   137   earn
1   158   earn
2   144   ship
3   111  trade
4   132  trade
And I want an additional numeric column like this:
  value  topic  topic_id
0   137   earn         0
1   158   earn         0
2   144   ship         1
3   111  trade         2
4   132  trade         2
In other words, I want a column that encodes each distinct string in the topic column as an integer ID. I implemented this solution:
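import numpy as np

# Map each distinct topic (np.unique returns them sorted) to an integer id
topics_dict = {}
topics = np.unique(df['topic']).tolist()
for i in range(len(topics)):
    topics_dict[topics[i]] = i
# Look up the id for every row
df['topic_id'] = [topics_dict[l] for l in df['topic']]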
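A more compact sketch of the same approach, assuming a reasonably recent pandas: the loop collapses into a dict comprehension and Series.map replaces the list comprehension. Topics are still numbered in sorted order, so the ids match the loop version.

```python
import pandas as pd

df = pd.DataFrame(
    [['137', 'earn'], ['158', 'earn'], ['144', 'ship'], ['111', 'trade'], ['132', 'trade']],
    columns=['value', 'topic'],
)

# Build the sorted topic -> id mapping in a single dict comprehension
topics_dict = {topic: i for i, topic in enumerate(sorted(df['topic'].unique()))}

# Series.map looks up each row's topic in the dict
df['topic_id'] = df['topic'].map(topics_dict)
print(df)
```

This still builds the mapping by hand, which is handy if you need to reuse or store topics_dict later.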
However, I am quite sure there is a more elegant, pandas-idiomatic way to solve this, but I couldn't find anything on Google or SO. I read about pandas' get_dummies, but it creates a separate indicator column for each distinct value in the original column.
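For comparison, pandas does have single-column encoders. The sketch below, assuming pandas 1.x or later, shows three built-ins that each give 0, 0, 1, 2, 2 for this frame: pd.factorize with sort=True, the codes of a categorical dtype, and GroupBy.ngroup. All three number topics in sorted order, like np.unique in the loop; factorize without sort=True would number them in order of first appearance instead.

```python
import pandas as pd

df = pd.DataFrame(
    [['137', 'earn'], ['158', 'earn'], ['144', 'ship'], ['111', 'trade'], ['132', 'trade']],
    columns=['value', 'topic'],
)

# Option 1: factorize returns integer codes plus the distinct values;
# sort=True orders the distinct values alphabetically
codes, uniques = pd.factorize(df['topic'], sort=True)
df['topic_id'] = codes

# Option 2: convert to a categorical dtype (categories sorted by default);
# .cat.codes holds each row's position in the category list
df['topic_id_cat'] = df['topic'].astype('category').cat.codes

# Option 3: number each group; groupby sorts the keys by default
df['topic_id_grp'] = df.groupby('topic').ngroup()

print(df)
print(list(uniques))  # lookup table from code back to topic
```

One practical difference: .cat.codes uses the smallest integer dtype that fits (int8 here), while factorize and ngroup return wider integers.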
I'd be thankful for any help or pointers in the right direction!