What is the most elegant and efficient way to encode strings as numeric values in pandas?

Question

One solution would be to use pandas.DataFrame.apply, but is there a more efficient way? The examples below use the following mapping: AA = 0.0, AB = 0.5, BB = 1.0.

Input Table

Index	Col1	Col2
Sample1	AB	BB
Sample2	AA	AB

Output Table

Index	Col1	Col2
Sample1	0.5	1.0
Sample2	0.0	0.5

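import pandas as pd

# Values are listed per column, so each list runs Sample1, Sample2
# and matches the Input Table and Output Table above.
table_input = pd.DataFrame({'Col1': ["AB", "AA"],
                            'Col2': ["BB", "AB"]},
                           index=['Sample1', 'Sample2'])
table_output = pd.DataFrame({'Col1': [0.5, 0.0],
                             'Col2': [1.0, 0.5]},
                            index=['Sample1', 'Sample2'])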
# Please insert solution here...

I guess [`map`](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.map.html) can be useful here. — 9769953, Jul 25 '23 at 08:01
It looks like you are using a key to map to numeric values. Perhaps map. Is there also a reason why apply is not sufficient? — Jason Chia, Jul 25 '23 at 08:02
do you define manually all `AA`/`AB`/`BB` (in which case use `table_output = table_input.replace({'AA': 0, 'AB': 0.5, 'BB': 1})`) or do you define `A` and `B` then consider `AB` the mean of `A` and `B`? — mozway, Jul 25 '23 at 08:06
@JasonChia yes. I work with rather huge data. Apply would probably do the job just fine. But I'd like to use a faster way if possible. — Pm740, Jul 25 '23 at 08:18
@mozway the first option. There is no A or B. It is always AA, BB or AB. — Pm740, Jul 25 '23 at 08:20
@Pm740 OK (disappointed), then use `replace` (or `map` per column)… — mozway, Jul 25 '23 at 08:24
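
For reference, here is a minimal sketch of the "`map` per column" idea from the comments above. It assumes the fixed mapping stated in the question (AA = 0.0, AB = 0.5, BB = 1.0) and the data from the Input Table; the name `encoding` is only illustrative. Whether this is actually faster than `apply` or `replace` on large data has not been measured here, so it is worth timing on the real table.

```python
import pandas as pd

# Mapping taken from the question; `encoding` is an illustrative name.
encoding = {"AA": 0.0, "AB": 0.5, "BB": 1.0}

# Same data as the Input Table in the question.
table_input = pd.DataFrame(
    {"Col1": ["AB", "AA"], "Col2": ["BB", "AB"]},
    index=["Sample1", "Sample2"],
)

# Series.map looks every value up in the dict, one column at a time.
# Values that are not keys of the dict come back as NaN.
table_output = pd.DataFrame(
    {col: table_input[col].map(encoding) for col in table_input.columns}
)

print(table_output)
```

Because unmapped strings turn into NaN instead of raising an error, a check such as `table_output.isna().any().any()` afterwards can catch unexpected values in the input.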
