One solution would be to use `pandas.DataFrame.apply`, but is there a more efficient way? The following mapping is applied in the examples: AA = 0.0, AB = 0.5, BB = 1.0.

Input Table

Index Col1 Col2
Sample1 AB BB
Sample2 AA AB

Output Table

Index Col1 Col2
Sample1 0.5 1.0
Sample2 0.0 0.5
import pandas as pd
# Values are listed per column, so each list runs Sample1, Sample2 (matching the tables above)
table_input = pd.DataFrame({'Col1': ["AB", "AA"],
                            'Col2': ["BB", "AB"]},
                           index=['Sample1', 'Sample2'])
table_output = pd.DataFrame({'Col1': [0.5, 0.0],
                             'Col2': [1.0, 0.5]},
                            index=['Sample1', 'Sample2'])
# Please insert solution here...

Pm740
  • I guess [`map`](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.map.html) can be useful here. – 9769953 Jul 25 '23 at 08:01
  • It looks like you are using a key to map to numeric values. Perhaps map. Is there also a reason why apply is not sufficient? – Jason Chia Jul 25 '23 at 08:02
  • do you define manually all `AA`/`AB`/`BB` (in which case use `table_output = table_input.replace({'AA': 0, 'AB': 0.5, 'BB': 1})`) or do you define `A` and `B` then consider `AB` the mean of `A` and `B`? – mozway Jul 25 '23 at 08:06
  • @JasonChia yes. I work with rather huge data. Apply would probably do the job just fine. But I'd like to use a faster way if possible. – Pm740 Jul 25 '23 at 08:18
  • @mozway the first option. There is no A or B. It is always AA, BB or AB. – Pm740 Jul 25 '23 at 08:20
  • @Pm740 OK (disappointed), then use `replace` (or `map` per column)… – mozway Jul 25 '23 at 08:24
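
The `map`-per-column suggestion from the comments could look roughly like the sketch below. It assumes the three genotype strings are the only values present; the name `mapping` is made up for illustration, and the input frame is rebuilt from the tables in the question.

```python
import pandas as pd

# Lookup table taken from the question: AA = 0.0, AB = 0.5, BB = 1.0
mapping = {"AA": 0.0, "AB": 0.5, "BB": 1.0}

# Input rebuilt from the question's "Input Table" (one list per column)
table_input = pd.DataFrame(
    {"Col1": ["AB", "AA"], "Col2": ["BB", "AB"]},
    index=["Sample1", "Sample2"],
)

# DataFrame.apply calls the lambda once per column (not once per cell);
# Series.map then does the dictionary lookup for every value in that column.
table_output = table_input.apply(lambda col: col.map(mapping))

print(table_output)
```

Note that `Series.map` turns any value missing from the dictionary into `NaN`, whereas the `replace` one-liner above leaves unknown values untouched. Which of the two is faster on a large frame is worth timing on the real data rather than assuming.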
