I'm looking for a way to replace several strings within a single cell of a pandas DataFrame in Python.

Each column has its own set of elements to replace, based on a legend that is already defined.

I've already found a way to replace values within a column, but the result replaces only one string at a time and removes the others. For example, with the cell value AA, BB, CC and the legend AA - Level 1, BB - Level 2, CC - Level 3, DD - Level 4, the result is just Level 1.

Data set:
Field Name | Category 1 | Category 2
Test1      | AA BB CC   | LD DD
Test2      | BB CC      | DD
Test3      | AA         | LD
Test4      | AA BB DD   | LD DD

Legend:
AA - Level 1, BB - Level 2, CC - Level 3, DD - Level 4
LD - High, DD - Low

I expect the results to be combined in one cell. For example, a cell value of AA, BB should become Level 1; Level 2.

Erfan

1 Answer

Use:

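# Note: 'DD' appears twice, so the later value ('Low') replaces 'Level 4' in the dict.
d = {'AA':'Level 1','BB':'Level 2','CC':'Level 3','DD':'Level 4','LD': 'High', 'DD' :'Low'}

# Alternation of whole-word patterns, one per legend key.
regex = '|'.join(r"\b{}\b".format(x) for x in d.keys())
df = df.apply(lambda col: col.str.replace(regex, lambda m: d[m.group()], regex=True))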

print(df)

  Field Name               Category 1 Category 2
0      Test1  Level 1 Level 2 Level 3   High Low
1      Test2          Level 2 Level 3        Low
2      Test3                  Level 1       High
3      Test4      Level 1 Level 2 Low   High Low

If you need to apply the solution to only one column:

df['Category 1'] = df['Category 1'].str.replace(regex, lambda x: d[x.group()], regex=True)
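
Because a Python dict keeps only the last value for a repeated key, 'DD' maps to 'Low' in every column with the dictionary d above. If 'DD' should mean Level 4 in Category 1 and Low in Category 2, and the results should be joined with "; " as asked in the question, a per-column legend might work. This is only a minimal sketch. It assumes that the first legend line belongs to Category 1 and the second to Category 2, and that every cell is a string. The names legends and translate_cell are hypothetical.

```python
import re

import pandas as pd

# Sample data from the question.
df = pd.DataFrame({
    "Field Name": ["Test1", "Test2", "Test3", "Test4"],
    "Category 1": ["AA BB CC", "BB CC", "AA", "AA BB DD"],
    "Category 2": ["LD DD", "DD", "LD", "LD DD"],
})

# One legend per column, so 'DD' can mean different things in each
# (assumed assignment of legend lines to columns).
legends = {
    "Category 1": {"AA": "Level 1", "BB": "Level 2", "CC": "Level 3", "DD": "Level 4"},
    "Category 2": {"LD": "High", "DD": "Low"},
}


def translate_cell(cell, legend, sep="; "):
    """Split a cell on commas/whitespace, map each code, and rejoin with sep."""
    codes = [code for code in re.split(r"[,\s]+", cell.strip()) if code]
    # Codes missing from the legend are kept unchanged.
    return sep.join(legend.get(code, code) for code in codes)


for column, legend in legends.items():
    df[column] = df[column].map(lambda cell: translate_cell(cell, legend))

print(df)
```

Codes that are not in a column's legend are left as they are. Missing values (NaN) are not handled here and would need to be filled or skipped before calling translate_cell.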
jezrael