How to assign different values from a string to new column?

Question

dialog_act is my label column.

I need to assign int values to the labels, e.g. inform_pricerange=1, inform_area=2, request_food=3, inform_food=4, ... The goal is to look like this:

Data (6 rows):

| transcript_id | who | transcript | dialog_act |
|---|---|---|---|
| 0 | USR | I need to find an expensive restauant that's in the south section of the city. | inform_pricerange; inform_area; |
| 1 | SYS | There are several restaurants in the south part of town that serve expensive food. Do you have a cuisine preference? | inform_pricerange; inform_area; request_food; |
| 2 | USR | No I don't care about the type of cuisine. | inform_food; |
| 3 | SYS | Chiquito Restaurant Bar is a Mexican restaurant located in the south part of town. | inform_name; inform_area; inform_food; |
| 4 | USR | What is their address? | request_address; |
| 5 | SYS | There address is 2G Cambridge Leisure Park Cherry Hinton Road Cherry Hinton, it there anything else I can help you with? | inform_address; |

How can I do that?

Thanks in advance.

Can you post the code to reproduce the dataframe, [as opposed to an image of text](https://meta.stackoverflow.com/questions/285551/why-not-upload-images-of-code-errors-when-asking-a-question)? — BrokenBenchmark, Jan 02 '22 at 20:31

Corralien · Accepted Answer · 2022-01-02T21:19:41.803

Update

> I want to define the values, not in ascending order

vals_to_replace = {'inform_pricerange': 1, 'inform_area': 2, 'request_food': 3,
                   'inform_food': 4, 'inform_name': 5, 'request_address': 6,
                   'inform_address': 7}

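# Split each cell into single labels (one per row), map every label to its
# value, then join the values back together per original row (index level 0).
df['dialog_act'] = df['dialog_act'].str.strip(';').str.split('; ').explode() \
                      .map(vals_to_replace).astype(str) \
                      .groupby(level=0).apply(', '.join)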
print(df)

# Output
  dialog_act
0       1, 2
1    1, 2, 3
2          4
3    5, 2, 4
4          6
5          7
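
For a self-contained check, here is a minimal sketch that rebuilds the `dialog_act` column from the question by hand (the `transcript` column is omitted, and the column name `dialog_act_id` is my own choice, not something from the question). It writes the mapped values to a new column instead of overwriting the original, and it raises if a label is missing from the dictionary, since `map` would otherwise yield NaN for it.

```python
import pandas as pd

# Hand-built stand-in for the question's data (assumption: transcript omitted).
df = pd.DataFrame({
    "who": ["USR", "SYS", "USR", "SYS", "USR", "SYS"],
    "dialog_act": [
        "inform_pricerange; inform_area;",
        "inform_pricerange; inform_area; request_food;",
        "inform_food;",
        "inform_name; inform_area; inform_food;",
        "request_address;",
        "inform_address;",
    ],
})

vals_to_replace = {"inform_pricerange": 1, "inform_area": 2, "request_food": 3,
                   "inform_food": 4, "inform_name": 5, "request_address": 6,
                   "inform_address": 7}

# One label per row; the original row index is repeated for each label.
labels = df["dialog_act"].str.strip(";").str.split("; ").explode()
codes = labels.map(vals_to_replace)

# Labels absent from the dict map to NaN; fail early rather than emit 'nan'.
missing = labels[codes.isna()].unique()
if len(missing) > 0:
    raise KeyError(f"unmapped labels: {list(missing)}")

# Join the values of each original row back into one string.
# "dialog_act_id" is a hypothetical name for the new column.
df["dialog_act_id"] = codes.astype(int).astype(str).groupby(level=0).apply(", ".join)
print(df[["dialog_act", "dialog_act_id"]])
```

Keeping the original column makes it easy to compare labels and values side by side.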

Old answer

Try exploding your column so that each row holds a single label, then use pd.factorize:

import pandas as pd

# Step 1: explode (one label per row, original index preserved)
df1 = df['dialog_act'].str.strip(';').str.split('; ').explode().to_frame()

# Step 2: factorize (codes are 0-based in order of first appearance, hence + 1)
df['dialog_act'] = df1.assign(dialog_act=pd.factorize(df1['dialog_act'])[0] + 1) \
                      .astype(str).groupby(level=0)['dialog_act'].apply(', '.join)

Output:

>>> df
  dialog_act
0       1, 2
1    1, 2, 3
2          4
3    5, 2, 4
4          6
5          7

>>> df1
          dialog_act
0  inform_pricerange
0        inform_area
1  inform_pricerange
1        inform_area
1       request_food
2        inform_food
3        inform_name
3        inform_area
3        inform_food
4    request_address
5     inform_address
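
To see why this numbering cannot be customized, it may help to look at what `pd.factorize` returns on its own. The short series below is made up for illustration. Codes start at 0 and follow the order of first appearance, which is why the snippet above adds 1.

```python
import pandas as pd

# Made-up sample of exploded labels (assumption, not the full data set).
labels = pd.Series(["inform_pricerange", "inform_area",
                    "inform_pricerange", "request_food"])

# factorize returns one integer code per element plus the unique values.
codes, uniques = pd.factorize(labels)
print(list(codes))    # one code per element, 0-based
print(list(uniques))  # unique labels in order of first appearance

# The implied label-to-number mapping once 1 is added to every code.
numbered = dict(zip(uniques, range(1, len(uniques) + 1)))
print(numbered)
```

Since the numbers depend only on the order in which labels first occur, an explicit dictionary is needed whenever specific values are required.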

Not like that, because I want to define the values myself, not in ascending order. — dlmartins, Jan 02 '22 at 20:58

score 0 · Answer 2 · edited Jan 02 '22 at 20:51

I think you can use the pandas DataFrame.replace() method. First, convert your table to a pandas DataFrame. Then:

vals_to_replace = {'inform_pricerange':1, 'inform_area':2, 'request_food':3, 'inform_food': 4}
your_df = your_df.replace({'your_label':vals_to_replace})

I saw a similar question about replacing multiple values in one pandas column.

answered Jan 02 '22 at 20:46 by cbugrakaya, edited Jan 02 '22 at 20:51 by Dharman

Thanks, but this didn't work, because I have more than one label per row. – dlmartins Jan 02 '22 at 20:58
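
A small sketch of the limitation described in this comment, using a made-up two-row frame (trailing semicolons are dropped here for simplicity): with a nested dict, `DataFrame.replace` matches whole cell values, so a cell holding several labels is left unchanged while a single-label cell is replaced.

```python
import pandas as pd

# Made-up frame: one multi-label cell and one single-label cell.
df = pd.DataFrame({"dialog_act": ["inform_pricerange; inform_area",
                                  "inform_food"]})

vals_to_replace = {"inform_pricerange": 1, "inform_area": 2, "inform_food": 4}

# The nested dict limits the replacement to the 'dialog_act' column.
out = df.replace({"dialog_act": vals_to_replace})
print(out)

# Row 0 still holds the original string because no dict key equals the
# whole cell; row 1 equals a key exactly and is replaced.
```

This is why the labels have to be split into separate rows before any per-label mapping is applied.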
