1

dialog_act is my label column.


I need to assign an integer to each label, e.g. inform_pricerange=1, inform_area=2, request_food=3, inform_food=4, and so on. The goal is for the column to look like this:

1,2    
1,2,3    
4    
5,2,4    
6

CSV (first 6 rows):

"transcript_id  who transcript  dialog_act    
0   USR  I need to find an expensive restauant that's in the south section of the city.     inform_pricerange; inform_area;    
1   SYS  There are several restaurants in the south part of town that serve expensive food. Do you have a cuisine preference?   inform_pricerange; inform_area; request_food;    
2   USR  No I don't care about the type of cuisine.     inform_food;    
3   SYS  Chiquito Restaurant Bar is a Mexican restaurant located in the south part of town.     inform_name; inform_area; inform_food;    
4   USR  What is their address?     request_address;    
5   SYS  There address is 2G Cambridge Leisure Park Cherry Hinton Road Cherry Hinton, it there anything else I can help you with?   inform_address;"

How can I do that?

Thanks in advance.

Corralien
dlmartins

2 Answers

1

Update

If you want to define the values yourself rather than have them assigned in ascending order, map each label through a dictionary:

vals_to_replace = {'inform_pricerange': 1, 'inform_area': 2, 'request_food': 3,
                   'inform_food': 4, 'inform_name': 5, 'request_address': 6,
                   'inform_address': 7}

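# Split each cell into single labels, map every label to its integer,
# then join the integers back together per original row
df['dialog_act'] = df['dialog_act'].str.strip(';').str.split('; ').explode() \
                      .map(vals_to_replace).astype(str) \
                      .groupby(level=0).apply(', '.join)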
print(df)

# Output
  dialog_act
0       1, 2
1    1, 2, 3
2          4
3    5, 2, 4
4          6
5          7
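
For reference, here is a self-contained sketch of the same explode / map / join idea that can be run as-is. The sample frame is rebuilt by hand from the question's dialog_act column, and the result is written to a new column, dialog_act_ids, which is a name made up for this illustration. It splits on ';' alone and strips whitespace afterwards, on the assumption that the real file may contain stray spaces around the labels.

```python
import pandas as pd

# Sample rebuilt by hand from the question (dialog_act column only).
df = pd.DataFrame({
    "dialog_act": [
        "inform_pricerange; inform_area;",
        "inform_pricerange; inform_area; request_food;",
        "inform_food;",
        "inform_name; inform_area; inform_food;",
        "request_address;",
        "inform_address;   ",  # trailing spaces, as in the pasted CSV
    ]
})

vals_to_replace = {
    "inform_pricerange": 1, "inform_area": 2, "request_food": 3,
    "inform_food": 4, "inform_name": 5, "request_address": 6,
    "inform_address": 7,
}

# One label per row; the original row number is kept in the index.
acts = df["dialog_act"].str.split(";").explode().str.strip()
acts = acts[acts != ""]  # drop the empty piece left after the last ';'

# Look up each label, then join the ids back together per original row.
ids = acts.map(vals_to_replace)
df["dialog_act_ids"] = ids.astype(str).groupby(level=0).agg(", ".join)

print(df["dialog_act_ids"].tolist())
```

A label that is missing from vals_to_replace becomes NaN after map, so it may be worth checking ids.isna().any() before joining.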

Old answer

Explode the column so that each row holds a single label, then use pd.factorize to number the labels in order of first appearance:

# Step 1: explode
df1 = df['dialog_act'].str.strip(';').str.split('; ').explode().to_frame()

# Step 2: factorize
df['dialog_act'] = df1.assign(dialog_act=pd.factorize(df1['dialog_act'])[0] + 1) \
                      .astype(str).groupby(level=0)['dialog_act'].apply(', '.join)

Output:

>>> df
  dialog_act
0       1, 2
1    1, 2, 3
2          4
3    5, 2, 4
4          6
5          7

>>> df1
          dialog_act
0  inform_pricerange
0        inform_area
1  inform_pricerange
1        inform_area
1       request_food
2        inform_food
3        inform_name
3        inform_area
3        inform_food
4    request_address
5     inform_address
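
In case it helps to see which integer each label received, pd.factorize also returns the unique labels alongside the codes. A minimal sketch on a hand-written list of labels (the names labels and label_to_id are only for illustration):

```python
import pandas as pd

# A few exploded labels, in the order they appear in the question's data.
labels = pd.Series([
    "inform_pricerange", "inform_area",
    "inform_pricerange", "inform_area", "request_food",
    "inform_food",
])

# codes are 0-based positions into uniques, in order of first appearance.
codes, uniques = pd.factorize(labels)

# Shift by one to match the 1-based numbering used in the answer.
label_to_id = {label: code + 1 for code, label in enumerate(uniques)}

print(codes.tolist())
print(label_to_id)
```

The resulting dictionary has the same shape as vals_to_replace, so it can be saved and reused to encode new rows consistently.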
Corralien
0

I think you can use the pandas DataFrame.replace() method. First, convert your table to a pandas DataFrame, then:

vals_to_replace = {'inform_pricerange':1, 'inform_area':2, 'request_food':3, 'inform_food': 4}
your_df = your_df.replace({'your_label':vals_to_replace})

There is a similar question about replacing multiple values in one pandas column.
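
One caveat worth checking on your own data: with a plain dictionary, replace() compares whole cell values, so a cell that holds several labels separated by ';' is left unchanged. A small sketch, assuming a toy frame with the placeholder column name your_label from above:

```python
import pandas as pd

vals_to_replace = {"inform_pricerange": 1, "inform_area": 2}

# dtype=object so a column can hold both strings and integers after replacing.
your_df = pd.DataFrame(
    {"your_label": ["inform_pricerange", "inform_pricerange; inform_area;"]},
    dtype=object,
)

replaced = your_df.replace({"your_label": vals_to_replace})

# The single-label cell is replaced; the multi-label cell is not.
print(replaced["your_label"].tolist())
```

For multi-label cells like the ones in the question, the labels need to be split apart first, for example with str.split and explode.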

Dharman