I am working on a pandas DataFrame and reformatting the data so that it is easier to understand during analysis. The values in one column are strings of the form something | something, for example Accident | repairable-damage.

I want to create two new columns in the DataFrame: split each string on the | separator and assign the first part to one column and the second part to the other. This is what the data looks like:

Incident_Category            | 
------------------------------
Accident | repairable-damage
Accident | repairable-damage
Accident | hull-loss

This is what the expected output is:

Incident_Category            | Incident_Type | Incident_Damage |
----------------------------------------------------------------
Accident | repairable-damage | Accident      | repairable-damage
Accident | repairable-damage | Accident      | repairable-damage
Accident | hull-loss         | Accident      | hull-loss

This is the code that I currently have:

print(dropped_dataset['Incident_Category'].unique())
dropped_dataset['Incident_type_array'] = dropped_dataset['Incident_Category'].str.split("|")
dropped_dataset['Incident_type'] = dropped_dataset['Incident_type_array'][0][0]
dropped_dataset['Incident_damage'] = dropped_dataset['Incident_type_array'][[1]]
dropped_dataset.head(7)

It currently grabs the first record and assigns the first row's details to every row of the new columns.

I want each row's Incident_Category to be split and assigned individually.

omar jandali

1 Answer

We can use pandas.Series.str.split:

dropped_dataset[['Incident_Type', 'Incident_Damage']] = dropped_dataset.Incident_Category.str.split(" | ", expand=True, regex=False)
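
As a minimal, self-contained sketch of the line above: the sample rows are copied from the question, regex=False assumes pandas 1.4 or later, and n=1 is an addition that is not in the original line, so that only the first separator is split on.

```python
import pandas as pd

# Sample data mirroring the rows shown in the question (an assumption,
# the real dataset will have more rows and columns).
dropped_dataset = pd.DataFrame({
    "Incident_Category": [
        "Accident | repairable-damage",
        "Accident | repairable-damage",
        "Accident | hull-loss",
    ]
})

# Split every row on the literal " | " separator.
# expand=True returns a DataFrame with one column per piece, so the two
# pieces can be assigned to two new columns in a single statement.
parts = dropped_dataset["Incident_Category"].str.split(
    " | ", n=1, expand=True, regex=False
)
dropped_dataset[["Incident_Type", "Incident_Damage"]] = parts

print(dropped_dataset)
```

The regex=False argument matters here: for a pattern longer than one character, recent pandas versions treat the pattern as a regular expression by default, and " | " read as a regular expression matches a single space rather than the three-character separator.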
Mohammad Ayoub