I have a pandas DataFrame with a column whose cells contain a lot of information, separated by custom delimiters. I want to split this information into separate columns. A sample cell looks like this:
price<=>price<br>price<=>3100<br>price[currency]<=>PLN<br>rent<=>price<br>rent<=>600<br>rent[currency]<=>PLN<br>deposit<=>price<br>deposit<=><br>deposit[currency]<=><br>m<=>100<br>rooms_num<=>3<br>building_type<=>tenement<br>floor_no<=>floor_2<br>building_floors_num<=>4<br>building_material<=>brick<br>windows_type<=>plastic<br>heating<=>gas<br>build_year<=>1915<br>construction_status<=>ready_to_use<br>free_from<=><br>rent_to_students<=><br>equipment_types<=><br>security_types<=><br>media_types<=>cable-television<->internet<->phone<br>extras_types<=>balcony<->basement<->separate_kitchen
Note that at the end of this example there are also '<->' separators, which separate several features within one column. I am OK with keeping them inside one column for now.
So my DataFrame looks roughly like this:
   A  B
0  1  price<=>price<br>price<=>3100<br>(...)
1  2  price<=>price<br>price<=>54000<br>(...)
2  3  price<=>price<br>price<=>135600<br>(...)
So the pattern I can see is:
column names are between '<br>' and '<=>'
values are between '<=>' and '<br>'
Is there a clean way to do this in Python? Ideally, I would like a solution that splits each cell and puts all the values into columns. I could then set the column names manually.
The desired output would be like this:
   A  price   price[currency]  rent  (...)
0  1  3100    PLN              600   (...)
1  2  54000   CZK              1000  (...)
2  3  135600  EUR              8000  (...)
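To make the goal concrete, here is a minimal sketch of one way the pattern above might be turned into columns. It assumes every pair is separated by '<br>' and every key is separated from its value by '<=>', and that when a key is repeated (as in price<=>price followed by price<=>3100) the last value is the one to keep. The helper name parse_cell and the shortened sample cells are made up for illustration; they are not taken from the real data.

```python
import pandas as pd


def parse_cell(cell: str) -> dict:
    """Split 'key<=>value<br>key<=>value...' into a dict.

    Assumption: if a key appears more than once, the last value wins.
    """
    record = {}
    for pair in cell.split("<br>"):
        key, sep, value = pair.partition("<=>")
        if sep:  # ignore fragments that do not contain '<=>'
            record[key] = value
    return record


# Shortened, illustrative cells in the same format as the sample above.
df = pd.DataFrame({
    "A": [1, 2],
    "B": [
        "price<=>price<br>price<=>3100<br>price[currency]<=>PLN<br>rent<=>price<br>rent<=>600",
        "price<=>price<br>price<=>54000<br>price[currency]<=>CZK<br>rent<=>price<br>rent<=>1000",
    ],
})

# One dict per row, expanded into one column per key.
parsed = pd.DataFrame(df["B"].map(parse_cell).tolist(), index=df.index)

# Keep column A and attach the parsed columns next to it.
result = df[["A"]].join(parsed)
print(result)
```

In this sketch all values stay as strings, empty values (such as deposit<=>) become empty strings, and the '<->' separated lists are left untouched inside their column. The column names come straight from the keys, so they would not need to be typed in manually.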