How I can split a sentence containing values into columns with its value in python?

Question

I need to convert this string into rooms and their specific value with the help of python code. I tried with

df1[['Master Bedroom', 'bedroom','bedroom','bedroom','study room']] = df['Area(sq.ft)'].str.split(',',1, expand=True)

but unfortunately its saying

ValueError: Columns must be same length as key

Below is the input:

0 | Area(sq.ft)
-----------------------------------------------------------------------
1 | Master Bedroom, 181, Bedroom, 153.08, Bedroom, 143, Bedroom, 132, 
  | Study Room, 72.75
-----------------------------------------------------------------------
2 | Master Bedroom, 144.83, Bedroom, 126.88, Bedroom, 142.62

Below is the output:

| Master Bedroom |  Bedroom |  Bedroom |  Bedroom | Study Room
----------------------------------------------------------------
| 181            |  153.08  |  143     |  132     | 72.75
| 144.83         |  126.88  |  142.62  |  0       | 0

Please [edit] your question to include your sample input and expected output as text in the body of your question, not as an image or external link, to make a [mcve]. See [How to make good pandas examples](https://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples) — G. Anderson, Nov 17 '21 at 18:06
You may need to use `apply()` to convert every text into `pd.Series` and then you can use `join()` and it will add `NaN` in missing places - and you can fill `NaN` with `0` - `.fillna(0)` — furas, Nov 18 '21 at 00:24

furas · Answer 1 · 2021-11-18T00:57:47.080

Problem is that split(',', 1,...) with parameter 1 gives only 2 columns.

But other problem is that split(',') gives list with 10 elements

["Master Bedroom", "181", "Bedroom", "153.08", "Bedroom", "143", "Bedroom", "132", "Study Room", "72.75"]

but you need only 5 elements

["181", "153.08", "143", "132", "72.75"]

You could create

df1[['name1', 'Master Bedroom', 'name2', 'Bedroom', 'name3', 'Bedroom', 'name4', 'Bedroom', 'name5', 'Study Room']] = ...

and later drop columns nameX but it may put values in wrong places if you have less Bedrooms.

You could use .apply(function) to run more complex function which splits text and creates Series with result[name] = value. And this will create only 5 columns and it will automatically add NaN in empty places. But later you can put 0 using .fillna(0)

But there is other problem - you have 3 columns with the same name Bedroom and it makes problem to create Series with 3 values (like in dictionary you can have many keys with the same name).

I would add numbers to names to create columns Bedroom 1, Bedroom 2, Bedroom 3,

import pandas as pd

df = pd.DataFrame({
'Area(sq.ft)': [
'Master Bedroom, 181, Bedroom, 153.08, Bedroom, 143, Bedroom, 132, Study Room, 72.75', 
'Master Bedroom, 144.83, Bedroom, 126.88, Bedroom, 142.62',
]
})

#print(df)

def convert(text):
    
    result = pd.Series(dtype=object)

    items = text.split(', ')
    
    all_names  = items[0::2]
    all_values = items[1::2]
    
    number = 1
    
    for name, value in zip(all_names, all_values):
        if name == 'Bedroom':
            name = f'{name} {number}'
            number += 1
    
        result[name] = value
    
    return result

new_df = df['Area(sq.ft)'].apply(convert)
new_df = new_df.fillna(0)

print(new_df)

Result:

  Master Bedroom Bedroom 1 Bedroom 2 Bedroom 3 Study Room
0            181    153.08       143       132      72.75
1         144.83    126.88    142.62         0          0

How I can split a sentence containing values into columns with its value in python?

1 Answers1