How to split column values of a pandas DataFrame into rows separated by “,”

Question

I am trying to split the comma-separated values in the columns of a pandas DataFrame into separate rows.

The original data: (image of the original pandas DataFrame)

The desired output: (image of the desired output)

I have tried several ways.

First I tried the approach from the question "Explode/stack a Series of strings":

newdf['Month'] = newdf['Month'].apply(list)

With the code above I get [j,a,n,,f,e,b], and then I used:

pd.DataFrame({'Month':np.concatenate(newdf['Month'].values), 'cust.no':newdf['cust.no'].repeat(newdf['cust no.'].apply(len))})

In the output, each letter ends up in its own row. As a result, the number of rows no longer matches "cust no." and I get an error.

I know there are several functions available, but I couldn't find one that breaks the values down efficiently.
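The mismatch described above comes from `apply(list)`: calling `list()` on a string iterates over its characters, so "Jan,Feb" becomes seven one-character items and the row counts stop lining up. A minimal sketch of the same concatenate-and-repeat idea, splitting on the comma instead, is below. The two-row frame is hypothetical; only the column names are taken from the question's code.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; only the column names come from the question
newdf = pd.DataFrame({'cust no.': [101, 102], 'Month': ['Jan,Feb', 'Mar']})

# list() on a string yields single characters, which caused the mismatch
chars = list('Jan,Feb')  # ['J', 'a', 'n', ',', 'F', 'e', 'b']

# str.split(',') yields whole tokens instead: ['Jan', 'Feb'] and ['Mar']
months = newdf['Month'].str.split(',')
counts = months.str.len().to_numpy()  # number of tokens per original row

out = pd.DataFrame({
    # Flatten the per-row lists into one array of months
    'Month': np.concatenate(months.tolist()),
    # Repeat each customer number once per month in its row
    'cust no.': newdf['cust no.'].to_numpy().repeat(counts),
})
print(out)
```

Because both columns are built from the same token counts, their lengths always agree.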

You posted this question earlier today. It was, and still is, a duplicate. Either way, in the future, please post dataframes as text, not images – user3483203, Aug 20 '18 at 21:28
The below link has solved my problem. Very very useful. https://stackoverflow.com/questions/50082449/splitting-multiple-columns-on-a-delimiter-into-rows-in-pandas-dataframe — Deya, Aug 22 '18 at 03:11

Joska · Answer 1 · 2018-08-20T22:30:29.050

You can always just use a regex (regular expression) to identify all text before the comma.

Assuming your original dataframe is called data, meaning your months column is data['Months'], you can use the regular expression r'(.+?),' to capture everything before the first comma.

data['Months'] = data['Months'].str.extract(r'(.+?),', expand=True)

You can always test regex at https://pythex.org/. Try entering your months column in the test string box, and (.+?), as the regular expression.
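Note that `str.extract` with `(.+?),` keeps only the text before the first comma: the later months are dropped rather than moved to new rows, and a value with no comma at all yields NaN. If a regex route is preferred, `str.extractall` returns every match with a `match` level in the index, which gives one row per value. The sketch below uses a hypothetical `data` frame and swaps in the pattern `[^,]+` (one run of non-comma characters); neither comes from the answer above.

```python
import pandas as pd

# Hypothetical frame; the answer assumes the column is data['Months']
data = pd.DataFrame({'Months': ['Jan,Feb', 'Jun,Aug', 'Mar']})

# The answer's approach: only the text before the first comma survives.
# expand=False returns a Series, which assigns cleanly to one column.
first = data['Months'].str.extract(r'(.+?),', expand=False)

# extractall finds every comma-free run; the result has a MultiIndex of
# (original row label, match number), i.e. one row per value.
every = (
    data['Months']
    .str.extractall(r'([^,]+)')[0]  # column 0 holds the captured group
    .droplevel('match')             # keep only the original row label
)
print(first.tolist())
print(every)
```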

score 0 · Answer 2 · answered Aug 21 '18 at 02:14


`Setup`

import pandas as pd

df = pd.DataFrame({'id': [1,2,3,4], 'month': ['Jan,Fev', 'Feb,July', 'Jun,Aug', 'July,Mar']})

    id  month
0   1   Jan,Fev
1   2   Feb,July
2   3   Jun,Aug
3   4   July,Mar

`str.split`+`pd.DataFrame()`+`stack`

df = df.set_index('id')
pd.DataFrame(df.month.str.split(',').to_dict()).T.stack().reset_index(level=0, name='month')

    level_0 month
0   1       Jan
1   1       Fev
0   2       Feb
1   2       July
0   3       Jun
1   3       Aug
0   4       July
1   4       Mar


rafaelc
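On pandas 0.25 or later, `DataFrame.explode` performs the same reshaping in one step and keeps the `id` column name instead of `level_0`. A minimal sketch, assuming the setup data above before `df` is re-indexed by `id`:

```python
import pandas as pd

df = pd.DataFrame({'id': [1, 2, 3, 4],
                   'month': ['Jan,Fev', 'Feb,July', 'Jun,Aug', 'July,Mar']})

out = (
    df.assign(month=df['month'].str.split(','))  # each cell becomes a list
      .explode('month')                          # one row per list element
      .reset_index(drop=True)                    # 0..n-1 instead of repeated labels
)
print(out)
```

A design difference worth noting: building a frame from `to_dict()` as in the answer expects every row to split into the same number of values (pandas rejects lists of unequal length there), whereas `explode` handles rows of different lengths directly.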


Thank you. I want to do this on all the columns at once; otherwise I get an error because the row counts don't match. So I am using the following code: pd.DataFrame(new.apply(lambda x: x.to_dict("series").str.split(",").T.stack().reset_index(),axis=1,raw = False)). However, I am getting this error: ("unsupported type: ", . Could you share your thoughts? – Deya Aug 21 '18 at 19:03
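For splitting several comma-separated columns together, pandas 1.3 and later accept a list of columns in `DataFrame.explode`, which explodes them in lockstep and raises a `ValueError` if a row's lists have different lengths. A minimal sketch, where the column names `cust_no` and `month` and the data are hypothetical stand-ins for the frame in the comment:

```python
import pandas as pd

# Hypothetical frame: customer numbers and months pair up position by position
new = pd.DataFrame({'cust_no': ['101,102', '103'],
                    'month': ['Jan,Feb', 'Mar']})

cols = ['cust_no', 'month']
out = (
    new.assign(**{c: new[c].str.split(',') for c in cols})  # split each column into lists
       .explode(cols)                                       # explode together (pandas >= 1.3)
       .reset_index(drop=True)
)
print(out)
```

Splitting every column with `str.split` before exploding avoids calling `str` methods on the dictionaries produced by `to_dict`.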
