I have a column in a pandas dataframe that contains a list of strings. The strings are separated by commas.

The list in a row looks something like this:

list = ['banana bread is yummy', "i hate to have some more bread, can't we eat apples?", 'apples are not good for you, they make you hungry']

I've been trying to split the list in each row of my column based on regex to get the following output:

banana bread is yummy
i hate to have some more bread, can't we eat apples?
apples are not good for you, they make you hungry

but when I use

s = df.assign(conversation=df['conversation'].str.split(',')).explode('conversation')

The whole list gets split on every comma, regardless of whether the comma sits inside a string or between two strings, giving me this output:

banana bread is yummy
i hate to have some more bread
can't we eat apples?
apples are not good for you 
they make you hungry
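
To make the problem concrete, here is a minimal sketch of what a plain comma split does to a single cell. It assumes the cell holds the list as one string (the question does not say this outright, but `.str.split` and `.str.strip` only act on strings), and the sample value is shortened and apostrophe-free for illustration.

```python
# Assumed cell value: the list stored as a single string (an assumption,
# not something stated in the question).
cell = "['banana bread is yummy', 'i hate to have some more bread, can we eat apples?']"

# A plain comma split treats every comma as a separator,
# including the one inside the second quoted string.
parts = [p.strip() for p in cell.split(",")]

for p in parts:
    print(p)
```

The second sentence ends up in two pieces because the split has no notion of which commas are inside quotes.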

Any advice on how to use regex for this at all? I tried a couple of things but just get very random results.

EDIT:

Another method I tried was this:

df['conversation'] = df['conversation'].str.strip('[]')

I first removed the square brackets from each row and then split everything. Whilst this method works, it leaves me with random empty rows.

msa

1 Answer

I was just able to answer my own question based on this response here :-)

s = df.assign(conversation=df['conversation'].str.split(r",(?=(?:[^']*'[^']*')*[^']*$)")).explode('conversation')
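
For anyone wondering how the pattern works: the lookahead only accepts a comma when the rest of the string contains an even number of single quotes, which means the comma sits outside any quoted string. Below is a small sketch using the standard `re` module on one cell value. The cell contents, the variable names, and the final `strip` cleanup are my own illustrative assumptions, not part of the dataframe code above.

```python
import re

# Same pattern as in the answer, written as a raw string.
QUOTE_AWARE_COMMA = r",(?=(?:[^']*'[^']*')*[^']*$)"

# Assumed cell value: the list stored as one string, with no apostrophes
# inside the sentences.
cell = ("['banana bread is yummy', "
        "'i hate to have some more bread, can we eat apples?', "
        "'apples are not good for you, they make you hungry']")

# Split only on commas that are followed by an even number of single quotes.
pieces = re.split(QUOTE_AWARE_COMMA, cell)

# Illustrative cleanup: drop surrounding spaces, brackets and quotes.
cleaned = [p.strip(" []'") for p in pieces]

for sentence in cleaned:
    print(sentence)
```

One caveat: because the lookahead counts single quotes, an apostrophe inside a sentence (as in "can't") shifts the count, so the pattern is only reliable when the sentences themselves contain no single quotes.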