Convert a list of strings to a pandas dataframe

Asked Oct 18 '19 at 10:58

Active Oct 18 '19 at 11:16

Viewed 33 times

I have a list of strings reflecting a conversation. I am trying to find a way to convert the conversation into a dataframe with columns for an index, the utterance (text) itself, and a speaker label column.

myconvo = ['Speaker1: this is one utterance', 
            'Speaker2: this is another utterance', 
            'Speaker1: this is a third utterance']

I assume that I will need to transform the list of strings into a list of lists, where each sub-list will comprise the speaker ID and the utterance.

So far I have used the regular expression below, but it returns an extra empty string at the start of each result.

import re
for i in myconvo:
    a = re.split(r'(Speaker\d)', i, flags=re.MULTILINE)
    print(a)

['', 'Speaker1', ': this is one utterance']
['', 'Speaker2', ': this is another utterance']
['', 'Speaker1', ': this is a third utterance']

Worst case, I could just drop that first element, but I suspect there are things in my approach that could be improved.
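For context on the empty string: `re.split` returns the text on either side of each match, and because `Speaker\d` matches at the very start of every line, the text before the match is `''`. The rough sketch below reproduces that for one line, then shows one possible alternative using `re.match` with two capture groups. The second pattern is an assumption about the input format (a speaker label, a colon, a space, then the utterance), not something confirmed beyond the three sample lines.

```python
import re

line = 'Speaker1: this is one utterance'

# The pattern matches at position 0, so the piece before the match is ''.
# The capture group makes re.split keep the matched label in the result.
parts = re.split(r'(Speaker\d)', line)

# Alternative: match the whole line with two groups, so there is no
# leading empty element. This pattern is an assumed input format.
match = re.match(r'(Speaker\d+): (.*)', line)
speaker, utterance = match.groups()
```

Here `parts` has three elements (the empty string, the label, and the remainder including the colon), while `speaker` and `utterance` hold just the two pieces of interest.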

cookie1986

Why not `s.split(": ", 1)`? – Wiktor Stribiżew Oct 18 '19 at 11:02
As I'm pretty new to regex, can you provide some context as to what the code is doing? – cookie1986 Oct 18 '19 at 11:05
It is necessary @DC_Liv – ansev Oct 18 '19 at 11:06
`s.split(": ", 1)` splits the string into 2 parts at the first occurrence of `:` + space – Wiktor Stribiżew Oct 18 '19 at 11:07
Thanks Wiktor. That's great. – cookie1986 Oct 18 '19 at 11:10
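
The `split(": ", 1)` suggestion from the comments can be sketched end to end as follows. This is a minimal sketch, assuming every string contains `": "` after the speaker label; the column names `speaker` and `utterance` are illustrative choices, not taken from the question. The default integer index of the dataframe serves as the index column.

```python
import pandas as pd

myconvo = ['Speaker1: this is one utterance',
           'Speaker2: this is another utterance',
           'Speaker1: this is a third utterance']

# Split each string once, at the first ": ", into [speaker, utterance].
# maxsplit=1 leaves any later ": " inside the utterance untouched.
rows = [s.split(": ", 1) for s in myconvo]

# Column names are assumptions chosen for illustration.
df = pd.DataFrame(rows, columns=["speaker", "utterance"])
print(df)
```

A string without `": "` would produce a one-element row, so real data may need a check before building the dataframe.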
