Problem Context
I am trying to create a chat log dataset from WhatsApp chats. Here is the context of the problem I am trying to solve. Let a message be M and a response be R. Chats do not always alternate strictly; for example, they tend to go like this:
[ M, M, M, R, R, M, M, R, R, M ... and so on]
I am trying to concatenate consecutive runs of M's and R's. For the above example, I want an output like this:
Desired Output
[ "M M M", "R R", "M M", "R R", "M ...", and so on ]
An Example of Realistic Data:
Input --> ["M: Hi", "M: How are you?", "R: Heyy", "R: Im cool", "R: Wbu?"] (length = 5)
Output --> ["M: Hi M: How are you?", "R: Heyy R: Im cool R: Wbu?"] (length = 2)
Is there a faster, more efficient way of doing this? I have already read this Stack Overflow link, but I didn't find a solution there.
So far, this is what I have tried.
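final = []
continuous_string = ''
for i, ele in enumerate(chats):
    if i > 0:
        prev = chats[i-1][0]
        current = ele[0]
        if current != prev:
            # Sender changed: store the finished run and start a new one.
            final.append(continuous_string)
            continuous_string = ''
    # Add the current element to the run, separated by a space.
    continuous_string += (' ' if continuous_string else '') + ele
# Append the last run, which has no transition after it.
if continuous_string:
    final.append(continuous_string)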
Explanation of my code: I have a chats list in which every message starts with 'M' and every response starts with 'R'. I keep track of the prev and current values in the list, and when there is a change (a transition from M -> R or R -> M), I append everything collected in continuous_string to the final list.
Again, my question is: Is there a shortcut or a built-in function in Python that does the same thing in fewer lines?
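For reference, one candidate I am aware of is itertools.groupby from the standard library, which groups consecutive items that share a key. Below is a minimal sketch, assuming every entry in chats begins with its sender character ('M' or 'R') as described above. The function name merge_runs and the sep parameter are made up for illustration.

```python
from itertools import groupby


def merge_runs(chats, sep=" "):
    """Join consecutive entries that share the same first character.

    Assumption: the first character of each string identifies the sender.
    s[:1] is used instead of s[0] so an empty string does not raise.
    """
    return [sep.join(group) for _, group in groupby(chats, key=lambda s: s[:1])]


chats = ["M: Hi", "M: How are you?", "R: Heyy", "R: Im cool", "R: Wbu?"]
print(merge_runs(chats))
# ['M: Hi M: How are you?', 'R: Heyy R: Im cool R: Wbu?']
```

groupby only merges adjacent items with equal keys, so the order of the chat is preserved and a later run of M's stays separate from an earlier one. It makes a single pass over the list.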