
I have been working on a Python data visualization project for a WhatsApp chat. I have a string like this:

line = '[14/11/18, 2:47:26 PM] Chaitanya: Yeah, Lets go to the movies to night'

And I want to break it down to exactly like this.

['[14/11/18, 2:47:26 PM]', 'Chaitanya: Yeah, Lets go to the movies to night']

I have tried using the split() function, but I can't get exactly this result. Also, the first (timestamp) field varies, so its length is not always the same.

I would appreciate some help. Thanks.

5 Answers

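Try this:

r = line.split(']', 1)  # split only at the first ']'
r[0] += ']'             # split() drops the delimiter, so add it back
# note: r[1] still begins with a space; use r[1].lstrip() to remove it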
Wariored
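A closely related alternative, not part of the original answer, is str.partition(), which also splits only at the first occurrence of a separator but keeps the separator as a separate item. A minimal sketch, assuming the timestamp is always followed by "] ":

```python
line = '[14/11/18, 2:47:26 PM] Chaitanya: Yeah, Lets go to the movies to night'

# partition() splits at the first "] " only and returns a 3-tuple:
# (text before the separator, the separator itself, text after it)
head, sep, tail = line.partition('] ')
result = [head + ']', tail]  # re-attach the ']' that partition() separated out
print(result)
```

If "] " does not occur in the line, sep is an empty string and tail is empty, so check sep before trusting the result.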
line = '[14/11/18, 2:47:26 PM] Chaitanya: Yeah, Lets go to the movies to night'
reslist = line.split(']', 1)
reslist[0] += ']'  # needed because split() removes the delimiter
reslist[1] = reslist[1].lstrip()  # drop the space after the timestamp
print(reslist) # ['[14/11/18, 2:47:26 PM]', 'Chaitanya: Yeah, Lets go to the movies to night']
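One caveat with this approach: if a line contains no ']' at all (for example, a line that does not start with a timestamp), split() returns a one-element list and reslist[1] raises IndexError. A hedged sketch of a guarded helper; the name split_message and the choice to return such lines unsplit are assumptions for illustration:

```python
def split_message(line):
    """Split '[timestamp] text' into [timestamp, text] (hypothetical helper)."""
    parts = line.split(']', 1)
    if len(parts) < 2:
        # No ']' in the line: split() returned [line], so parts[1] would
        # raise IndexError. Assumption: keep such lines whole.
        return [line]
    return [parts[0] + ']', parts[1].lstrip()]


line = '[14/11/18, 2:47:26 PM] Chaitanya: Yeah, Lets go to the movies to night'
print(split_message(line))
print(split_message('a line without a timestamp'))
```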

Find the first occurrence of ] and use that for slicing:

[line[:line.find(']')+1],line[line.find(']')+2:]]

BTW: storing the find() result in a helper variable avoids calling find() twice and should be faster, which may matter when you process a whole chat for data visualization:

f=line.find(']')
[line[:f+1],line[f+2:]]

Results from timeit:

>>> import timeit
>>> timeit.timeit("line = '[14/11/18, 11:47:26 PM] Chaitanya: Yeah, Lets go to the movies [to] night'; [line[:line.find(']')+1],line[line.find(']')+2:]]")
0.33965302700016764
>>> timeit.timeit("line = '[14/11/18, 11:47:26 PM] Chaitanya: Yeah, Lets go to the movies [to] night'; f=line.find(']'); [line[:f+1],line[f+2:]]")
0.21619235799971648
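The timings above also include assigning the string on every run. A sketch of the same comparison as a script, moving the assignment into timeit's setup so only the splitting is measured and taking the best of several repeats to reduce noise (no results are claimed here; they depend on the machine):

```python
import timeit

# The assignment goes into `setup`, so only the splitting itself is timed
setup = ("line = '[14/11/18, 11:47:26 PM] Chaitanya: "
         "Yeah, Lets go to the movies [to] night'")
statements = {
    "two find() calls": "[line[:line.find(']')+1], line[line.find(']')+2:]]",
    "one find() call": "f = line.find(']'); [line[:f+1], line[f+2:]]",
}

timings = {}
for name, stmt in statements.items():
    # best of 5 repeats, 100_000 executions each
    timings[name] = min(timeit.repeat(stmt, setup=setup, repeat=5, number=100_000))
    print(f"{name}: {timings[name]:.4f} s")
```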
Ocaso Protal
import re
re.split(r'(?<=\])\s', line, maxsplit=1)
['[14/11/18, 2:47:26 PM]', 'Chaitanya: Yeah, Lets go to the movies to night']

It splits at a whitespace character only when that character is immediately preceded by ], and it splits at most once.

  • \s matches any whitespace character
  • (?<=\]) is a lookbehind assertion: it requires a ] (escaped as \]) directly before the match without consuming it
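Limiting the split to one matters when the message itself contains a ] followed by whitespace, as in the "[to] night" line used in the timeit example above. A small sketch, compiling the pattern once so it can be reused across many lines (the name TIMESTAMP_SPLIT is illustrative):

```python
import re

# Whitespace that directly follows a ']'; compiled once for reuse
TIMESTAMP_SPLIT = re.compile(r'(?<=\])\s')

line = '[14/11/18, 11:47:26 PM] Chaitanya: Yeah, Lets go to the movies [to] night'

# Without a limit, the space after "[to]" inside the message is split too
print(TIMESTAMP_SPLIT.split(line))
# ['[14/11/18, 11:47:26 PM]', 'Chaitanya: Yeah, Lets go to the movies [to]', 'night']

# With maxsplit=1 only the first match, right after the timestamp, is used
print(TIMESTAMP_SPLIT.split(line, maxsplit=1))
# ['[14/11/18, 11:47:26 PM]', 'Chaitanya: Yeah, Lets go to the movies [to] night']
```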
luigigi
This assumes the timestamp format is constant, so the bracketed field is at most 23 characters long (e.g. '[14/11/18, 11:47:26 PM]') and the search for ] can be limited to that range.

line = '[14/11/18, 2:47:26 PM] Chaitanya: Yeah, Lets go to the movies to night'
loc = line.find(']', 0, 23)   # index of ']' => 21 for this line

our_result = [
    line[0:loc+1], line[loc+2:]
]
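The end bound matters: find()'s end index is exclusive, and with a two-digit hour the ']' sits at index 22, so an end bound of 22 would miss it and find() would return -1. A sketch that uses a bound of 23 and also guards against -1; the function name split_fixed, its max_len parameter, and returning the line unsplit are assumptions for illustration:

```python
def split_fixed(line, max_len=23):
    # Assumed longest timestamp field: '[dd/mm/yy, hh:mm:ss PM]' is 23
    # characters, so ']' is at index 22 at most (the end bound is exclusive)
    loc = line.find(']', 0, max_len)
    if loc == -1:
        return [line]  # no timestamp at the start: leave the line unsplit
    return [line[:loc + 1], line[loc + 2:]]


short = '[14/11/18, 2:47:26 PM] Chaitanya: Yeah, Lets go to the movies to night'
long_ = '[14/11/18, 11:47:26 PM] Chaitanya: Yeah, Lets go to the movies to night'
print(split_fixed(short))
print(split_fixed(long_))
print(short.find(']', 0, 22), long_.find(']', 0, 22))  # 21 -1
```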