Problem with string manipulation in Python

Question

I have been working on a Python data visualization project for WhatsApp chats. I have a string like this:

line = '[14/11/18, 2:47:26 PM] Chaitanya: Yeah, Lets go to the movies to night'

I want to split it into exactly this:

['[14/11/18, 2:47:26 PM]', 'Chaitanya: Yeah, Lets go to the movies to night']

I have tried using the split() function, but I can't get exactly this result. Also, the time field at the start varies, so its length might not be the same every time.

I would appreciate some help. Thanks.

maybe `[line[:line.index(']')+1], line[line.index(']')+2:]]` — FObersteiner, May 14 '20 at 13:12

Wariored · Answer 1 · 2020-05-14T13:25:00.713

1

try this:

r = line.split(']', 1)
r[0] += ']'

edited May 14 '20 at 13:25

answered May 14 '20 at 13:08

Wariored


1

Also, if the message contains any ']', this might fail. – ghostshelltaken May 14 '20 at 13:23
@ghostshelltaken it will split on the first occurrence only – Wariored May 14 '20 at 13:25
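
The point made in the last comment can be checked directly. The following is a minimal sketch, not part of the original answer: the sample message with brackets in its body is made up for illustration, and `str.partition` is shown as an alternative that also cuts at the first occurrence and keeps the separator.

```python
# Hypothetical sample: the message body itself contains brackets.
line = '[14/11/18, 2:47:26 PM] Chaitanya: see [this] link'

# split with maxsplit=1 cuts only at the first ']'.
r = line.split(']', 1)
r[0] += ']'            # split drops the delimiter, so put it back
r[1] = r[1].lstrip()   # remove the space that followed the ']'
print(r)       # ['[14/11/18, 2:47:26 PM]', 'Chaitanya: see [this] link']

# str.partition also cuts at the first occurrence and returns the separator.
head, sep, tail = line.partition(']')
result = [head + sep, tail.lstrip()]
print(result)  # ['[14/11/18, 2:47:26 PM]', 'Chaitanya: see [this] link']
```

Both variants leave the later `[this]` untouched, because only the first `]` is used as the cut point.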

J. Question · Answer 2 · 2020-05-14T13:28:50.433

1

line = '[14/11/18, 2:47:26 PM] Chaitanya: Yeah, Lets go to the movies to night'
reslist = line.split(']', 1)
reslist[0] += "]"  # needed because split removes the delimiter
reslist[1] = reslist[1].lstrip()
print(reslist) # ['[14/11/18, 2:47:26 PM]', 'Chaitanya: Yeah, Lets go to the movies to night']

edited May 14 '20 at 13:28

answered May 14 '20 at 13:12

J. Question


1

This fails if the message contains any ']'. – ghostshelltaken May 14 '20 at 13:26

Ocaso Protal · Accepted Answer · 2020-05-20T13:37:57.353

Find the first occurrence of ] and use that for slicing:

[line[:line.find(']')+1],line[line.find(']')+2:]]

BTW: Storing the find result in a helper variable should be faster, which may matter when you process many lines for DataViz:

f=line.find(']')
[line[:f+1],line[f+2:]]

Results from timeit:

>>> import timeit
>>> timeit.timeit("line = '[14/11/18, 11:47:26 PM] Chaitanya: Yeah, Lets go to the movies [to] night'; [line[:line.find(']')+1],line[line.find(']')+2:]]")
0.33965302700016764
>>> timeit.timeit("line = '[14/11/18, 11:47:26 PM] Chaitanya: Yeah, Lets go to the movies [to] night'; f=line.find(']'); [line[:f+1],line[f+2:]]")
0.21619235799971648

Yes, This is perfect. Thanks – ghostshelltaken May 14 '20 at 13:24
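
For use on a whole chat export, the slicing above could be wrapped in a small function. This is only a sketch: the name `split_timestamp` is hypothetical, and the handling of lines without a `]` is an assumption about what an export might contain (for example, continuation lines of a multi-line message), not something stated in the question.

```python
def split_timestamp(line):
    """Split '[timestamp] rest' into ['[timestamp]', 'rest'].

    Hypothetical helper. Lines without a ']' are returned unchanged
    as a one-element list.
    """
    f = line.find(']')
    if f == -1:          # str.find returns -1 when there is no ']'
        return [line]
    # f + 2 skips the ']' and the single space that follows it
    return [line[:f + 1], line[f + 2:]]


print(split_timestamp('[14/11/18, 11:47:26 PM] Chaitanya: Yeah, Lets go to the movies [to] night'))
# ['[14/11/18, 11:47:26 PM]', 'Chaitanya: Yeah, Lets go to the movies [to] night']
print(split_timestamp('a continuation line without a timestamp'))
# ['a continuation line without a timestamp']
```

Without the `-1` check, a line with no `]` would be sliced at the wrong positions instead of raising an error, which is easy to miss.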

luigigi · Answer 4 · 2020-05-14T13:35:10.813

1

import re
re.split(r'(?<=\])\s', line, maxsplit=1)
['[14/11/18, 2:47:26 PM]', 'Chaitanya: Yeah, Lets go to the movies to night']

It will split at whitespace if there is a bracket in front of it and only split once.

\s matches any whitespace
(?<=\]) is a look behind to look for ] (denoted as \])

edited May 14 '20 at 13:35

answered May 14 '20 at 13:14

luigigi


Thanks, this works. I would appreciate it if you could show how it works, since I am not familiar with `import re`. – ghostshelltaken May 14 '20 at 13:21
@ghostshelltaken I gave a short explanation and shortened the regex. Maybe it helps a bit – luigigi May 14 '20 at 13:35
This helps, Thanks. I will also look into `re`. – ghostshelltaken May 14 '20 at 13:38
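
To see the look-behind and the split limit from this answer in action, here is a small sketch. The sample line with brackets in the message body is made up for illustration, and precompiling the pattern is an optional choice that may help when the same regex is applied to many lines.

```python
import re

# The pattern from the answer: one whitespace character preceded by ']'.
splitter = re.compile(r'(?<=\])\s')

# Hypothetical sample with brackets inside the message.
line = '[14/11/18, 2:47:26 PM] Chaitanya: see [this] link'

# maxsplit=1 stops after the first match, so the later "] " is left alone.
parts = splitter.split(line, maxsplit=1)
print(parts)  # ['[14/11/18, 2:47:26 PM]', 'Chaitanya: see [this] link']

# Without the limit, every whitespace that follows a ']' becomes a split point.
print(splitter.split(line))
# ['[14/11/18, 2:47:26 PM]', 'Chaitanya: see [this]', 'link']
```

Because the look-behind only checks for the `]` and does not consume it, the bracket stays attached to the timestamp.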

score 0 · Answer 5 · answered May 14 '20 at 13:37

0

I assume the format of the date is constant, so the timestamp has a maximum length of 22 characters.

line = '[14/11/18, 2:47:26 PM] Chaitanya: Yeah, Lets go to the movies to night'
loc = line.find(']',0,22)   # find its location => 21 

our_result = [
    line[0:loc+1], line[loc+2:]
]

answered May 14 '20 at 13:37

khamlichi.khalil


The format of the date is not constant: `[14/11/18, 2:47:26 PM]` or `[14/11/18, 10:47:26 PM]` – Ocaso Protal May 14 '20 at 14:01
When I said constant, I meant the length will not be larger than 22 – khamlichi.khalil May 15 '20 at 11:19
Yes, but your code does not work when the hour is greater than 9: find will return -1, just try it. You must either set the end parameter of `find` to 23 or drop the start and end parameters completely, because find returns the first occurrence anyway. – Ocaso Protal May 15 '20 at 11:23
just add a simple if to check for it: if loc == -1: loc = line.find(']',0,23) – khamlichi.khalil May 20 '20 at 13:12
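
The behaviour discussed in these comments can be reproduced with two short sample lines (made up for illustration). The sketch below follows the suggestion to drop the start and end parameters; the name `split_line` is hypothetical.

```python
short = '[14/11/18, 2:47:26 PM] Chaitanya: hi'    # one-digit hour
long_ = '[14/11/18, 10:47:26 PM] Chaitanya: hi'   # two-digit hour

# The end parameter is exclusive, so end=22 only searches indices 0..21.
print(short.find(']', 0, 22))   # 21
print(long_.find(']', 0, 22))   # -1, the ']' sits at index 22
print(long_.find(']'))          # 22


def split_line(line):
    # Unbounded find: returns the first ']' wherever it is.
    loc = line.find(']')
    return [line[:loc + 1], line[loc + 2:]]


print(split_line(long_))  # ['[14/11/18, 10:47:26 PM]', 'Chaitanya: hi']
```

Since `find` already stops at the first match, the bounds add no benefit here and only introduce the off-by-one risk.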
