Strip out part of a string between recurring characters using Python

Question

Using Python I am currently trying to strip out part of a string that occurs between two characters. The string can be different lengths, so character counting will not work. An example of what I am looking for would be:

172.-.221 - - [07/-20-:16:36:27 -0500] Firefox/17.0" ** 0 s/ 950 ms **

The desired section of the string is 0 s/ 950 ms, and I have noticed that it occurs between the pairs of double asterisks (** **) consistently.

How would I grab just the part of the string between the two double asterisks (**)? How would I either output that to the screen or save that to a file?

Also, you might take a look at [this excellent answer](http://stackoverflow.com/a/9891784/264775) for a similar case (the `**` was equivalent to brackets). — thegrinner, Dec 11 '13 at 21:06

Steve Barnes · Answer 1 · 2013-12-11T21:56:08.650

3

This is exactly the sort of thing that the re module is made for, e.g.:

import re

TheString = '172.-.221 - - [07/-20-:16:36:27 -0500] Firefox/17.0" ** 0 s/ 950 ms **'

wc = re.compile(r'\*\*(.*)\*\*')
matches = wc.findall(TheString)

# returns [' 0 s/ 950 ms '] (the spaces next to the asterisks are captured too)
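One caveat with the pattern above: `.*` is greedy, so if a line ever contained more than one `** ... **` pair, the match would run from the first `**` to the last one. Below is a minimal sketch of a non-greedy variant that also drops the surrounding whitespace. The two-section sample line is made up for illustration and does not come from the question.

```python
import re

# Hypothetical line with two ** ... ** sections (not from the question).
line = 'GET /a ** 0 s/ 950 ms ** GET /b ** 1 s/ 20 ms **'

# Greedy group: runs from the first ** to the last **.
greedy = re.findall(r'\*\*(.*)\*\*', line)

# Lazy group with \s* on both sides: stops at the nearest closing **
# and leaves the padding spaces outside the captured text.
lazy = re.findall(r'\*\*\s*(.*?)\s*\*\*', line)

print(greedy)  # a single long match spanning both sections
print(lazy)    # one entry per section, whitespace trimmed
```

If every line is guaranteed to hold exactly one pair, the greedy form behaves the same apart from the captured spaces.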

edited Dec 11 '13 at 21:56

answered Dec 11 '13 at 21:05

Steve Barnes


You can also rewrite the second line to take the spaces into account: wc = re.compile(r'\*\* (.*) \*\*') – Raydel Miranda Dec 11 '13 at 21:20
You can if desired. Personally I would use `r'\*\* *(.*) *\*\*'` so as to allow for a variable number of spaces. – Steve Barnes Dec 11 '13 at 21:34
Yes, you will match any number of spaces, and all of them will be collected inside the "matches" variable. That's not what the user wants (according to the question). – Raydel Miranda Dec 11 '13 at 21:46
Oops, use `r'\*\* *(.*) *\*\*'`; I lost a star. match[1] will only give the part that excludes the **s and spaces. – Steve Barnes Dec 11 '13 at 21:50
You know what? I haven't tested it, but doesn't that last pattern you posted look a little ambiguous to you? What if you replace the last stars you added with +? Something like: r'\*\* +(.*) +\*\*' – Raydel Miranda Dec 11 '13 at 22:18
That would require at least one space rather than allowing zero or more. Also, you need to escape the *s to match them. – Steve Barnes Dec 11 '13 at 22:34
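To make the comment discussion concrete, here is a small sketch that runs the patterns mentioned above against the question's sample line and prints what each one captures. The last pattern (lazy group with `\s*`) is an addition for comparison and was not proposed in the thread. The dictionary keys are just descriptive labels.

```python
import re

s = '172.-.221 - - [07/-20-:16:36:27 -0500] Firefox/17.0" ** 0 s/ 950 ms **'

patterns = {
    'no spaces handled': r'\*\*(.*)\*\*',
    'exactly one space': r'\*\* (.*) \*\*',
    'zero or more spaces': r'\*\* *(.*) *\*\*',
    # Not from the thread: lazy group so the trailing \s* gets the padding.
    'lazy group, any whitespace': r'\*\*\s*(.*?)\s*\*\*',
}

# Capture group 1 for each pattern; every pattern matches this sample line.
results = {name: re.search(p, s).group(1) for name, p in patterns.items()}

for name, value in results.items():
    print(f'{name:28} {value!r}')
```

On this input the "zero or more spaces" pattern still keeps the trailing space: the greedy `(.*)` takes it before the optional ` *` gets a chance. The lazy variant returns the text without padding on either side.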

score 2 · Answer 2 · answered Dec 11 '13 at 21:05


>>> s='172.-.221 - - [07/-20-:16:36:27 -0500] Firefox/17.0" ** 0 s/ 950 ms **'
>>> s.split('**')[1].strip()
'0 s/ 950 ms'
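The question also asks how to print the result or save it to a file. A minimal sketch of both, built on the same `split` idea, might look like the following. The helper name `extract_between`, the second sample line, and the output file name `timings.txt` are all assumptions made for illustration.

```python
import os
import tempfile

def extract_between(line, marker='**'):
    """Return the text between the first two markers, or None if absent."""
    parts = line.split(marker)
    # Two markers produce at least three pieces: before, between, after.
    return parts[1].strip() if len(parts) >= 3 else None

lines = [
    '172.-.221 - - [07/-20-:16:36:27 -0500] Firefox/17.0" ** 0 s/ 950 ms **',
    'a line without any markers',  # hypothetical line, skipped below
]

timings = [t for t in map(extract_between, lines) if t is not None]

# Output to the screen.
for t in timings:
    print(t)

# Save to a file (written to the temp directory here; any path would do).
out_path = os.path.join(tempfile.gettempdir(), 'timings.txt')
with open(out_path, 'w') as out:
    out.write('\n'.join(timings) + '\n')
```

For a real log, `lines` could be replaced by iterating over an open file object, which yields one line at a time.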

answered Dec 11 '13 at 21:05

ndpu


score 1 · Answer 3 · answered Dec 11 '13 at 21:05


You could use str.split to extract it:

myString.split("**")[1]

This creates a list of strings by splitting the string at each appearance of "**", then takes the second item, index 1, from that list.
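As a sketch of what that list looks like for the question's sample line, and what happens when a line has no `**` at all (the second input string is made up for illustration):

```python
my_string = '172.-.221 - - [07/-20-:16:36:27 -0500] Firefox/17.0" ** 0 s/ 950 ms **'

parts = my_string.split("**")
# parts has three items: the text before the first "**", the text between
# the two markers, and the (empty) text after the second "**".
print(parts)

# Index 1 is the middle piece; strip() removes the padding spaces.
middle = parts[1].strip()
print(middle)

# A line without "**" splits into a single-item list, so index 1 fails.
try:
    "no markers here".split("**")[1]
except IndexError:
    print("no ** markers found")
```

Checking `len(parts)` before indexing is one way to avoid the exception on malformed lines.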

answered Dec 11 '13 at 21:05

jonrsharpe


score 1 · Answer 4 · answered Dec 11 '13 at 21:18


This is good as well :)

>>> import re
>>> string = '172.-.221 - - [07/-20-:16:36:27 -0500] Firefox/17.0" ** 0 s/ 950 ms **'
>>> re.search(r'\*{2}(.+)\*{2}', string).group(1)
' 0 s/ 950 ms '
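Note that `re.search` returns `None` when nothing matches, so calling `.group(1)` directly raises `AttributeError` on a line without asterisks. A hedged sketch of a small wrapper follows. The name `between_asterisks` is hypothetical, and the pattern is a lazy, whitespace-trimming variant of the one above rather than the answer's exact expression.

```python
import re

# Compiled once; \s* on both sides keeps the padding out of the group.
PATTERN = re.compile(r'\*{2}\s*(.+?)\s*\*{2}')

def between_asterisks(text):
    """Return the text between the first ** pair, or None if there is none."""
    match = PATTERN.search(text)
    return match.group(1) if match else None

sample = '172.-.221 - - [07/-20-:16:36:27 -0500] Firefox/17.0" ** 0 s/ 950 ms **'
print(between_asterisks(sample))
print(between_asterisks('a line with no markers'))
```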

answered Dec 11 '13 at 21:18

GMPrazzoli


score 0 · Answer 5 · answered Dec 11 '13 at 21:04


You can use regular expressions (for example, the re.sub function). Here is a good tutorial by Google: https://developers.google.com/edu/python/regular-expressions
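The answer does not show how `sub` would be applied. One possible reading, sketched here as an assumption, is to match the whole line and replace it with just the captured middle part:

```python
import re

s = '172.-.221 - - [07/-20-:16:36:27 -0500] Firefox/17.0" ** 0 s/ 950 ms **'

# Match the entire line, capture the text between the first ** pair,
# and substitute the whole match with that captured group.
cleaned = re.sub(r'^.*?\*\*\s*(.*?)\s*\*\*.*$', r'\1', s)
print(cleaned)

# If the line has no ** pair, nothing matches and sub() returns it unchanged.
unchanged = re.sub(r'^.*?\*\*\s*(.*?)\s*\*\*.*$', r'\1', 'no markers here')
print(unchanged)
```

Because an unmatched line comes back unmodified, `re.search` or `re.findall` is usually the clearer choice when the goal is extraction rather than rewriting.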

answered Dec 11 '13 at 21:04

Ali Rasim Kocal

