python string manipulation, finding a substring within a string

Question

I am trying to find a substring within a larger string in Python. Specifically, I want the text that follows the string "Requests per second:". It seems my knowledge of Python strings, and of Python in general, is lacking.

My error is on the third line of code, minusStuffBeforeReqPer = output[reqPerIndx[0], len(output)]. Without the [0] on reqPerIndx, I get an error that I am trying to access a tuple; with it, I get an error that an int object has no attribute __getitem__. I am trying to find the index where reqPerStr starts in the output string.

The code

#output contains the string reqPerStr.
reqPerStr = "Requests per second:"
reqPerIndx = output.find(reqPerStr)
minusStuffBeforeReqPer = output[reqPerIndx[0], len(output)]
eolIndx = minusStuffBeforeReqPer.find("\n")
semiColIndx = minusStuffBeforeReqPer.find(":")
instanceTestObj.reqPerSec = minusStuffBeforeReqPer[semiColIndx+1, eolIndx]

I get the feeling this isn't the best way to do this. If you're trying to find a substring that appears after a known substring, you should use regex lookbehinds. — Adam Smith, Feb 03 '14 at 18:16
the find() method returns an integer representing an index. You are attempting reqPerIndx[0], which makes no sense. — Totem, Feb 03 '14 at 18:18
If you look to the right of your question on this page, you will see a column of related questions. Some of them have the answers you seek. The same list would have come up while you were writing your question. — , Feb 03 '14 at 18:26

senshin · Accepted Answer · 2014-02-03T18:21:05.400

You must use output[begin:end], not output[begin, end] (that's just how the syntax for slicing ordinary strings/lists/etc works). So:

minusStuffBeforeReqPer = output[reqPerIndx:len(output)]

However, the explicit end index is redundant, so you should probably write this instead:

minusStuffBeforeReqPer = output[reqPerIndx:]

By omitting the end part of the slice, the slice will go all the way to the end of output.
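To make the two slice forms concrete, here is a minimal sketch. The sample `output` text is made up for illustration; your real output will look different.

```python
# Hypothetical sample text; the real output will differ.
output = "Time taken: 1.5 seconds\nRequests per second: 24\nDone\n"
reqPerStr = "Requests per second:"

reqPerIndx = output.find(reqPerStr)             # an integer index, or -1 if missing
explicit_end = output[reqPerIndx:len(output)]   # end index spelled out
to_the_end = output[reqPerIndx:]                # end index omitted, same result

print(explicit_end == to_the_end)
print(repr(to_the_end))
```

Both slices start at the marker and run to the end of the string, which is why the shorter form is usually preferred.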

You get an error about accessing a tuple without the [0] because you have passed a tuple (namely (reqPerIndx, len(output))) to the slicing [...]. You get an error about int having no __getitem__ because when you write reqPerIndx[0], you are trying to get the 0th element of reqPerIndx, which is an integer, and integers do not have elements.
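If you want to see both failures side by side, this small sketch reproduces them on a made-up sample string. The exact wording of the messages depends on the Python version, so the code only collects and prints them.

```python
output = "Requests per second: 24\n"   # hypothetical sample text
reqPerIndx = output.find("Requests per second:")

errors = []
try:
    output[reqPerIndx, len(output)]    # the comma builds a tuple, not a slice
except TypeError as exc:
    errors.append(str(exc))

try:
    reqPerIndx[0]                      # reqPerIndx is an int and cannot be indexed
except TypeError as exc:
    errors.append(str(exc))

for message in errors:
    print(message)
```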

As @AshwiniChaudhary points out in the comments, str.find will return -1 if the substring is not found. If you are certain that the thing you're looking for will always be found somewhere in output, I suppose you don't need to handle the -1 case, but it might be a good idea to do so anyway.

reqPerIndx = output.find(reqPerStr)
if reqPerIndx != -1:
    minusStuffBeforeReqPer = ...
    # etc
else:
    # handle this case separately
    pass
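One way to fill in that skeleton is sketched below. The helper name `text_after` and the choice to return None for a missing marker are assumptions made for the example, not something from your code.

```python
def text_after(output, marker):
    """Return the rest of the marker's line, or None if the marker is absent.

    Hypothetical helper for illustration only.
    """
    start = output.find(marker)
    if start == -1:
        return None                     # marker not found
    start += len(marker)                # skip past the marker itself
    end = output.find("\n", start)      # end of that line
    if end == -1:
        end = len(output)               # marker was on the last line
    return output[start:end].strip()


print(text_after("a\nRequests per second: 24\nb", "Requests per second:"))
print(text_after("Requests: 24", "Requests per second:"))
```

Searching for the newline from `start` onward avoids building an intermediate substring, and both calls to find are checked for -1.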

You might have better luck with regexes. I don't know what output looks like, so I sort of just guessed - you should adapt this to match whatever you have in output.

>>> import re
>>> re.findall(r'(?:Requests per second:)\s*(\d+)', "Requests: 24")
[]
>>> re.findall(r'(?:Requests per second:)\s*(\d+)', "Requests per second: 24")
['24']
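Since I'm guessing at the format, note that `(\d+)` stops at a decimal point. If the value can look like `1234.56`, a variation along these lines might fit better. The value format and the helper name `requests_per_second` are assumptions for this sketch.

```python
import re

# Assumed value format: digits with an optional fractional part.
pattern = re.compile(r'Requests per second:\s*(\d+(?:\.\d+)?)')


def requests_per_second(output):
    """Return the first value as a float, or None if nothing matches."""
    match = pattern.search(output)      # search scans the whole string
    return float(match.group(1)) if match else None


print(requests_per_second("x\nRequests per second:    1234.56 [#/sec]\n"))
print(requests_per_second("Requests: 24"))
```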

Note that `str.find` returns -1 for missing sub-strings, that should be handled as well. — Ashwini Chaudhary, Feb 03 '14 at 18:17
I never thought of using regex, I'm not that well versed in it. But the given regex code, how does it read (find string "" and return ..)? — KDecker, Feb 03 '14 at 20:55
Take a look at http://regex101.com/r/aX9yI6 - it might help. Basically, the `(?:...)` means "look for `...` but don't capture it (i.e. return it in output)". The `\s*` means "look for any amount of whitespace". Finally, `(\d+)` means "look for one or more digits, and capture it (i.e. return it in output)". — senshin, Feb 03 '14 at 21:00

Loïc Faure-Lacroix · Answer 2 · 2014-02-03T18:45:50.403

The error is on these two lines:

minusStuffBeforeReqPer = output[reqPerIndx[0], len(output)]
instanceTestObj.reqPerSec = minusStuffBeforeReqPer[semiColIndx+1, eolIndx]

You have to use : to create a range, as in start:end.

You can omit the end index to slice through to the end of the string, or omit the start index to slice from the beginning. The indices can also be negative. Because find returns -1 when the substring isn't found, you have to handle that case separately; otherwise you'll end up with:

minusStuffBeforeReqPer = output[-1:]

That slice is just the last character of the string.
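A short sketch of that pitfall, using a made-up string that does not contain the marker:

```python
output = "Requests: 24"                            # the marker is absent here
reqPerIndx = output.find("Requests per second:")

print(reqPerIndx)            # -1
print(output[reqPerIndx:])   # 4
```

No error is raised, so the bug can go unnoticed unless the -1 is checked explicitly.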

You should have code that looks like this:

#output contains the string reqPerStr.
reqPerStr = "Requests per second:"
reqPerIndx = output.find(reqPerStr)
if reqPerIndx != -1:
    minusStuffBeforeReqPer = output[reqPerIndx:]
    eolIndx = minusStuffBeforeReqPer.find("\n")
    semiColIndx = minusStuffBeforeReqPer.find(":")

    if eolIndx > semiColIndx >= 0:

        instanceTestObj.reqPerSec = minusStuffBeforeReqPer[semiColIndx+1:eolIndx]

This works, but you should really replace the code with a regex. As I understand it, you want to match a string that starts with reqPerStr and ends with \n, and get everything between the : and the \n.

You could do that with a pattern such as:

"Requests per second:(.*)\n"

You'll end up with:

import re

# re.search scans the whole string; re.match would only match at the start of output.
match = re.search("Requests per second:(.*)\n", output)
if match:
    instanceTestObj.reqPerSec = match.group(1)

If you want to find all matches, you can use re.finditer:

for match in re.finditer("Requests per second:(.*)", output):
    instanceTestObj.reqPerSec = match.group(1)
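Note that a loop like this overwrites reqPerSec on every iteration, so only the last match survives. If you need every value, you could collect them in a list instead. Here is a minimal sketch on a made-up output string:

```python
import re

# Hypothetical output containing two reports.
output = "Requests per second: 24\nother\nRequests per second: 30\n"

# "." does not match a newline, so each capture stops at the end of its line.
values = [m.group(1).strip()
          for m in re.finditer(r"Requests per second:(.*)", output)]

print(values)   # ['24', '30']
```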
