
Hello, I have the following function:

def width(input,output,attr):
    import re
    input = input.strip()
    if re.search(attr, input):
        k = input.find(attr)
        for i in input:
            if i == attr[0]:
                j = k + len(attr)+1
                while ((j <= len(input)) |  (j != ' ') | (input[j+1] != "'")):
                    j = j + 1
                    #print j, output, input[j], len(input), k
                    output = output+input[j]
                break
            k = k + 1
    return output

print width('a=\'100px\'','','a')

I always get the following error:

Traceback (most recent call last):
  File "table.py", line 45, in <module>
    print width(split_attributes(w,'','<table.*?>'),'','width')
  File "table.py", line 24, in width
    while ((j <= len(input)) |  (j != ' ') | (input[j+1] != "'")):
IndexError: string index out of range

I have tried using `or` instead of `|`, but it didn't work!

Jason
Elteroooo
  • Why are you using `re` to find a substring? If you are doing simple searches, use `in` like: `if attr in input:`. – D K Aug 24 '11 at 01:15
  • This can't possibly be right: `while ((j <= len(input)) | (j != ' ') | (input[j+1] != "'")):`. `j` is an integer index into `input`, but you are comparing it to a space. – hughdbrown Aug 24 '11 at 01:22
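Both comments are easy to check. A minimal Python 3 sketch (the helper names are made up for illustration): `in` tests for a literal substring, `re.search` interprets `attr` as a pattern, and an `int` never compares equal to a `str`.

```python
import re

def has_attr_literal(text, attr):
    # `in` does a plain substring test; no regex metacharacters are involved.
    return attr in text

def has_attr_regex(text, attr):
    # re.search treats attr as a pattern, so '.' or '*' inside attr change its meaning.
    return re.search(attr, text) is not None

j = 5
# An int never compares equal to a str, so this test is always True.
int_vs_space = (j != ' ')
```

For example, `has_attr_regex("ab", "a.")` is `True` because `.` matches any character, while `has_attr_literal("ab", "a.")` is `False`.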

6 Answers

while ((j <= len(input)) |  (j != ' ') | (input[j+1] != "'")):

0) You should be using `or`.

1) You should not use `input` as a variable name; it shadows a built-in function.

2) `j` is an integer, so it can never be equal to `' '`; that test is useless.

3) `j <= len(input)` passes when `j == len(input)`. The length of a string is not a valid index into the string: indices into a string of length N range from 0 to N - 1 (you can also use negative numbers from -1 to -N to count from the end). `j+1` certainly doesn't work either.

4) I can't tell what the heck you are actually trying to do. Could you explain it in words? As stated, this isn't a very good question; making the code stop throwing exceptions doesn't mean it's any closer to working correctly, and certainly doesn't mean it's any closer to being good code.
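Point 3 can be checked directly. A minimal Python 3 sketch; `valid_indices` and `is_valid_index` are hypothetical helper names, not anything from the question:

```python
def valid_indices(s):
    """Return the non-negative and the negative indices that are valid for s."""
    n = len(s)
    forward = list(range(n))                # 0 .. n - 1
    backward = list(range(-1, -n - 1, -1))  # -1 .. -n
    return forward, backward

def is_valid_index(s, i):
    # True exactly when s[i] would not raise IndexError.
    return -len(s) <= i < len(s)
```

`valid_indices("test")` returns `([0, 1, 2, 3], [-1, -2, -3, -4])`, so `len("test")`, which is 4, is not among the valid indices.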

Karl Knechtel

It looks like `j+1` is greater than or equal to the length of your string (`input`). Structure your `while` loop so that `j < len(input) - 1` holds whenever `input[j+1]` is evaluated, and you won't get the `string index out of range` error.
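One way to apply this, sketched in Python 3 under the assumption that the goal is the question's "collect characters until the next `'`" loop; `read_until_quote` is a hypothetical name. The bound is tested first, so `s[j + 1]` is only evaluated when `j + 1` is a valid index:

```python
def read_until_quote(s, j):
    """Collect the characters after index j, stopping before the next "'".

    Mirrors the loop in the question, but checks j < len(s) - 1 before
    touching s[j + 1], so for a non-negative start j it never raises IndexError.
    """
    out = ""
    while j < len(s) - 1 and s[j + 1] != "'":
        j += 1
        out += s[j]
    return out
```

With the question's input, the opening quote is at index 2 (`k + len(attr) + 1`), and `read_until_quote("a='100px'", 2)` returns `'100px'`. If the closing quote is missing, `read_until_quote("a='100px", 2)` also returns `'100px'` instead of raising.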

Cassidy Laidlaw

If `j >= len(input) - 1`, then `j+1` will certainly be out of bounds. Also, use `or`, not `|`.
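The choice matters here because `|` does not short-circuit. A minimal Python 3 sketch with made-up helper names, testing "there is no next character, or the next character is not a quote":

```python
def next_not_quote_or(s, j):
    # `or` short-circuits: if the bound check is True, s[j + 1] is never evaluated.
    return j + 1 >= len(s) or s[j + 1] != "'"

def next_not_quote_bitor(s, j):
    # `|` is an ordinary binary operator: both operands are evaluated before
    # it is applied, so s[j + 1] raises IndexError when j + 1 == len(s).
    return (j + 1 >= len(s)) | (s[j + 1] != "'")
```

`next_not_quote_or("abc", 2)` returns `True`, while `next_not_quote_bitor("abc", 2)` raises `IndexError`.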

Andrew White

You get `IndexError: string index out of range`. The only indexing in that line is `input[j+1]`. An index equal to `len(input)` is already out of range, as the following code demonstrates:

>>> input = "test string"
>>> len(input)
11
>>> input[11]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
IndexError: string index out of range
>>> input[10]
'g'

So to access element `j+1`, the condition `j < len(input) - 1` must hold.
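An alternative worth knowing (not part of this answer): slices are clamped to the string's bounds, so `s[j + 1:j + 2]` returns `''` instead of raising when `j + 1` is past the end. A minimal Python 3 sketch; `char_after` is a hypothetical helper:

```python
def char_after(s, j):
    # Slicing never raises for out-of-range bounds; it returns '' instead.
    # Intended for non-negative j: a negative j can surprise (s[-1:0] is '').
    return s[j + 1:j + 2]

s = "test string"
last = char_after(s, 9)       # 'g', the same as s[10]
past_end = char_after(s, 10)  # '', where s[11] would raise IndexError
```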

artdanil

When using `!=` in `if` statements, make sure that `or` is actually what you need. Here's an example:

import random
a = random.randint(1, 10)
b = random.randint(1, 10)
c = random.randint(1, 10)
if a != 1 or b != 1 or c != 1:
    print("None of the values should equal 1")
    # The interpreter sees `a != 1` first.
    # If `a` is not equal to 1, the whole condition is true and this code gets executed.
    # This condition is true if ANY of the values is not equal to 1.
if a != 1 and b != 1 and c != 1:
    print("None of the values are equal to 1")
    # This condition is true only if ALL of the values are not equal to 1.

This is a hard thing to understand at first (I made this mistake all the time), but if you practise it a bit, it will make perfect sense.
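The rule behind this is De Morgan's law: `a != 1 or b != 1 or c != 1` is the same as `not (a == 1 and b == 1 and c == 1)`, i.e. "not all of them are 1", which is much weaker than "none of them is 1". A short Python 3 check over every combination of a small range of values (the function names are illustrative only):

```python
from itertools import product

def any_not_one(a, b, c):
    return a != 1 or b != 1 or c != 1    # true unless ALL values are 1

def none_is_one(a, b, c):
    return a != 1 and b != 1 and c != 1  # true only if NO value is 1

values = range(1, 4)
# De Morgan: (a != 1 or b != 1 or c != 1) == not (a == 1 and b == 1 and c == 1)
de_morgan_holds = all(
    any_not_one(a, b, c) == (not (a == 1 and b == 1 and c == 1))
    for a, b, c in product(values, repeat=3)
)
# Cases where the two tests disagree, e.g. (1, 2, 3): some value is not 1, yet one is 1.
disagree = [t for t in product(values, repeat=3) if any_not_one(*t) != none_is_one(*t)]
```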

So, to get your code working, replace those `|`s with `and` (and stick with the keywords `or` and `and` unless you specifically need the bitwise operators `|` and `&`):

while ((j <= len(input)) and  (j != ' ') and (input[j+1] != "'")):

and the output is:

100px
D K

This doesn't fix the error in your code, but here is code that probably does what you are aiming for.

Just use a single regular expression:

import re

def width(input, attr):
    """
    >>> width("a='100px'", 'a')
    '100px'
    """
    regex = re.compile(attr + r""".*?'(?P<quoted_string>[^']+)'""")
    m = regex.match(input.strip())
    return m.group("quoted_string") if m else ''

if __name__ == '__main__':
    import doctest
    doctest.testmod()

This code matches `attr` at the start of the string, skips ahead to the first quoted string after it, and captures that string with the named group `(?P<quoted_string>[^']+)`; `m.group("quoted_string")` then returns it.
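The same pattern can be written with `re.VERBOSE` so each piece carries its own comment. This is only a restatement of the regex in this answer for readability (Python 3; `width_verbose` is a hypothetical name), assuming `attr` contains no whitespace or `#`, which verbose mode would ignore:

```python
import re

def width_verbose(text, attr):
    """Same matching logic as the width() function in this answer."""
    regex = re.compile(attr + r"""
        .*?                          # skip as few characters as possible (e.g. '=')
        '(?P<quoted_string>[^']+)'   # a quote, one or more non-quote chars, a quote
    """, re.VERBOSE)
    # match() anchors at the start, so attr must begin the stripped string.
    m = regex.match(text.strip())
    return m.group("quoted_string") if m else ''
```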

hughdbrown
  • This only works properly if there is a single parameter. In the question, only one parameter is specified, but (assuming this is HTML parsing) HTML tags can contain multiple parameters. So `width("onetag='100px' twotag='30px'", 'twotag')` returns `''`. – D K Aug 24 '11 at 02:00
  • Yeah, I am doing my best to guess what he is really doing and point him in the right/a better direction. And if he is really doing HTML without using BeautifulSoup / lxml / some other HTML parser, then all bets are off. Obligatory link: http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags – hughdbrown Aug 24 '11 at 02:10
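Following the suggestion to use a real HTML parser: Python 3's standard library ships `html.parser.HTMLParser`, which passes each start tag's attributes to `handle_starttag` as `(name, value)` pairs with the quotes already removed. A minimal sketch, assuming the goal is the `width` of a `<table>` tag as the traceback suggests; the class and function names are hypothetical, and BeautifulSoup or lxml remain the more robust choice for real pages:

```python
from html.parser import HTMLParser

class AttrCollector(HTMLParser):
    """Collect the attributes of every start tag with a given name."""

    def __init__(self, tag):
        super().__init__()
        self.tag = tag
        self.found = []  # one dict of attributes per matching tag

    def handle_starttag(self, tag, attrs):
        # The parser lowercases tag names; attrs is a list of (name, value) pairs.
        if tag == self.tag:
            self.found.append(dict(attrs))

def tag_attribute(html, tag, attr, default=''):
    """Return attr from the first <tag> in html that has it, else default."""
    parser = AttrCollector(tag)
    parser.feed(html)
    parser.close()
    for attrs in parser.found:
        if attr in attrs:
            return attrs[attr]
    return default
```

`tag_attribute("<table width='100px' border='1'>", "table", "width")` returns `'100px'`, and the multi-attribute case from the first comment returns `'30px'` for `twotag`.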