How to account for unexpected data when trying to split values

Question

I have the following code snippet which is part of a larger chunk of code to extract image filenames from links.

        for a in soup.find_all('a', href=True):
            url = a['href']
            path, file = url.rsplit('/', 1)
            name, ext = file.rsplit('.', 1)

It works very well, however on occasion the data (which comes from an external source) will have errors.

Specifically, the last line in the snippet above will throw an error that:

    name, ext = file.rsplit('.', 1)
    ValueError: not enough values to unpack (expected 2, got 1)

What is the best way to ignore this error (or lines containing input not as expected) and continue on to the next entry?

I would have thought a try and catch is the right approach here, but upon googling how to do that with this type of error I did not find anything.

Is it possible to use a try block to catch this type of error? If not, why not, and what is the better approach?

C. Fennell · Answer 1 · 2019-10-11T22:39:46.383

1

I would not use a try-except in this case, since you have no use for the except part: you're not going to process the file if you do encounter an error. Feel free to read up on try-except; there are plenty of questions on Stack Overflow about it, so you can decide what will work best for you.

It sounds like you don't understand the error. It occurs because you have a filename that doesn't have an extension, so rsplit returns a list with only one value. For example:

file = 'babadabooey'
print(file.rsplit('.', 1))

Out: ['babadabooey']

So if you try to unpack that into two values, you're going to get an error. I assume, most of the time you are expecting something like

file = 'babadabooey.exe'
print(file.rsplit('.', 1))

Out: ['babadabooey', 'exe']

Unpacking that result into two variables works fine. I would proceed with an if statement, so that you only try to split when '.' is actually in the file variable.

if '.' in file:
    name, ext = file.rsplit('.', 1)
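
To skip such entries entirely inside a loop, the check can be inverted and paired with `continue`. This is a minimal sketch, not the asker's full code: the helper name `split_names` and the sample filenames are assumptions made for illustration.

```python
def split_names(files):
    """Return (name, ext) pairs, skipping entries that contain no '.'."""
    pairs = []
    for file in files:
        if '.' not in file:
            continue  # no extension: skip this entry and move to the next one
        name, ext = file.rsplit('.', 1)
        pairs.append((name, ext))
    return pairs


print(split_names(['babadabooey.exe', 'babadabooey', 'a.b.c']))
```

This prints `[('babadabooey', 'exe'), ('a.b', 'c')]`; the entry without a '.' never reaches the unpacking line.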

edited Oct 11 '19 at 22:39

answered Oct 11 '19 at 22:28

C. Fennell


Why is a try block almost never recommended? I find it quite effective. I understand the error, just not how to deal with it... if I get a value like 'babadabooey' I just want to ignore/skip it completely and move on. With an if block it won't be split but it will still be processed, correct? Whereas with a try block it would be skipped? – Jake Rankin Oct 11 '19 at 22:31
I'll edit the answer to be more clear about try-except. You have it backwards. With an if statement it will be completely skipped; with a try-except, it will execute your code, realize it didn't work, and then do whatever you have in your except clause. You can add the "continue" keyword to an else or an except clause, as needed. – C. Fennell Oct 11 '19 at 22:36
Of course, in this case you're effectively scanning "file" for the '.' character twice if it _does_ exist. Presumably the case where there is a . is a common case, and thus you're greatly increasing the amount of scanning you are doing. (More about [runtime of the in operator](https://stackoverflow.com/questions/35220418/runtime-of-pythons-if-substring-in-string)). – robsiemb Oct 11 '19 at 22:43
@robsiemb yeah it definitely depends on what data they are expecting as inputs and how often there will/won't be a '.' – C. Fennell Oct 11 '19 at 22:44
Thank you for your answer, but I think in the greater context of what I am doing I need to use a try and except block. Thank you for your answer though! – Jake Rankin Oct 11 '19 at 23:50

robsiemb · Accepted Answer · 2019-10-11T23:08:43.753

Assuming all you need is to ignore the error, this try/except style should work for you:

for item in ['a.b.c', 'a.b', 'a', 'a.b.c']:
  try:
    path, file = item.rsplit('.',1)
    print("%s, %s" % (path, file))
  except ValueError:
    print("error with %s" % item)
    continue
  print("more work here!")

which gives the output:

a.b, c
more work here!
a, b
more work here!
error with a
a.b, c
more work here!

Of course, this may not be the best solution to use, depending on the greater context of what you are trying to do. Is it safe to just ignore the files with no extensions?

In particular, you should generally try to sanitize incoming data as much as possible before processing it, though this is a relatively trivial example and it's likely that sanitizing the data for this would be just as expensive as doing this particular split. Put another way, user input being dirty isn't really an "exceptional" condition.
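
As a rough sketch of what sanitizing up front could look like here, `str.rpartition` always returns a 3-tuple, so unpacking cannot raise ValueError and each string is scanned only once per separator. The function name `clean_filenames` and the example URLs are hypothetical, and whether dotfiles or extension-less names should be dropped is an assumption that depends on your data.

```python
def clean_filenames(urls):
    """Keep (name, ext) pairs for URLs whose last path segment has an extension."""
    cleaned = []
    for url in urls:
        # rpartition returns ('', '', url) when the separator is missing,
        # so these unpackings never fail.
        _, _, file = url.rpartition('/')
        name, sep, ext = file.rpartition('.')
        # Assumption: entries with no '.', an empty name (e.g. '.hidden'),
        # or an empty extension (e.g. 'file.') count as dirty input.
        if sep and name and ext:
            cleaned.append((name, ext))
    return cleaned


urls = [
    'http://example.test/img/cat.jpg',
    'http://example.test/img/dog',
    'http://example.test/img/archive.tar.gz',
]
print(clean_filenames(urls))
```

With the sample list above this prints `[('cat', 'jpg'), ('archive.tar', 'gz')]`. Note that a bare URL with no path, such as a host name on its own, would still be treated as a filename by this sketch, so real input may need stricter checks.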

this is definitely the right answer if you want to use a try-except — C. Fennell, Oct 11 '19 at 22:43
Thank you, this works a treat, and I should have realized how simple this is. I'm upset with myself that it was so obvious; I knew how to do this but for some reason didn't try. I'm still learning to be confident when trying things. — Jake Rankin, Oct 11 '19 at 23:51
