subprocess stdout readline() encoding error in "find" command in linux

Question

I have read the other questions about this error, but I do not want to specify an encoding; I just want to skip to the next line. Is it possible to ignore errors in readline() and read the next line?

I am using the find utility to get files older than 30 days; it returns the files with their full paths. But when a different user ran the code on another path, he got an encoding error. So if there is an error in stdout.readline(), I want to skip that line and move on to the next. Does stdout.readline() allow something like skipping on errors?

Also, in this scenario with find results, can I use utf-8 encoding and be sure the paths will be read without errors?

find_cmd = ['find', '/a/b', '-mtime', f'+30', '-readable', '-type', 'f', '-print']
j = ' '.join(find_cmd)
proc = subprocess.Popen(j, universal_newlines=True, shell=True, stdout=subprocess.PIPE, stderr=subprocess.STDOUT)

while True:
  file = proc.stdout.readline().replace('\n', '') #Error here 'utf-8' codec can't decode byte 0xe4 in position 1478: invalid continuation byte
  if not file: break
  movefile(file)

tripleee · Accepted Answer · 2020-11-30T17:11:27.473

0

If the output from find is not guaranteed to be UTF-8, don't use universal_newlines=True (aka text=True from Python 3.7 on).

You can selectively decode as you read, and skip the entries which are not valid UTF-8 if that's what you want.

Also, for the love of $dmr, don't join back together your perfectly good list only so that you can waste an unnecessary shell=True on it.

Finally, don't redirect stderr to stdout if you don't want to have error messages from find presented as if they were file names. Simply don't redirect stderr at all to have them displayed on the console, or direct stderr to subprocess.DEVNULL if you want to discard them completely.

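import logging
import subprocess

find_cmd = [
    'find', '/a/b', '-mtime', f'+30', '-readable',
    '-type', 'f', '-print']
# No text=True here, so proc.stdout yields raw bytes
proc = subprocess.Popen(find_cmd, stdout=subprocess.PIPE)

while True:
  filename = proc.stdout.readline().replace(b'\n', b'')
  if not filename:
    break
  try:
    file = filename.decode('utf-8')
    movefile(file)
  except UnicodeDecodeError:
    logging.info('Skipping non-UTF8 filename %r', filename)

# Popen() has no check=True; inspect the exit status explicitly
if proc.wait() != 0:
  raise subprocess.CalledProcessError(proc.returncode, find_cmd)

Note that subprocess.Popen() does not accept check=True (that keyword belongs to subprocess.run()), so the exit status is checked with proc.wait() after the loop instead; if you want to ignore find failures, maybe take that out again.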
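To experiment with the decode-and-skip step without running find, here is a minimal sketch that applies the same idea to an in-memory byte stream. The helper name `decode_lines` and the sample paths are made up for illustration and are not part of the original code.

```python
import io
import logging


def decode_lines(stream, encoding='utf-8'):
    """Yield decoded lines from a binary stream, skipping undecodable ones.

    `stream` is any binary file-like object, for example proc.stdout from
    a Popen() call made without text=True. (Hypothetical helper.)
    """
    for raw in stream:
        raw = raw.rstrip(b'\n')
        try:
            yield raw.decode(encoding)
        except UnicodeDecodeError:
            logging.info('Skipping non-UTF8 filename %r', raw)


# Sample data standing in for find's output; the second entry contains
# a Latin-1 encoded "ä" (byte 0xe4), which is not valid UTF-8 here.
sample = io.BytesIO(b'/a/b/ok.txt\n/a/b/b\xe4d.txt\n/a/b/also-ok.txt\n')
decoded = list(decode_lines(sample))
print(decoded)
```

On Unix, file names are arbitrary byte strings, so decoding as UTF-8 cannot be guaranteed to succeed for every path. If you would rather keep such names than skip them, `os.fsdecode()` is one option to look at, since it decodes with the file system encoding and the `surrogateescape` error handler on non-Windows platforms.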

edited Nov 30 '20 at 17:11

answered Nov 30 '20 at 17:05

tripleee

If your file system is not UTF-8 clean, I suspect something is wrong, or at least sinister. – tripleee Nov 30 '20 at 17:13
I had some error when using a list (I think with escaping characters in the arguments), so I used `join` to make it work. I'm sure there was a better way to fix it. – sjd Dec 03 '20 at 10:22
Could the error have been in `readline()` or in `replace()`? I got the error on that same line, and here too the try block comes after that line. – sjd Dec 03 '20 at 10:26
Using a list _removes_ the need to quote anything, because the quoting is to prevent the shell from messing with strings, and of course when you don't have `shell=True` that won't happen, and then any quotes you put there anyway will be errors. But yes, it can be challenging to figure out if you don't know how the shell's quoting mechanisms work. – tripleee Dec 03 '20 at 10:26
The error was that Python would try to `.decode()` for you implicitly when you say `text=True`. We now do the `decode` explicitly later, and catch any errors from only that operation. Maybe see also the section about `text=True` in https://stackoverflow.com/questions/4256107/running-bash-commands-in-python/51950538#51950538 – tripleee Dec 03 '20 at 10:28
It fixed my issue! Can I ask why the byte conversion in `replace`? I use `filename.startswith(strvar)`, which gives the error `startswith first arg must be bytes or a tuple of bytes, not str`. I think `filename.startswith(bytes(strvariable))` should fix it. – sjd Dec 03 '20 at 16:06
Do that after you decode. It just made sense to strip the newline immediately. So then you have `file.startswith` instead of `filename.startswith` (though I suppose you could use the same variable name for the decoded value). – tripleee Dec 03 '20 at 16:26
Let us [continue this discussion in chat](https://chat.stackoverflow.com/rooms/225485/discussion-between-sjd-and-tripleee). – sjd Dec 03 '20 at 17:12
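
To illustrate the `startswith` exchange in the comments above: bytes and str do not mix, so either compare after decoding (as suggested) or encode the prefix. Note that `bytes(some_str)` without an encoding argument raises TypeError in Python 3; `some_str.encode('utf-8')` is the working equivalent. The `prefix` value and the sample file name below are made up for the demo.

```python
filename = b'/a/b/report.txt'   # raw bytes, as read from proc.stdout
prefix = '/a/b/'                # hypothetical str prefix to test for

# Option 1: decode first, then compare str with str.
file = filename.decode('utf-8')
match_after_decode = file.startswith(prefix)

# Option 2: keep the bytes and encode the prefix instead.
match_on_bytes = filename.startswith(prefix.encode('utf-8'))

# bytes(prefix) with no encoding argument is a TypeError in Python 3.
try:
    bytes(prefix)
    bytes_without_encoding_ok = True
except TypeError:
    bytes_without_encoding_ok = False

print(match_after_decode, match_on_bytes, bytes_without_encoding_ok)
```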

score -1 · Answer 2 · edited Nov 30 '20 at 16:52

Change

find_cmd = ['find', '/a/b', '-mtime', f'+30', '-readable', '-type', 'f', '-print']

to the following, which redirects errors to /dev/null:

find_cmd = ['find', '/a/b', '-mtime', f'+30', '-readable', '-type', 'f', '-print','&> /dev/null']

The errors should then no longer appear.

edited Nov 30 '20 at 16:52

sjd

answered Nov 29 '20 at 17:50

paint_program

Hi @paint_program, I think the encoding error is caused by special characters in the output (in my case the file paths) while reading it. `/dev/null` will only discard the errors raised while executing the command (`find`), right? – sjd Nov 30 '20 at 09:23
Well, that solution was for skipping system errors. You can use proc.stdout.readline().decode('utf-8'). – paint_program Nov 30 '20 at 11:17
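
A note on the list form: without shell=True, every list element is passed to the program as a literal argument, so an element like '&> /dev/null' is not interpreted as a redirection. The sketch below discards stderr from Python instead, using `stderr=subprocess.DEVNULL`. A small Python child process stands in for find so the example runs anywhere; the child command is made up for the demo.

```python
import subprocess
import sys

# Stand-in for find: writes one line to stdout and one to stderr.
child = [
    sys.executable, '-c',
    "import sys; print('/a/b/file'); print('Permission denied', file=sys.stderr)",
]

# stderr goes to DEVNULL, so only the real output arrives on stdout.
proc = subprocess.Popen(child, stdout=subprocess.PIPE, stderr=subprocess.DEVNULL)
out, _ = proc.communicate()
lines = out.splitlines()
print(lines)
```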
