69

I want to run the Linux word count utility wc to determine the number of lines currently in /var/log/syslog, so that I can detect whether it's growing. I've tried various tests, and while I get the results back from wc, they include both the line count and the file name (e.g., /var/log/syslog).

So it's returning: 1338 /var/log/syslog. But I only want the line count, so I want to strip off the /var/log/syslog portion and keep just 1338.

I have tried converting it from a byte string to a string and then stripping the result, but no joy. Same story for decoding and then stripping, etc.: every attempt fails to produce the output I'm looking for.

These are some examples of what I get, with 1338 lines in syslog:

  • b'1338 /var/log/syslog\n'
  • 1338 /var/log/syslog

Here's some test code I've written to try and crack this nut, but no solution:

import subprocess

#check_output returns byte string
stdoutdata = subprocess.check_output("wc --lines /var/log/syslog", shell=True)
print("2A stdoutdata: " + str(stdoutdata))
stdoutdata = stdoutdata.decode("utf-8")
print("2B stdoutdata: " + str(stdoutdata))    
stdoutdata=stdoutdata.strip()
print("2C stdoutdata: " + str(stdoutdata))    

The output from this is:

  • 2A stdoutdata: b'1338 /var/log/syslog\n'

  • 2B stdoutdata: 1338 /var/log/syslog

  • 2C stdoutdata: 1338 /var/log/syslog

  • 2D stdoutdata: 1338 /var/log/syslog

Roshin Raphel
user2565677
  • Probably see also [actual meaning of `shell=True`](https://stackoverflow.com/questions/3172470/actual-meaning-of-shell-true-in-subprocess) for the many reasons to avoid `shell=True` when you can, such as in your case. – tripleee Feb 15 '21 at 05:29

5 Answers

83

I suggest that you use subprocess.getoutput() as it does exactly what you want—run a command in a shell and get its string output (as opposed to byte string output). Then you can split on whitespace and grab the first element from the returned list of strings.

Try this:

import subprocess
stdoutdata = subprocess.getoutput("wc --lines /var/log/syslog")
print("stdoutdata: " + stdoutdata.split()[0])
Joseph Dunn
  • You should be warned that `subprocess.getoutput` belongs to the category of *Legacy Shell Invocation Functions* (http://docs.python.org/3/library/subprocess.html#subprocess.getoutput). – pepr Aug 16 '13 at 13:22
  • @pepr But what does the 'legacy' designation mean, practically speaking? I don't see a timeline for removal, as of 3.5.0a0 . (May be defined elsewhere?) – belacqua Jun 12 '14 at 21:28
  • 2
    @belacqua: As the paragraph just below *17.5.6. Legacy Shell Invocation Functions* says (https://docs.python.org/3.5/library/subprocess.html#legacy-shell-invocation-functions), to quote (emphasis added): *These operations **implicitly** invoke the **system shell** and **none of the guarantees** described above regarding **security and exception handling consistency** are valid for these functions.* – pepr Jun 13 '14 at 07:13
  • 3
    @belacqua: The `subprocess.check_function()` (https://docs.python.org/3.5/library/subprocess.html#subprocess.check_output) is a better replacement and also requires less work. See J.F. Sebastian's http://stackoverflow.com/a/18270852/1346705. The argument can also be a string. – pepr Jun 13 '14 at 07:20
  • 3
    @pepr I believe you meant to say `check_output`, not check_function..? – Greg Sadetsky Jan 29 '18 at 02:07
  • 1
    Yes, @GregSadetsky. My fault. ;) – pepr Jan 29 '18 at 20:18
  • 1
    no worries!! you might want to edit your comment so that others can see the right function..? :-) cheers – Greg Sadetsky Jan 30 '18 at 00:06
35

Since Python 3.6 you can make check_output() return a str instead of bytes by giving it an encoding parameter:

from subprocess import check_output
check_output(['wc', '--lines', '/var/log/syslog'], encoding='UTF-8')

But since you just want the count, and both split() and int() are usable with bytes, you don't need to bother with the encoding:

linecount = int(check_output(['wc', '-l', '/var/log/syslog']).split()[0])
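
To illustrate the point about bytes, here is a minimal sketch of a helper that extracts the count from `wc -l` style output, whether it arrives as bytes or as str. The name `parse_wc_count` is hypothetical and not part of `subprocess`; the sample strings are modeled on the output shown in the question.

```python
def parse_wc_count(output):
    """Return the leading integer from `wc -l` style output (bytes or str)."""
    # split() with no arguments splits on any run of whitespace for both
    # bytes and str, and int() accepts ASCII digits in either type.
    return int(output.split()[0])


print(parse_wc_count(b"1338 /var/log/syslog\n"))
print(parse_wc_count("1338 /var/log/syslog\n"))
```

Because `split()` discards leading whitespace, the same helper also copes with `wc` implementations that pad the count with spaces.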

While some things might be easier with an external program (e.g., counting log line entries printed by journalctl), in this particular case you don't need to use an external program. The simplest Python-only solution is:

with open('/var/log/syslog', 'rt') as f:
    linecount = len(f.readlines())

This has the disadvantage of reading the entire file into memory. For a huge file, initialize linecount = 0 before opening the file and replace readlines() with a for line in f: linecount += 1 loop, so that only a small part of the file is in memory at any time.
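
The loop described above could be sketched as follows. This is only an illustration: the function name `count_lines` is hypothetical, binary mode is my choice (to sidestep decoding errors on odd bytes in a log file), and the demo uses a throwaway temporary file rather than /var/log/syslog so that it runs anywhere.

```python
import os
import tempfile


def count_lines(path):
    """Count lines by iterating over the file instead of loading it whole."""
    linecount = 0
    # Binary mode: no decoding is needed just to count lines.
    with open(path, "rb") as f:
        for _ in f:
            linecount += 1
    return linecount


# Demo on a temporary file with three newline-terminated lines.
with tempfile.NamedTemporaryFile("wb", delete=False) as tmp:
    tmp.write(b"first\nsecond\nthird\n")
demo_count = count_lines(tmp.name)
os.remove(tmp.name)
print(demo_count)
```

Note that this counts a final line even if it lacks a trailing newline, whereas `wc -l` counts newline characters, so the two can differ by one on such files.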

cjs
10

To avoid invoking a shell and decoding filenames that might be an arbitrary byte sequence (except '\0') on *nix, you could pass the file as stdin:

import subprocess

with open(b'/var/log/syslog', 'rb') as file:
    nlines = int(subprocess.check_output(['wc', '-l'], stdin=file))
print(nlines)

Or you could ignore any decoding errors:

import subprocess

stdoutdata = subprocess.check_output(['wc', '-l', '/var/log/syslog'])
nlines = int(stdoutdata.decode('ascii', 'ignore').partition(' ')[0])
print(nlines)
jfs
  • Is there any way to get the `sys.stdout.encoding` in this case so we pass this to decode instead of `ascii` ? What if we `subprocess.PIPE` stdout ? – Mr_and_Mrs_D Jun 24 '17 at 22:03
  • 1
    @Mr_and_Mrs_D it would be a wrong thing to do: 1- it won't help in the general case (a filename may be a byte sequence that is undecodable by any character encoding as it is said explicitly in the answer. See PEP 383) 2- ascii works here (to decode digits printed by wc in any locale supported by Python) – jfs Jun 24 '17 at 22:22
4

An equivalent to Curt J. Sampson's answer that returns a string:

subprocess.check_output('wc -l /path/to/your/file | cut -d " " -f1', universal_newlines=True, shell=True)

From the docs:

If encoding or errors are specified, or text is true, file objects for stdin, stdout and stderr are opened in text mode using the specified encoding and errors or the io.TextIOWrapper default. The universal_newlines argument is equivalent to text and is provided for backwards compatibility. By default, file objects are opened in binary mode.

Something similar, but a bit more complex using subprocess.run():

subprocess.run(command, shell=True, check=True, universal_newlines=True, stdout=subprocess.PIPE).stdout

since subprocess.check_output(...) is equivalent to subprocess.run(..., check=True, stdout=subprocess.PIPE).stdout.
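
On Python 3.7 or later, the same idea can be sketched without a shell by passing an argument list and using the `capture_output` and `text` parameters of `subprocess.run()`. The helper name `wc_lines` is hypothetical, and the demo assumes a `wc` binary is available on PATH; it runs against a temporary file rather than a real log.

```python
import os
import subprocess
import tempfile


def wc_lines(path):
    """Return the line count reported by `wc -l` for path (hypothetical helper)."""
    result = subprocess.run(
        ["wc", "-l", path],   # argument list, so no shell is involved
        check=True,           # raise CalledProcessError on a non-zero exit
        capture_output=True,  # Python 3.7+: collect stdout and stderr
        text=True,            # Python 3.7+: decode output to str
    )
    # Output looks like "<count> <path>\n"; the count is the first field.
    return int(result.stdout.split()[0])


# Demo on a throwaway file; assumes `wc` is installed.
with tempfile.NamedTemporaryFile("wb", delete=False) as tmp:
    tmp.write(b"first\nsecond\nthird\n")
demo_count = wc_lines(tmp.name)
os.remove(tmp.name)
print(demo_count)
```

Splitting in Python replaces the `cut` stage of the pipeline, which is what made `shell=True` necessary in the one-liner.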

Catalin B.
  • also, on python 3.7 there's capture_output=True https://docs.python.org/3/library/subprocess.html – fersarr Mar 21 '19 at 14:39
1

getoutput (and its closer replacement, getstatusoutput) are not direct replacements for check_output: there are security changes in 3.x that prevent some previous commands from working that way (my script was attempting to work with iptables and failing with the new commands). It is better to adapt to the Python 3 output and add the argument universal_newlines=True:

check_output(command, universal_newlines=True)

This call behaves as you would expect check_output to, but returns a string instead of bytes. It's a direct replacement.

tk421storm