
I am storing the number of files in a directory in a variable and storing their names in an array. I'm unable to store file names in the array. Here is the piece of code I have written.

import os
temp = os.system('ls -l /home/demo/ | wc -l')

no_of_files = temp - 1

command = "ls -l /home/demo/ | awk 'NR>1 {print $9}'"

file_list=[os.system(command)]

for i in range(len(file_list)):
    os.system('tail -1 file_list[i]')
Prashant Luhar

4 Answers


Your shell scripting is orders of magnitude too complex.

import subprocess
output = subprocess.check_output('tail -qn1 *', shell=True)

or if you really prefer,

os.system('tail -qn1 *')

which however does not capture the output in a Python variable.

If you have a recent-enough Python, you'll want to use subprocess.run() instead. You can also easily let Python do the enumeration of the files to avoid the pesky shell=True:

output = subprocess.check_output(['tail', '-qn1'] + os.listdir('.'))
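With subprocess.run() (Python 3.5+) the same call might be sketched like this; the sample file names and contents here are made up so the example is self-contained:

```python
import subprocess

# Hypothetical sample files, created here so the sketch is runnable.
filenames = ['a.txt', 'b.txt']
for name, text in zip(filenames, ['first\nlast-a\n', 'only-b\n']):
    with open(name, 'w') as fh:
        fh.write(text)

# tail -q suppresses the per-file headers; stdout=subprocess.PIPE captures
# the output (capture_output=True is the 3.7+ spelling of the same thing).
result = subprocess.run(['tail', '-qn1'] + filenames,
                        stdout=subprocess.PIPE, check=True)
lines = result.stdout.decode('utf-8').splitlines()
print(lines)  # ['last-a', 'only-b']
```

check=True makes subprocess.run() raise CalledProcessError on a nonzero exit status, much like check_output does.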

As noted above, if you genuinely just want the output to be printed to the screen and not be available to Python, you can of course use os.system() instead, though subprocess is recommended even in the os.system() documentation because it is much more versatile and more efficient to boot (if used correctly). If you really insist on running one tail process per file (perhaps because your tail doesn't support the -q option?) you can do that too, of course:

for filename in os.listdir('.'):
    os.system("tail -n 1 '%s'" % filename)

This will still work incorrectly if you have a file name which contains a single quote. There are workarounds, but avoiding a shell is vastly preferred (so back to subprocess without shell=True and the problem of correctly coping with escaping shell metacharacters disappears because there is no shell to escape metacharacters from).

for filename in os.listdir('.'):
    print(subprocess.check_output(['tail', '-n1', filename]))

Finally, tail doesn't particularly do anything which cannot easily be done by Python itself.

for filename in os.listdir('.'):
    with open(filename, 'r') as handle:
        line = ''  # guard against empty files
        for line in handle:
            pass
        # print the last one only
        print(line.rstrip('\r\n'))

If you have knowledge of the expected line lengths and the files are big, maybe seek to somewhere near the end of the file, though obviously you need to know how far from the end to seek in order to be able to read all of the last line in each of the files.
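A sketch of that idea, under the assumption that no line is longer than some bound (here 4096 bytes); the function name is made up:

```python
import os

def last_line(path, max_line_len=4096):
    # Read only the final max_line_len bytes instead of the whole file.
    with open(path, 'rb') as handle:
        size = os.fstat(handle.fileno()).st_size
        handle.seek(max(0, size - max_line_len))
        chunk = handle.read()
    lines = chunk.splitlines()
    # Empty file: nothing to return.
    return lines[-1].decode('utf-8', 'replace') if lines else ''
```

If a line can exceed the bound, the chunk may begin mid-line; the last line is still read intact as long as it fits within the bound.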

tripleee

os.system returns the exit code of the command, not its output. Try using subprocess.check_output with shell=True

Example:

>>> import subprocess
>>> a = subprocess.check_output("ls -l /home/demo/ | awk 'NR>1 {print $9}'", shell=True)
>>> a.decode("utf-8").split("\n")

Edit (as suggested by @tripleee): you probably don't want to do this, as it quickly gets unwieldy. Python has great functions for things like this. For example:

>>> import glob
>>> names = glob.glob("/home/demo/*")

will directly give you a list of files and folders inside that folder. Once you have this, you can just do len(names) to get the count your first command was after.

Another option is:

>>> import os
>>> os.listdir("/home/demo")

Here, glob will give you the whole filepath (/home/demo/file.txt), while os.listdir will just give you the filename (file.txt).

The ls -l /home/demo/ | wc -l command also does not give the correct count: ls -l prints a "total N" summary line at the top (the total disk allocation, not a file), so the line count is off by one.
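A sketch of getting the count directly in Python, filtering to regular files so subdirectories are not counted either (the function name is made up):

```python
import os

def count_files(directory):
    # Count regular files only; there is no "total" header to subtract,
    # and subdirectories are excluded explicitly.
    return sum(os.path.isfile(os.path.join(directory, name))
               for name in os.listdir(directory))
```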

AbdealiLoKo
    [Don't try to parse `ls` output](https://mywiki.wooledge.org/ParsingLs), you'll only get hurt, and piss off the pig. – tripleee Apr 05 '18 at 03:11
  • `os.listdir` helped a lot, but when I use that array in a for loop I'm unable to index it, as the variable cannot be passed into the shell command in subprocess – Prashant Luhar Apr 05 '18 at 03:30
  • 1
    I don't follow that train of thought. Why can't you loop over a python list? You'd probably make a python for loop and run a sub process in every iteration – AbdealiLoKo Apr 05 '18 at 04:40
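To address the indexing question from the comments: each element of the listing is a plain string, so it can go straight into the subprocess argument list; no shell interpolation is required. A sketch (the helper name is made up):

```python
import os
import subprocess

def last_lines(directory):
    # Map each regular file to its last line, one tail per file.
    results = {}
    for filename in sorted(os.listdir(directory)):
        path = os.path.join(directory, filename)
        if os.path.isfile(path):
            # The loop variable is passed directly as an argument;
            # no shell is involved, so no quoting problems.
            out = subprocess.check_output(['tail', '-n', '1', path])
            results[filename] = out.decode('utf-8').rstrip('\n')
    return results
```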

You could likely use a loop without much issue:

files = [f for f in os.listdir('.') if os.path.isfile(f)]

for f in files:
    with open(f, 'rb') as fh:
        # the with block closes the file automatically
        last = fh.readlines()[-1].decode()
        print('file: {0}\n{1}\n'.format(f, last))

Output:

file: file.txt
Hello, World!

...

If your files are large then readlines() probably isn't the best option. Maybe go with tail instead:

for f in files:
    print('file: {0}'.format(f))
    subprocess.check_call(['tail', '-n', '1', f])
    print('\n')

The decode is optional; for text, "utf-8" usually works, and if the files are a mix of binary and text, a single-byte encoding such as "iso-8859-1" will at least never raise a decode error.
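Another way to hedge the decoding, instead of guessing a legacy charset, is an error handler. A sketch with made-up byte strings:

```python
ok = b'caf\xc3\xa9\n'   # valid UTF-8
bad = b'caf\xe9\n'      # a Latin-1 byte, invalid as UTF-8

print(ok.decode('utf-8'))   # café
# errors='replace' substitutes U+FFFD instead of raising UnicodeDecodeError.
print(bad.decode('utf-8', errors='replace'))
```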

l'L'l
  • +1 especially for the insight to skip directory entries which aren't files and for the simple native Python solution. Doing this entirely in Python is slightly hairy but much more efficient. – tripleee Apr 05 '18 at 04:25
  • Though supposing in 2018 that somebody's default encoding is still `iso-8859-1` is laugh/cry material. I can see why you don't want to assume the files are Unicode but if they actually are, that assumption would simplify the code considerably. – tripleee Apr 05 '18 at 04:26
  • ... Though `tail` probably does some useful things behind the scenes to avoid reading the entire file if it's big, which could turn out to be more efficient in the end. – tripleee Apr 05 '18 at 04:29
  • @tripleee: Agreed on all points, I put the reference to `ISO-8859-1` in there because some binary files puke on this type of thing. Also, I updated with a tail example — since it would obviously be quicker than reading the whole file. Thanks for the comments as always, cheers! – l'L'l Apr 05 '18 at 04:44
  • You'll want to avoid the `shell=True`, it just complicates matters here. `check_call(['tail', '-n', '1', f])` – tripleee Apr 05 '18 at 04:47
  • I guess you will still want to print the result from `check_call` – tripleee Apr 05 '18 at 04:51
  • @tripleee: It automatically prints for me, so I was confused about that at first, does it exhibit the same behavior for you? – l'L'l Apr 05 '18 at 04:52
  • Oh nvm, I am just so used to having `stdout=subprocess.PIPE` everywhere. – tripleee Apr 05 '18 at 04:53

You are not able to store the file names because os.system does not return the command's output as you expect; it returns the exit status.
From the docs:

On Unix, the return value is the exit status of the process encoded in the format specified for wait(). Note that POSIX does not specify the meaning of the return value of the C system() function, so the return value of the Python function is system-dependent.

On Windows, the return value is that returned by the system shell after running command, given by the Windows environment variable COMSPEC: on command.com systems (Windows 95, 98 and ME) this is always 0; on cmd.exe systems (Windows NT, 2000 and XP) this is the exit status of the command run; on systems using a non-native shell, consult your shell documentation.

os.system executes Linux shell commands as-is; to capture the output of these shell commands you have to use the Python subprocess module.

Note: in your case you can get the file names using either the glob module or os.listdir(); see How to list all files of a directory

Gaur93