I am new to Python. I am trying to run a bash pipeline from Python to count the files of each extension. I tried the following:

import subprocess
output = subprocess.check_output("sudo find . -type f -name '*.*' -exec sh -c 'echo ${0##*.}' {} \; | sort | uniq -c | sort -nr | awk '{print $2 ":" $1}'", shell=True)

But it throws a syntax error. Running the same find command directly in a bash shell:

sudo find . -type f -name '*.*' -exec sh -c 'echo ${0##*.}' {} \; | sort | uniq -c | sort -nr | awk '{print $2 ":" $1}'

gives the following output:

png:3156
json:333
c:282
svg:241
zsh:233
js:192
gz:169
zsh-theme:143
ttf:107
cache:103
md:93

So how can I get the same output from Python code? What correction does my current approach need? Thanks in advance.

Sjn73
  • Use triple quotes on the outside, like `"""sudo..."""`? – Chris_Rands Oct 06 '17 at 12:36
  • The `awk` part should be something like `awk '{print \"{\" $2 \":\" $1 \"}\"}'`. It seems like you have unescaped double quotes inside double quotes. – Abdou Oct 06 '17 at 12:36
  • File "", line 1 direct_output = subprocess.check_output("sudo find . -type f -name '*.*' -exec sh -c 'echo ${0##*.}' {} \; | sort | uniq -c | sort -nr | awk '{print "{" $2 ":" $1 "}"}'", shell=True) ^ SyntaxError: invalid syntax – Sjn73 Oct 06 '17 at 12:38
  • Does the error still occur without `sudo`? – Guillaume.P Oct 06 '17 at 12:49
  • @GuillaumePaniagua: yes no difference by removing 'sudo' – Sjn73 Oct 06 '17 at 12:50
  • Possible duplicate of [running bash commands in python](https://stackoverflow.com/questions/4256107/running-bash-commands-in-python) – kenorb Oct 06 '17 at 12:52
  • Try with: `output = subprocess.check_output(['bash','-c', bashCommand])`. – kenorb Oct 06 '17 at 12:53
  • @Chris_Rands: Many thanks!! – Sjn73 Oct 06 '17 at 13:20
  • If you have a file named, say, `foo.*`, `echo ${0##*.}` is going to print a list of files in the current directory, and the behavior for a file named `foo.-n` is entirely undefined. `printf '%s\n' "${0##*.}"` is much more reliable: it adds quotes to suppress string-splitting and globbing, and uses `printf` rather than `echo`. – Charles Duffy Oct 06 '17 at 13:41
  • @CharlesDuffy: thanks for the correction!! – Sjn73 Oct 06 '17 at 16:43
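
For readers who want to combine the last few comments, here is a minimal sketch of passing the pipeline to `bash -c` as an argument list, with `printf` in place of `echo`. The names `PIPELINE`, `build_argv` and `count_extensions_shell` are made up for this illustration, `sudo` is left out, and the directory is passed as `$1` instead of being hard-coded; it assumes `bash`, `find`, `sort`, `uniq` and `awk` are installed.

```python
import subprocess

# Raw triple-quoted string: backslashes and double quotes reach bash untouched.
# "$1" is the directory to scan, supplied as a positional argument below.
PIPELINE = r"""find "$1" -type f -name '*.*' -exec sh -c 'printf "%s\n" "${0##*.}"' {} \; | sort | uniq -c | sort -nr | awk '{print $2 ":" $1}'"""


def build_argv(path="."):
    # bash -c SCRIPT NAME ARG: NAME becomes $0 and ARG becomes $1 inside SCRIPT.
    return ["bash", "-c", PIPELINE, "bash", path]


def count_extensions_shell(path="."):
    # check_output raises CalledProcessError if the pipeline exits non-zero.
    output = subprocess.check_output(build_argv(path))
    # The result is bytes; decode it and split into "ext:count" lines.
    return output.decode().splitlines()
```

Because the command is a list, `shell=True` is not needed, and Python never has to parse the quotes inside the pipeline itself.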

2 Answers

As mentioned in the comments, any double quote inside a string delimited by double quotes must be escaped with a backslash:

import subprocess
output = subprocess.check_output("sudo find . -type f -name '*.*' -exec sh -c 'echo ${0##*.}' {} \; | sort | uniq -c | sort -nr | awk '{print $2 \":\" $1}'", shell=True)

Single quotes inside a double-quoted string have no special meaning to Python, so they do not let you avoid escaping the double quotes.

The fine details are explained under the heading "String and Bytes literals" in the Python language reference.

As mentioned in the comments, another option, which is probably easier to read, is to use triple double quotes:

import subprocess
output = subprocess.check_output("""sudo find . -type f -name '*.*' -exec sh -c 'echo ${0##*.}' {} \; | sort | uniq -c | sort -nr | awk '{print $2 ":" $1}'""", shell=True)

While this answers the question, for readability and maintainability I suggest replacing the shell pipeline completely with Python, as suggested in another answer.
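
To check that the two spellings really denote the same string, here is a small sketch using a shortened command (the `find . -exec true {}` part is a stand-in chosen for illustration, not the original pipeline). It also doubles the backslash before `;` in the ordinary string and uses a raw string for the triple-quoted one, since recent Python versions warn about unrecognized escapes such as `\;`.

```python
# Ordinary double-quoted literal: inner double quotes and the backslash are escaped.
escaped = "find . -exec true {} \\; | awk '{print $2 \":\" $1}'"

# Raw triple-quoted literal: nothing inside needs escaping.
triple_raw = r"""find . -exec true {} \; | awk '{print $2 ":" $1}'"""

# Both literals produce exactly the same text for the shell.
assert escaped == triple_raw
print(triple_raw)
```

In the original attempt, the first unescaped `"` inside the command ended the string literal early, which is why Python reported a syntax error before the shell was ever involved.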

Jan Zerebecki

By the way, you could do the same thing in pure Python. Here is a minimal piece of code that does it:

import os

def count_all_ext ( path ):
    res = {}
    for root,dirs,files in os.walk( path ):
        for f in files :
            if '.' in f :
                e = f.rsplit('.',1)[1]
                res[e] = res.setdefault(e,0)+1
    return res.items()


print '\n'.join( '%s:%d'%i for i in count_all_ext('.'))

OK, it's very long compared to the Bash snippet, but it's Python...
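
For anyone on Python 3, where `print` is a function, here is a hedged variant of the same idea. It swaps in `collections.Counter` so the result can be sorted by count like the `sort -nr` step of the shell pipeline; the function names `count_extensions_walk` and `format_counts` are invented for this sketch. Note that `os.walk` also lists symlinks to files, which `find -type f` would skip, so the numbers can differ slightly.

```python
import os
from collections import Counter


def count_extensions_walk(path="."):
    """Count files under path by the text after the last dot in their name."""
    counts = Counter()
    for _root, _dirs, files in os.walk(path):
        for name in files:
            if "." in name:
                # rsplit with maxsplit=1 keeps only the final extension.
                counts[name.rsplit(".", 1)[1]] += 1
    return counts


def format_counts(counts):
    # most_common() yields (extension, count) pairs, highest count first.
    return "\n".join("%s:%d" % item for item in counts.most_common())


if __name__ == "__main__":
    print(format_counts(count_extensions_walk(".")))
```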

tom
Captain'Flam
  • Thanks @Captain'Flam ..... I was expecting to run the bash script from Python!! – Sjn73 Oct 06 '17 at 13:13
  • Yep, sorry... it's stronger than me, I always try to script in Python (instead of platform-dependent shell) – Captain'Flam Oct 06 '17 at 13:32
  • I'd certainly consider this a better approach than the amalgam of tools glued together badly. (Clarifying "badly": properly used, `awk` could do the job of `sort` *and* `uniq` *and* the shells running the parameter expansion, etc.; using 5 independent pieces instead of one is a sign of not knowing how to use the one well.) – Charles Duffy Oct 06 '17 at 13:36
  • @CharlesDuffy Thanks for helping in deciding proper approach !! – Sjn73 Oct 07 '17 at 01:14
  • @Captain'Flam: Since the bash command asks for superuser privileges, the Python approach works fine for me!! As I am new to Python, I was unable to understand the line `print '\n'.join( '%s:%d'%i for i in count_all_ext('.'))`. Please explain the join method here. Thanks in advance – Sjn73 Oct 10 '17 at 17:48
  • Since this is not a good place for teaching Python, I suggest you read more Python tutorials, and consider these samples that may enlighten you: `print '-'.join( i for i in ['spam','and','eggs'])` ; `print '+'.join( '(%d)'%i for i in (1,2,3,4,5))` ; `a_tuple = (1,'spam',3,'eggs')` ; `print '%d %s and %d %s'%a_tuple` ; `a_tuple_of_tuples = (('spam',3),('and',8),('eggs',5))` ; `print '*'.join( '%s(%d)'%i for i in a_tuple_of_tuples )` – Captain'Flam Oct 11 '17 at 09:00
  • Sorry for the previous comment: definitely not a good place for code samples... But, in Python tutorials, [read especially about `str.join`](https://www.tutorialspoint.com/python/string_join.htm) and the [string formatting operator `%`](https://www.tutorialspoint.com/python/python_strings.htm) (try the **try it** button). – Captain'Flam Oct 11 '17 at 09:11
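
To unpack the `join` line asked about above, here is a small step-by-step sketch in Python 3 syntax. The sample `pairs` list is invented for illustration and simply stands in for what `count_all_ext('.')` returns.

```python
# Each item is an (extension, count) tuple, like the items of the result dict.
pairs = [("png", 3), ("json", 2)]

# '%s:%d' % pair fills %s with the first tuple element and %d with the second.
lines = ["%s:%d" % pair for pair in pairs]

# str.join glues the strings together, placing the separator between them.
text = "\n".join(lines)
print(text)

# The separator can be any string.
dashed = "-".join(["spam", "and", "eggs"])
print(dashed)
```

The one-liner in the answer does the same thing, only with a generator expression passed straight to `join` instead of an intermediate list.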