48

In the spirit of the existing "what's your most useful C/C++ snippet" thread:

Do you guys have short, monofunctional Python snippets that you use (often) and would like to share with the StackOverflow Community? Please keep the entries small (under 25 lines maybe?) and give only one example per post.

I'll start off with a short snippet I use from time to time to count SLOC (source lines of code) in Python projects:

# Prints a recursive count of lines of Python source code below the current directory,
# skipping any file named in ignore_set. Also prints the total SLOC.

import os
cur_path = os.getcwd()
ignore_set = set(["__init__.py", "count_sourcelines.py"])

loclist = []

for pydir, _, pyfiles in os.walk(cur_path):
    for pyfile in pyfiles:
        if pyfile.endswith(".py") and pyfile not in ignore_set:
            totalpath = os.path.join(pydir, pyfile)
            # 'with' closes each file; relpath avoids the str.split(cur_path) pitfall
            with open(totalpath, "r") as f:
                loclist.append((len(f.read().splitlines()),
                                os.path.relpath(totalpath, cur_path)))

for linenumbercount, filename in loclist:
    print("%05d lines in %s" % (linenumbercount, filename))

print("\nTotal: %s lines (%s)" % (sum(x[0] for x in loclist), cur_path))
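A minimal Python 3 sketch of the same idea as a reusable function, assuming UTF-8 source files (undecodable bytes are replaced rather than raising). The name `count_py_lines` and its `ignore` parameter are illustrative, not part of the snippet above:

```python
import os


def count_py_lines(root, ignore=("__init__.py",)):
    """Return {relative_path: line_count} for every .py file under root.

    Hypothetical helper: the name and the `ignore` parameter are illustrative.
    """
    counts = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if name.endswith(".py") and name not in ignore:
                path = os.path.join(dirpath, name)
                # 'with' closes each file; relpath gives a path relative to root
                with open(path, encoding="utf-8", errors="replace") as f:
                    counts[os.path.relpath(path, root)] = sum(1 for _ in f)
    return counts
```

Printing `sorted(count_py_lines(os.getcwd()).items())` and the sum of its values reproduces the per-file and total output.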
ChristopheD
  • The Python Cookbook (http://code.activestate.com/recipes/langs/python/) is a much better resource for this. Examples, commentary, comments, and available online and in book form. Also, your example is a maintenance horror and "%05d" % ln is better than "%s" % (str(len).zfill(5)). – Andrew Dalke Mar 28 '09 at 02:00
  • Examples of "horror":1) m.split(curpath)[1] fails if cur_path is "/home/dalke" and m is "/home/dalke/subdir/home/dalke/whatever". 2) the list() isn't needed. 3) 'for b,zn in [(r,f) for ...]' can be reduced to 'for b,ignore,zn in os.walk(cur_path). Oh, and 4) newlines and indentation help readability – Andrew Dalke Mar 28 '09 at 02:08
  • why not use .endswith() for checking the .py extension? – daniel Mar 28 '09 at 02:52
  • also, suggest using a set for the ignore list. this isn't a performance sensitive app, but no reason not to take advantage of hashes for lookups. – daniel Mar 28 '09 at 02:53

22 Answers

37

I like using any() with a generator expression:

if any(pred(x.item) for x in sequence):
    ...

instead of code written like this:

found = False
for x in sequence:
    if pred(x.item):
        found = True
        break
if found:
    ...

I first learned of this technique from a Peter Norvig article.
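A small Python 3 demonstration of why the `any()` form is not just shorter: it short-circuits. The recording predicate `is_negative` is a hypothetical helper used only to show which items actually get tested:

```python
checked = []


def is_negative(v):
    """Predicate that also records which values it was asked about."""
    checked.append(v)
    return v < 0


# any() stops pulling from the generator at the first True result,
# just as the explicit loop does with `break`
found = any(is_negative(v) for v in [3, -1, 4, -5])
print(found, checked)  # True [3, -1]
```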

Jacob Gabrielson
23

Initializing a 2D list

While this can be done safely to initialize a list:

lst = [0] * 3

The same trick won’t work for a 2D list (list of lists):

>>> lst_2d = [[0] * 3] * 3
>>> lst_2d
[[0, 0, 0], [0, 0, 0], [0, 0, 0]]
>>> lst_2d[0][0] = 5
>>> lst_2d
[[5, 0, 0], [5, 0, 0], [5, 0, 0]]

The * operator copies references, not objects: [[0] * 3] * 3 holds three references to the same inner list, so changing one "row" changes all of them. The correct way to do this is to build a new inner list for each row:

>>> lst_2d = [[0] * 3 for i in xrange(3)]
>>> lst_2d
[[0, 0, 0], [0, 0, 0], [0, 0, 0]]
>>> lst_2d[0][0] = 5
>>> lst_2d
[[5, 0, 0], [0, 0, 0], [0, 0, 0]]
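A quick way to see the difference is to compare the rows by identity with `is` (a sketch using `range`, so it runs on Python 2 and 3):

```python
# [[0] * 3] * 3 repeats one reference to the same inner list three times
shared = [[0] * 3] * 3
# The comprehension evaluates [0] * 3 once per row, creating distinct lists
separate = [[0] * 3 for _ in range(3)]

print(shared[0] is shared[1])      # True: the same object
print(separate[0] is separate[1])  # False: different objects
```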
Eli Bendersky
22

The only 'trick' I know that really wowed me when I learned it is enumerate. It allows you to have access to the indexes of the elements within a for loop.

>>> l = ['a','b','c','d','e','f']
>>> for (index,value) in enumerate(l):
...     print index, value
... 
0 a
1 b
2 c
3 d
4 e
5 f
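`enumerate()` also takes an optional start value (added in Python 2.6), which is handy for 1-based numbering. A small Python 3 example:

```python
letters = ['a', 'b', 'c']

# The optional second argument sets the first index (default 0)
numbered = ["%d. %s" % (i, v) for i, v in enumerate(letters, 1)]
print(numbered)  # ['1. a', '2. b', '3. c']
```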
theycallmemorty
18

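zip(*iterable) transposes an iterable of sequences: rows become columns.

>>> a=[[1,2,3],[4,5,6]]
>>> zip(*a)
[(1, 4), (2, 5), (3, 6)]

It's also useful with dicts, to split them into a tuple of keys and a tuple of values (in Python 2 the order is arbitrary):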

>>> d={"a":1,"b":2,"c":3}
>>> zip(*d.iteritems())
[('a', 'c', 'b'), (1, 3, 2)]
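In Python 3, `zip()` returns a lazy iterator and `dict.iteritems()` no longer exists, so the equivalent code looks like this (a sketch; the printed dict order assumes Python 3.7+, where dicts keep insertion order). Keep in mind that `zip()` stops at the shortest input, so ragged rows are silently truncated:

```python
matrix = [[1, 2, 3], [4, 5, 6]]

# zip() is lazy in Python 3, so wrap it in list() to see the result
transposed = list(zip(*matrix))
print(transposed)  # [(1, 4), (2, 5), (3, 6)]

# Split a dict into parallel tuples of keys and values
d = {"a": 1, "b": 2, "c": 3}
keys, values = zip(*d.items())
print(keys, values)  # ('a', 'b', 'c') (1, 2, 3)
```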
AKX
  • I loved this when I first found it, I think of it as "unzipping", lol. But I didn't know about dictionaries though. Thanks. – jacktrader Apr 25 '19 at 16:05
16

Fire up a simple web server for files in the current directory:

python -m SimpleHTTPServer

Useful for sharing files. In Python 3 the equivalent is python -m http.server; both serve on port 8000 by default.

Adam Lehenbauer
14

A "progress bar" that looks like:

|#############################---------------------|
59 percent done

Code:

class ProgressBar():
    def __init__(self, width=50):
        self.pointer = 0
        self.width = width

    def __call__(self, x):
        # x is the completion percentage (0-100)
        self.pointer = int(self.width * (x / 100.0))
        return "|" + "#" * self.pointer + "-" * (self.width - self.pointer) + \
               "|\n %d percent done" % int(x)

Test code (on Windows, change "clear" to "cls"):

if __name__ == '__main__':
    import time, os
    pb = ProgressBar()
    for i in range(101):
        os.system('clear')
        print(pb(i))
        time.sleep(0.1)
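An alternative sketch that avoids clearing the whole screen: write the bar with a leading carriage return (`\r`) so each update overwrites the same terminal line. `render_bar` and `demo` are hypothetical names, and this assumes a terminal that honors `\r`:

```python
import sys
import time


def render_bar(percent, width=50):
    """Return a one-line bar such as '|#####-----|  50%' for 0 <= percent <= 100."""
    filled = int(width * percent / 100.0)
    return "|%s%s| %3d%%" % ("#" * filled, "-" * (width - filled), percent)


def demo():
    for i in range(101):
        # '\r' returns the cursor to the start of the line, so each bar
        # overwrites the previous one instead of scrolling or clearing the screen
        sys.stdout.write("\r" + render_bar(i))
        sys.stdout.flush()
        time.sleep(0.05)
    sys.stdout.write("\n")
```

Call `demo()` from a terminal to watch it run.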
Theodor
11

To flatten a list of lists, such as

[['a', 'b'], ['c'], ['d', 'e', 'f']]

into

['a', 'b', 'c', 'd', 'e', 'f']

use

[inner
    for outer in the_list
        for inner in outer]
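`itertools.chain.from_iterable` (Python 2.6+) does the same flattening lazily. Unlike `sum(the_list, [])`, which builds a new intermediate list for every inner list and is therefore quadratic in the total length, it makes a single pass. A Python 3 sketch:

```python
from itertools import chain

the_list = [['a', 'b'], ['c'], ['d', 'e', 'f']]

# Walks each inner list in turn without building intermediate lists
flat = list(chain.from_iterable(the_list))
print(flat)  # ['a', 'b', 'c', 'd', 'e', 'f']
```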
George V. Reilly
  • Or `sum(the_list, [])`. Although I suspect this is going to go very wrong somewhere (aside from generators, of course). – HoverHell Mar 10 '12 at 12:48
  • @HoverHell, some people may argue with it being perhaps "proper" but I've used this method for a little while and love it. Best! – jacktrader Apr 25 '19 at 16:07
10

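A deep copy of nested lists and dictionaries that can be much faster than copy.deepcopy (it only works for picklable objects):

import cPickle
deepcopy = lambda x: cPickle.loads(cPickle.dumps(x, -1))  # -1: highest (binary) protocol, faster than the default text protocol 0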
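A Python 3 sketch of the same trick, using the `pickle` module (which uses its C accelerator automatically) and the highest protocol. Whether it actually beats `copy.deepcopy` depends on the data, so it is worth measuring with `timeit`; `pickle_deepcopy` is a hypothetical name:

```python
import pickle


def pickle_deepcopy(obj):
    """Deep-copy obj by round-tripping it through pickle.

    Only works for picklable objects (no lambdas, open files, sockets, ...).
    """
    return pickle.loads(pickle.dumps(obj, pickle.HIGHEST_PROTOCOL))


original = {"a": [1, 2, [3, 4]], "b": {"c": [5]}}
clone = pickle_deepcopy(original)
clone["a"][2].append(99)   # mutate a nested list in the copy only
print(original["a"][2])    # [3, 4]: the original is untouched
```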
vartec
  • I've always been leery of this technique, although it seems like it should work about as fast as anything else I could think of. do pythonistas consider this a good way to get deep copies? (fwiw, i use this technique anyway) – SingleNegationElimination Jul 12 '09 at 15:37
8

Suppose you have a list of items, and you want a dictionary with these items as the keys. Use fromkeys:

>>> items = ['a', 'b', 'c', 'd']
>>> idict = dict().fromkeys(items, 0)
>>> idict
{'a': 0, 'c': 0, 'b': 0, 'd': 0}
>>>

The second argument of fromkeys is the value assigned to every newly created key (it defaults to None).
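One caveat worth knowing: every key receives the *same* value object. That is harmless for `0`, but surprising for a mutable value like a list. A sketch contrasting it with a dict comprehension (Python 2.7+), assuming Python 3.7+ for the printed key order:

```python
items = ['a', 'b', 'c']

# All keys share one list object
shared = dict.fromkeys(items, [])
shared['a'].append(1)
print(shared)  # {'a': [1], 'b': [1], 'c': [1]}

# A dict comprehension creates a fresh list per key
separate = {k: [] for k in items}
separate['a'].append(1)
print(separate)  # {'a': [1], 'b': [], 'c': []}
```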

Eli Bendersky
  • fromkeys is a static method. You should do "dict.fromkeys(items, 0)". Your code creates and throws away an empty dictionary. – Andrew Dalke Mar 28 '09 at 19:39
  • @Andrew Dalke , I believe `dict.fromkeys` is a class-method. reason: `dict.fromkeys` returns a dictionary back, hence it _should_ get `class` as its first argument. Think about when you've subclassed `dict` -- `MyDict.fromkeys` should give an instance of `MyDict` – Jeffrey Jose Mar 14 '10 at 18:41
  • @jeffjose: You are correct. I did the test you suggested and looked at the code since I was curious how that was done. – Andrew Dalke Mar 17 '10 at 14:53
7

To find out whether a line is empty (i.e. it has length 0 or contains only whitespace), use the string method strip in a condition:

if not line.strip():    # if line is empty
    continue            # skip it
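The same test works as a filter. A small Python 3 sketch that keeps only the non-blank lines of a string:

```python
text = "first\n\n   \n\tsecond\n"

# strip() returns '' for empty or whitespace-only lines, and '' is falsy
non_blank = [line for line in text.splitlines() if line.strip()]
print(non_blank)  # ['first', '\tsecond']
```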
Eli Bendersky
5

I like this one to zip everything up in a directory. Hotkey it for instabackups!

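import os
import zipfile

# The archive is created in the current directory; if that is inside startdir,
# os.walk may pick it up and the archive will try to add itself
z = zipfile.ZipFile('my-archive.zip', 'w', zipfile.ZIP_DEFLATED)
startdir = "/home/johnf"
for dirpath, dirnames, filenames in os.walk(startdir):
    for filename in filenames:
        z.write(os.path.join(dirpath, filename))
z.close()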
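Since Python 2.7, the standard library can do the walking for you with `shutil.make_archive`, which stores paths relative to `root_dir`. A Python 3 sketch; `backup_dir` is a hypothetical wrapper name:

```python
import shutil


def backup_dir(src_dir, archive_base):
    """Zip src_dir into archive_base + '.zip' and return the archive path.

    Members are stored relative to src_dir. Keep archive_base outside
    src_dir, or the archive may try to include itself.
    """
    return shutil.make_archive(archive_base, "zip", root_dir=src_dir)
```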
John Feminella
5

For list comprehensions that need both the current and the next element (lst is your list; the last element is paired with None):

[fun(curr, nxt)
 for curr, nxt
 in zip(lst, lst[1:] + [None])
 if condition(curr, nxt)]

For a circular list (the last element is paired with the first): zip(lst, lst[1:] + lst[:1]).

For previous and current: zip([None] + lst[:-1], lst); circular: zip(lst[-1:] + lst[:-1], lst)
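The `pairwise` recipe from the `itertools` documentation (mentioned in the comment below) handles the current/next case for any iterable, including generators. A Python 3 sketch; note that, unlike the list version above, it does not pair the last element with `None` (use `itertools.zip_longest` for that), and Python 3.10+ ships a built-in `itertools.pairwise`:

```python
from itertools import tee


def pairwise(iterable):
    """s -> (s0, s1), (s1, s2), (s2, s3), ... for any iterable."""
    a, b = tee(iterable)
    next(b, None)  # advance the second copy by one element
    return zip(a, b)


print(list(pairwise([1, 2, 3, 4])))  # [(1, 2), (2, 3), (3, 4)]
```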

vartec
  • A (slight adjustment of) the `pairwise` recipe does the same, and works for all iterables: http://docs.python.org/3.0/library/itertools.html#recipes – Stephan202 Jul 12 '09 at 13:51
4

Hardlink identical files in the current directory (on Unix, hard links share the same physical storage, so duplicates take up much less space):

import os
import hashlib

dupes = {}

for path, dirs, files in os.walk(os.getcwd()):
    for name in files:
        filename = os.path.join(path, name)
        # read in binary mode so the hash reflects the exact bytes
        with open(filename, 'rb') as f:
            digest = hashlib.sha1(f.read()).hexdigest()
        if digest in dupes:
            print('linking "%s" -> "%s"' % (dupes[digest], filename))
            os.rename(filename, filename + '.bak')
            try:
                os.link(dupes[digest], filename)
                os.unlink(filename + '.bak')
            except OSError:
                # linking failed: restore the original file
                os.rename(filename + '.bak', filename)
        else:
            dupes[digest] = filename
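A read-only first step that only reports duplicates, hashing each file in chunks so large files are not read into memory at once, might look like this in Python 3. `find_duplicates` and `chunk_size` are hypothetical names; the groups it returns could then be fed to the linking logic above:

```python
import hashlib
import os
from collections import defaultdict


def find_duplicates(root, chunk_size=1 << 16):
    """Group regular files under root by the SHA-1 digest of their contents.

    Returns {digest: [paths]} containing only groups of two or more files.
    """
    by_digest = defaultdict(list)
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.islink(path) or not os.path.isfile(path):
                continue  # skip symlinks and special files
            digest = hashlib.sha1()
            with open(path, "rb") as f:
                # read fixed-size chunks until f.read() returns b''
                for chunk in iter(lambda: f.read(chunk_size), b""):
                    digest.update(chunk)
            by_digest[digest.hexdigest()].append(path)
    return {h: paths for h, paths in by_digest.items() if len(paths) > 1}
```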
rmmh
  • +1, though this code can be improved upon, by using a unique temporary filename, instead of blindly assuming `filename.bak` doesn't exists. – Stephan202 Jul 12 '09 at 13:47
3

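Here are a few which I think are worth knowing but might not be useful on an everyday basis. Most of them are one-liners.

Removing Duplicates from a List

L = list(set(L))

Getting Integers from a string (space separated)

ints = [int(x) for x in S.split()]

Finding Factorial (Python 2.6+ also provides math.factorial)

from functools import reduce  # built in on Python 2; this import is needed on Python 3
import operator
fac = lambda n: reduce(operator.mul, range(1, n + 1), 1)

Finding the greatest common divisor (Python 3.5+ also provides math.gcd)

>>> def gcd(a, b):
...     while b:
...         a, b = b, a % b
...     return a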
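The `set()` one-liner above does not keep the original order. An order-preserving variant, sketched in Python 3 (items must be hashable; `unique_in_order` is a hypothetical name). On Python 3.7+, `list(dict.fromkeys(L))` gives the same result:

```python
def unique_in_order(items):
    """Drop duplicates, keeping each first occurrence in its original position.

    The set gives O(1) membership tests, so this stays linear, unlike
    checking `x not in new_list` for every element.
    """
    seen = set()
    result = []
    for x in items:
        if x not in seen:
            seen.add(x)
            result.append(x)
    return result


print(unique_in_order([3, 1, 3, 2, 1]))  # [3, 1, 2]
```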
jack_carver
  • How can you be sure that set(L) doesn't mess with the order of the original list? sets are orderless – GabiMe May 03 '11 at 12:54
  • Yes, it probably does, but how WOULD you remove duplicates from a list without messing with the order? That's not a very well-defined question. – weronika Sep 07 '11 at 06:18
  • To maintain sequence, you might need to do it in a program (sorry about the formatting, but this is a comment): new_list=[]; for x in old_list: if x not in new_list: new_list.append(x) – RufusVS Sep 02 '18 at 03:02
2

Emulating a switch statement, such as C's switch (x) { ... }:

def a():
    print("a")

def b():
    print("b")

def default():
    print("default")

# look up the handler for x (falling back to default) and call it
{1: a, 2: b}.get(x, default)()
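The same dispatch-table idea extends to functions that take arguments and return values. A Python 3 sketch using the `operator` module; `OPERATIONS` and `calculate` are hypothetical names. (Python 3.10+ also has a `match` statement.)

```python
import operator

# Each "case" maps to a function; dict.get supplies the default branch
OPERATIONS = {
    "+": operator.add,
    "-": operator.sub,
    "*": operator.mul,
}


def calculate(op, a, b):
    """Dispatch on op like a switch statement; unknown ops raise ValueError."""
    def unknown(*_):
        raise ValueError("unsupported operator: %r" % op)
    return OPERATIONS.get(op, unknown)(a, b)


print(calculate("*", 6, 7))  # 42
```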
GabiMe
2
  • like another person above, I said 'Wooww !!' when I discovered enumerate()

  • I sang a praise to Python when I discovered repr(), which let me see precisely the content of strings that I wanted to analyse with a regex

  • I was very satisfied to discover that print '\n'.join(list_of_strings) displays much faster than for ch in list_of_strings: print ch

  • splitlines(1), i.e. splitlines(True), keeps the line endings

These four "tricks" combined in one snippet are very useful to rapidly display the source code of a web page line by line, with each line numbered and with special characters such as '\t' and newlines shown escaped rather than interpreted:

import urllib
from time import clock,sleep

sock = urllib.urlopen('http://docs.python.org/')
ch = sock.read()
sock.close()


te = clock()
for i,line in enumerate(ch.splitlines(1)):
    print str(i) + ' ' + repr(line)
t1 = clock() - te


print "\n\nIn 3 seconds, I will print the same content, using '\\n'.join(....)\n" 

sleep(3)

te = clock()
# here's the point of interest:
print '\n'.join(str(i) + ' ' + repr(line)
                for i,line in enumerate(ch.splitlines(1)) )
t2 = clock() - te

print '\n'
print 'first  display took',t1,'seconds'
print 'second display took',t2,'seconds'

On my not very fast computer, I got:

first  display took 4.94626048841 seconds
second display took 0.109297410704 seconds
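A Python 3 sketch of the core idea as a function that works on any text (for a web page, the text could come from `urllib.request.urlopen(url).read().decode()`); `numbered_repr_lines` is a hypothetical name. It builds the whole output with a single `'\n'.join(...)`, so it can be printed in one call:

```python
def numbered_repr_lines(text):
    """Return one output line per input line: '<index> <repr(line)>'.

    splitlines(True) keeps the line endings, and repr() shows them,
    and tabs, as escape sequences instead of interpreting them.
    """
    return "\n".join("%d %r" % (i, line)
                     for i, line in enumerate(text.splitlines(True)))


sample = "def f():\n\treturn 1\r\n"
print(numbered_repr_lines(sample))
```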
eyquem
2
import tempfile
import cPickle

class DiskFifo:
    """A disk based FIFO which can be iterated, appended and extended in an interleaved way"""
    def __init__(self):
        self.fd = tempfile.TemporaryFile()
        self.wpos = 0
        self.rpos = 0
        self.pickler = cPickle.Pickler(self.fd)
        self.unpickler = cPickle.Unpickler(self.fd)
        self.size = 0

    def __len__(self):
        return self.size

    def extend(self, sequence):
        for x in sequence:
            self.append(x)

    def append(self, x):
        self.fd.seek(self.wpos)
        self.pickler.clear_memo()
        self.pickler.dump(x)
        self.wpos = self.fd.tell()
        self.size = self.size + 1

    def next(self):
        try:
            self.fd.seek(self.rpos)
            x = self.unpickler.load()
            self.rpos = self.fd.tell()
            return x

        except EOFError:
            raise StopIteration

    def __iter__(self):
        self.rpos = 0
        return self
piotr
1

A custom list that, when multiplied by another list, returns their Cartesian product. Unlike the result of itertools.product, this Cartesian product is indexable (but the multiplicands must be sequences, not iterators).

import operator

class mylist(list):
    def __getitem__(self, args):
        if type(args) is tuple:
            return [list.__getitem__(self, i) for i in args]
        else:
            return list.__getitem__(self, args)
    def __mul__(self, args):
        seqattrs = ("__getitem__", "__iter__", "__len__")
        if all(hasattr(args, i) for i in seqattrs):
            return cartesian_product(self, args)
        else:
            return list.__mul__(self, args)
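    def __imul__(self, args):
        return self.__mul__(args)
    def __rmul__(self, args):
        # other * self: keep the operand order in the product
        if all(hasattr(args, i) for i in ("__getitem__", "__iter__", "__len__")):
            return cartesian_product(args, self)
        else:
            return list.__mul__(self, args)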
    def __pow__(self, n):
        return cartesian_product(*((self,)*n))
    def __rpow__(self, n):
        return cartesian_product(*((self,)*n))

class cartesian_product:
    def __init__(self, *args):
        self.elements = args
    def __len__(self):
        return reduce(operator.mul, map(len, self.elements))
    def __getitem__(self, n):
        return [e[i] for e, i  in zip(self.elements,self.get_indices(n))]
    def get_indices(self, n):
        sizes = map(len, self.elements)
        tmp = [0]*len(sizes)
        i = -1
        for w in reversed(sizes):
            tmp[i] = n % w
            n //= w
            i -= 1
        return tmp
    def __add__(self, arg):
        return mylist(list(self) + list(arg))
    def __imul__(self, args):
        return mylist(self)*mylist(args)
    def __rmul__(self, args):
        return mylist(args)*mylist(self)
    def __mul__(self, args):
        if isinstance(args, cartesian_product):
            return cartesian_product(*(self.elements+args.elements))
        else:
            return cartesian_product(*(self.elements+(args,)))
    def __iter__(self):
        for i in xrange(len(self)):
            yield self[i]
    def __str__(self):
        return "[" + ",".join(str(i) for i in self) +"]"
    def __repr__(self):
        return "*".join(map(repr, self.elements))
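The heart of `get_indices` is decoding an integer in a mixed-radix number system whose digit sizes are the sequence lengths. A standalone Python 3 sketch of that technique using `divmod`, checked against `itertools.product` ordering; `index_to_combination` is a hypothetical name:

```python
from itertools import product


def index_to_combination(seqs, n):
    """Return the n-th element of the Cartesian product of seqs
    without generating the earlier ones (same order as itertools.product).

    The last sequence varies fastest, like the lowest digit of a number.
    """
    result = []
    for seq in reversed(seqs):
        n, i = divmod(n, len(seq))
        result.append(seq[i])
    if n:
        raise IndexError("index out of range")
    return tuple(reversed(result))


seqs = ("ab", "xyz")
print(index_to_combination(seqs, 4))  # ('b', 'y')
print(index_to_combination(seqs, 4) == list(product(*seqs))[4])  # True
```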
fortran
1

Iterate over any iterable (list, set, file, stream, strings, whatever), of ANY size (including unknown size), by chunks of x elements:

from itertools import chain, islice

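def chunks(iterable, size, format=iter):
    it = iter(iterable)
    while True:
        try:
            first = next(it)
        except StopIteration:
            # stop cleanly: letting StopIteration escape a generator
            # is an error in Python 3.7+ (PEP 479)
            return
        yield format(chain((first,), islice(it, size - 1)))

>>> l = ["a", "b", "c", "d", "e", "f", "g"]
>>> for chunk in chunks(l, 3, tuple):
...     print(chunk)
...
('a', 'b', 'c')
('d', 'e', 'f')
('g',)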
Bite code
1

To group values into lists by key in Python 2.4 or earlier:

listDict = {}
for x, y in someIterator:
    listDict.setdefault(x, []).append(y)

In Python 2.5+ there is an alternative using collections.defaultdict.
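For reference, a Python 3 sketch of the `defaultdict` alternative (the sample `pairs` data is made up):

```python
from collections import defaultdict

pairs = [("fruit", "apple"), ("veg", "leek"), ("fruit", "pear")]

# defaultdict(list) calls list() to create an empty list for any missing key,
# replacing the setdefault(x, []) idiom
groups = defaultdict(list)
for key, value in pairs:
    groups[key].append(value)

print(dict(groups))  # {'fruit': ['apple', 'pear'], 'veg': ['leek']}
```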

vartec
1

I actually just created this, but I think it's going to be a very useful debugging tool.

def dirValues(instance, all=False):
    retVal = {}
    for prop in dir(instance):
        # skip private and "dunder" names unless all=True
        if not all and prop.startswith("_"):
            continue
        retVal[prop] = getattr(instance, prop)
    return retVal

I usually use dir() in a pdb context, but I think this will be much more useful:

(pdb) from pprint import pprint as pp
(pdb) from myUtils import dirValues
(pdb) pp(dirValues(someInstance))
Josh Russo
0

When debugging, you sometimes want to view a string in a basic editor. To show a string in Notepad (Windows):

import os, tempfile, subprocess

def get_rand_filename(dir_=None):
    "Create a new, empty temporary file and return its name."
    fd, name = tempfile.mkstemp('.tmp', '', dir_ or os.getcwd())
    os.close(fd)  # only the name is needed; the file is reopened below
    return name

def open_with_notepad(s):
    "Write the string s to a temporary file and show it in Notepad"
    with open(get_rand_filename(), 'w') as f:
        f.write(s)
    # launch Notepad only after the file is closed, so its contents are on disk
    subprocess.Popen(['notepad', f.name])
iTayb