I was just going through this page here and found this entry:
print sum(ord(c) for c in 'Happy new year to you!')
It is Python code, and when executed it prints 2014. Could someone help a Java developer understand exactly what's going on here?
A few things to understand:
Strings are iterable by default, so one can simply iterate over each element in a string:
for c in 'Hello there':
    print c
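The same idea as a minimal Python 3 sketch: iterating over a string yields one-character strings, so anything that accepts an iterable (here list() and a for loop) sees the characters in order.

```python
# Iterating over a string yields its characters as one-character strings.
greeting = 'Hello there'

chars = list(greeting)  # list() consumes any iterable
print(chars[:5])        # ['H', 'e', 'l', 'l', 'o']

# A for loop walks the same sequence of characters.
count = 0
for c in greeting:
    count += 1
print(count)            # 11
```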
ord is a built-in function that returns the numerical code point of a character.
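A quick Python 3 illustration of ord() and its inverse, chr(). The non-ASCII character is written as an escape ('\u00e9', which is é) so the example does not depend on the source file's encoding.

```python
# ord() maps a one-character string to its integer code point;
# chr() goes the other way.
print(ord('H'))       # 72
print(chr(72))        # H
print(ord('!'))       # 33

# Code points are not limited to ASCII: Python 3 strings are Unicode.
print(ord('\u00e9'))  # 233 (the character é)
```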
The expression ord(c) for c in 'Happy new year to you!' is a generator expression. Evaluating it produces a generator object (not a generator function), which computes the values of the expression one at a time, each time its __next__() method is called (next() in Python 2). Those calls happen behind the scenes, and lazily: if __next__() isn't invoked, the next value is never generated. This is useful when the expression would produce a lot of values.
This generator expression is the crux of the snippet: it expresses tersely what would take a clumsier explicit loop in Java.
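A small Python 3 sketch that makes the laziness visible. The helper traced_ord is hypothetical, added only to record when each value is actually computed; it is not part of the original snippet.

```python
calls = []  # records which characters have been converted so far

def traced_ord(c):
    """Hypothetical helper: behaves like ord(), but logs each call."""
    calls.append(c)
    return ord(c)

gen = (traced_ord(c) for c in 'Happy')
print(calls)        # [] (nothing has been computed yet)

print(next(gen))    # 72
print(calls)        # ['H'] (only the first value was produced)

print(sum(gen))     # 442 = 97 + 112 + 112 + 121, the remaining values
print(calls)        # ['H', 'a', 'p', 'p', 'y']
```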
sum takes an iterable (a list, a generator, or anything else you can loop over) and returns the total of its numeric contents. The Java equivalent would be roughly:
int s = 0;
for (char c : "Happy new year to you!".toCharArray())
    s += (int) c;
System.out.println(s);
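For comparison, here is the same accumulation written as an explicit Python 3 loop, mirroring the Java version line for line:

```python
s = 0
for c in 'Happy new year to you!':
    s += ord(c)   # add each character's code point
print(s)          # 2014
```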
ord() converts a character to its integer code point, which for every character in this string is the same as its ASCII value. sum() adds up a collection of objects for which addition is defined; here, that is ordinary integer addition.
The expression inside sum() is a generator expression, a construct that produces an iterable. It doesn't have a clean equivalent in Java, but is similar to LINQ in .NET. Essentially, it is an inline for-each loop: it loops over each character in the string "Happy new year to you!" and computes that character's numeric value with ord, and sum() then adds these values together.
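To make "inline for-each loop" concrete: the generator expression behaves like the following generator function. The name char_codes is hypothetical, chosen for illustration; this is a Python 3 sketch, not how Python implements generator expressions internally.

```python
def char_codes(text):
    """Hypothetical generator function equivalent to (ord(c) for c in text)."""
    for c in text:
        yield ord(c)

message = 'Happy new year to you!'
print(sum(char_codes(message)))                          # 2014
print(list(char_codes('Hi')) == [ord(c) for c in 'Hi'])  # True
```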
1) The built-in function ord returns the integer value of a character:
>>> help(ord)
Help on built-in function ord in module __builtin__:
ord(...)
ord(c) -> integer
Return the integer ordinal of a one-character string.
2) The for loop iterates over each character of the string 'Happy new year to you!':
>>> for c in 'Happy new year to you!':
...     print ord(c)
...
72
97
112
112
...
3) (ord(c) for c in 'Happy new year to you!') is a generator expression in Python:
>>> result = (ord(c) for c in 'Happy new year to you!')
>>> result.next()
72
>>> result.next()
97
4) The built-in function sum returns the total of the integer values of all the characters:
>>> help(sum)
Help on built-in function sum in module __builtin__:
sum(...)
sum(sequence[, start]) -> value
Returns the sum of a sequence of numbers (NOT strings) plus the value
of parameter 'start' (which defaults to 0). When the sequence is
empty, returns start.
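A short Python 3 sketch of the start parameter described in that help text: it is the initial value the items are added to, and it is what sum returns for an empty iterable. Note that once sum gets a second argument, the generator expression needs its own parentheses.

```python
text = 'Happy new year to you!'

print(sum(ord(c) for c in text))        # 2014 (start defaults to 0)
print(sum((ord(c) for c in text), 10))  # 2024 (counting starts at 10)
print(sum(ord(c) for c in ''))          # 0 (empty input returns start)
```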
So the result of combining all these expressions is:
>>> sum(ord(c) for c in 'Happy new year to you!')
2014
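The session in step 3 uses Python 2's result.next() method. In Python 3 that method is named __next__(), and the usual spelling is the built-in next(). Once the generator runs out it raises StopIteration, which is how for loops and sum() know to stop:

```python
result = (ord(c) for c in 'Hi!')

print(next(result))  # 72  ('H')
print(next(result))  # 105 ('i')
print(next(result))  # 33  ('!')

try:
    next(result)
except StopIteration:
    print('exhausted')  # no values left
```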
Another possible solution could be:
>>> sum(map(lambda c:ord(c), 'Happy new year to you!'))
2014
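Since ord already takes exactly one argument, it can be passed to map() directly; the lambda wrapper is not needed. A quick Python 3 check that all three forms agree:

```python
text = 'Happy new year to you!'

with_lambda = sum(map(lambda c: ord(c), text))
direct = sum(map(ord, text))        # ord is already a one-argument function
genexp = sum(ord(c) for c in text)

print(with_lambda, direct, genexp)  # 2014 2014 2014
```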
print is a statement (in Python 2.x) that prints the expression that follows it. (Note that in Python 3.x, print() is a function that prints its arguments.)
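For readers on Python 3, the one-liner from the question only needs print() written as a function call; the rest is unchanged:

```python
# Python 3 version of the snippet from the question.
total = sum(ord(c) for c in 'Happy new year to you!')
print(total)  # 2014
```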
The expression is a call to the built-in function sum(). Whatever it is summing, the result is 2014, so print prints 2014.
sum() is being passed a special construct called a "generator expression". This is similar to a "list comprehension" but a bit more efficient.[1] The basic format of a generator expression is:
expression for variable in iterable
Here, variable is c, and iterable is the string 'Happy new year to you!'.
The expression is a call to the built-in function ord(), which returns an integer representing the character it is passed; for example, ord('A') returns 65.
So, this sums the ordinal values of all the characters in the string; the sum is 2014 and that is printed.
[1] A list comprehension builds a list of values. A generator expression doesn't build a collection; instead it produces a generator that can be asked repeatedly for its next value, yielding one value at a time. Functions in Python that accept iterables can accept a generator expression and pull the values from it.
You could write this with a list comprehension to build a list, then sum the list. But then the list would be constructed, looked at once, and garbage-collected. Why waste the effort of allocating and destroying the list object when all you want is to sum the values? Hence the generator expression.
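A rough Python 3 illustration of the difference, using sys.getsizeof. Exact byte counts vary by Python version and platform, so only the comparison matters: the list comprehension materializes every value, while the generator object stays small. The flip side is that a generator can be consumed only once.

```python
import sys

text = 'Happy new year to you!' * 10_000

as_list = [ord(c) for c in text]  # builds the whole list in memory
as_gen = (ord(c) for c in text)   # builds only a small generator object

print(sys.getsizeof(as_list) > sys.getsizeof(as_gen))  # True

print(sum(as_list) == sum(as_gen))  # True: same total either way
print(sum(as_gen))                  # 0: the generator is now exhausted
```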
An expression of the form found in this code snippet, surrounded by "naked" ( ), is called a generator expression (sometimes informally a generator comprehension). It produces a specific kind of iterable known as a generator.
There are other kinds of comprehensions as well. The same expression surrounded by naked square brackets is a list comprehension. Example:
[char for char in "string"]
This will produce a list:
['s','t','r','i','n','g']
And the same expression in "naked" braces (a set comprehension) produces a set:
{char for char in "string"}
This makes the set (the order in which a set's elements appear is arbitrary):
{'s','t','r','i','n','g'}
(There are also dictionary comprehensions.)
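The four kinds side by side, as a short Python 3 sketch. The dictionary comprehension is the one mentioned above without an example; mapping each character to its ord value is just an illustration.

```python
word = 'string'

gen = (ch for ch in word)           # generator expression
lst = [ch for ch in word]           # list comprehension
st = {ch for ch in word}            # set comprehension
dct = {ch: ord(ch) for ch in word}  # dictionary comprehension

print(type(gen).__name__, type(lst).__name__, type(st).__name__, type(dct).__name__)
# generator list set dict
print(dct['s'])                     # 115
print(sorted(st) == sorted(lst))    # True (same elements; set order is arbitrary)
```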
As I said at the start, putting just parentheses around an expression of the form something for something in something_else produces a special kind of iterator called a generator in Python (rather than a list or a set, like the above examples).
Lists and sets are not the only iterables in Python; lots of other things are iterable too, including strings. Inside the generator, the string is iterated over and its characters are retrieved one at a time, in turn: s, t, and so on. On each iteration, the retrieved character is the object referred to by char.
The ord(char) part applies the ord function to each char in turn as the string is iterated over. The ord function simply finds the Unicode code point of the character just retrieved from the string. That value is then what the generator yields for the current iteration.
To get the values out of a generator, you must iterate over it in some way, such as with next() or a for...in statement. Usually, though, you can also pass a generator to any function that accepts an iterable as its argument. In this case, sum() (which adds together the items of an iterable) is applied to all of the results of the generator; each value the generator yields is one term of the sum.
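sum() is not special here: any function that accepts an iterable can consume the generator. A Python 3 sketch with a few other built-ins; the helper codes is hypothetical and returns a fresh generator each time, since a generator can be consumed only once.

```python
text = 'Happy new year to you!'

def codes():
    """Hypothetical helper: return a fresh generator over the code points of text."""
    return (ord(c) for c in text)

print(sum(codes()))        # 2014
print(max(codes()))        # 121 ('y')
print(min(codes()))        # 32 (the space character)
print(len(list(codes())))  # 22 characters
```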
So the overall effect of the code is to add together the Unicode code points of all the characters in the string. The overall result of 2014 just seems to be a coincidence; nothing mysterious or magical is going on.