Sorry, @colicab, I suppose you know a lot of these things I'm mentioning. It is just I started to write an answer to you and ended up with a more general answer inspired by your challenge. This is probably more a personal answer than most in SO but I suppose there will be a lot of things that most people will agree to be pythonic.
What is pythonic is always debatable to some extent. Let us see some tools from Python, usually considered pythonic (i.e., elegant and useful in a Python script), that can help us.
I'd say the most pythonic thing to do is to create functions generously. For example, you have a string with numbers separated by commas and need to turn it into a list of ints. We should create a function to generate the list:
def parse_numbers(numbers_string):
    pass
# Little assert to ensure it works
assert parse_numbers('1, 2, 3') == [1, 2, 3]
Ok, our function does nothing for now, but we can solve it using...
We can easily get a list of the strings between the commas using the str.split() method:
>>> "1, 2, 3".split(',')
['1', ' 2', ' 3']
This alone does not solve our problem, however. Firstly, the strings in the list contain whitespace. One can solve that by using the str.strip() method:
>>> ' 1 '.strip()
'1'
We are closer to a list of ints, for sure, but not there yet. After all, '1' is a string containing a digit, not an integer number (and in Python those are very different things). Of course, we can easily solve it by using the int() constructor:
>>> int('1')
1
>>> int(' 1 '.strip())
1
Now, how could we apply this operation to all the strings in the list we got above?
Many times you will want to create lists from other lists. Elsewhere, one would usually create an empty list and fill it with stuff. The pythonic way, however, involves list comprehensions. For example, the line below takes each element resulting from the split, strips the whitespace from it and converts the stripped result to an int. Finally, the results of these operations are put in a new list:
>>> [int(n.strip()) for n in '1, 2 , 3 '.split(',')]
[1, 2, 3]
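For comparison, here is a small sketch of the "create an empty list and fill it" style next to the list comprehension. The names raw, numbers_loop and numbers_comp are only illustrative; both versions should produce the same list.

```python
raw = '1, 2 , 3 '

# Loop style: create an empty list and append to it.
numbers_loop = []
for chunk in raw.split(','):
    numbers_loop.append(int(chunk.strip()))

# List comprehension: the same steps in a single expression.
numbers_comp = [int(chunk.strip()) for chunk in raw.split(',')]
```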
Applying the same logic to the numbers_string parameter, we get the beautiful function below:
def parse_numbers(numbers_string):
    return [int(n.strip()) for n in numbers_string.split(',')]
# Little assert to ensure it works
assert parse_numbers('1, 2, 3') == [1, 2, 3]
Simple, cohesive, clear - indeed, pythonic.
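One thing to keep in mind: as written, the function raises a ValueError on an empty chunk, such as the one left by a trailing comma. If that matters for your input, a more lenient variant could look like the sketch below. The name parse_numbers_lenient is hypothetical and not part of the original script.

```python
def parse_numbers_lenient(numbers_string):
    # Hypothetical variant: skip chunks that are empty after stripping,
    # so a trailing comma or an empty string does not raise ValueError.
    chunks = (n.strip() for n in numbers_string.split(','))
    return [int(n) for n in chunks if n]
```

Whether silently skipping empty chunks is the right call depends on your data; failing loudly, as the original function does, is often the safer default.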
Now, what do we do? We should get the language name and the list of numbers. To do it, we get back to the first answer: a function! But to make it work, we will use the very pythonic...
Sequence packing and unpacking (documentation)
Our next function will take a string like 'English : 1, 2' and return a pair suitable to be used in a dict constructor:
def parse_language(language_string):
    language, numbers_string = language_string.split(':')
    return language.strip(), parse_numbers(numbers_string)
# Little assert to ensure it works
assert parse_language('English : 1, 2, 3') == ('English', [1, 2, 3])
We already know strip() and split(). The new magic is that the split(':') call will return a list with two values - and we can put them in two variables with one assignment:
language, numbers_string = language_string.split(':')
This is called unpacking and is very pythonic. Also, note that the return statement is followed by two values separated by a comma. The result will be a value of the tuple type containing the two values:
>>> parse_language('English : 1, 2, 3')
('English', [1, 2, 3])
The process by which two values become a single tuple is called packing.
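A small sketch of both directions, plus one caveat: unpacking needs the number of names to match the number of values, so a string with a second colon would raise ValueError in the assignment above. Passing a maximum split count to split() is one way around it. The sample strings and variable names here are made up for illustration.

```python
# Packing: the comma builds a tuple.
pair = 'English', [1, 2, 3]

# Unpacking: the tuple is spread over two names.
language, numbers = pair

# Caveat: two colons give three pieces, which do not fit in two names.
try:
    left, right = 'a : b : c'.split(':')
    mismatch_error = None
except ValueError as error:
    mismatch_error = error

# split(':', 1) splits at most once, so it never yields more than two pieces.
left, right = 'a : b : c'.split(':', 1)
```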
More functions
If only we had just one language per string... However, we have various languages per string, like in 'English : 1, 2, 3 ; Dutch : 5, 6, 7'. But we know the solution for it, right? Yes, a new function! Now, using everything we learned: split(), strip(), list comprehensions...
def split_languages(languages):
    return [language.strip() for language in languages.split(';')]
# Little assert to ensure it works
assert (
    split_languages('English : 1, 2; Dutch : 5, 7') ==
    ['English : 1, 2', 'Dutch : 5, 7']
)
Of course, we get only a list of strings, not a dict. It is easy to solve using the very pythonic...
As you may know, dicts can be created either with the {key: value ...} syntax or through the dict() constructor. The constructor has some very cool behaviors. One of them is to create dicts from a list of pairs. Consider the list below:
>>> l = [('key-1', 0), ('key-2', 'value'), ('key-3', 2)]
If we pass it to a dict constructor, we will get a dict like the one below:
>>> dict(l)
{'key-3': 2, 'key-2': 'value', 'key-1': 0}
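The pairs do not have to come from a literal list. As a sketch, any iterable of two-item sequences should work, including the result of zip() or a generator expression. The names and sample data below are illustrative only.

```python
keys = ['key-1', 'key-2', 'key-3']
values = [0, 'value', 2]

# zip() pairs the items up, and dict() consumes the pairs.
from_zip = dict(zip(keys, values))

# A generator expression yielding (key, value) tuples works as well.
lengths = dict((word, len(word)) for word in ['home', 'person'])
```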
That's why parse_language() returns a tuple: we can use it to create key-value pairs. Using it with generator expressions (a kind of fancier and more efficient list comprehension) and the dict constructor, we can get all languages from a string this way:
def parse_languages(languages):
    return dict(
        parse_language(language)
        for language in split_languages(languages)
    )
# You know, let's make sure everything works so far
assert (
    parse_languages('English : 1, 2; Dutch : 5, 7') ==
    {
        'English' : [1, 2],
        'Dutch' : [5, 7]
    }
)
Since each, let us say, "category" has a name (such as "person" or "home") and a "language string" parseable by parse_languages(), our next pythonic step is to use...
Another function
This new function brings nothing really new, actually: unpacking will be enough to save the day:
def parse_category(category_tuple):
    category, languages = category_tuple
    return category, parse_languages(languages)
# It is pythonic to test your functions, too! As you can see, asserts
# can help with it. They are not very popular, however... Go figure.
assert (
    parse_category( ('person', 'English : 1, 2; Dutch : 5, 7') ) ==
    (
        'person',
        {
            'English' : [1, 2],
            'Dutch' : [5, 7]
        }
    )
)
Note that our parse_category() function returns a tuple. This is because we can use generator expressions plus the dict constructor to create a dict from all the tuples of your input. With that, we get a very elegant, pythonic function:
def parse_tuples(tuples):
    return dict(parse_category(category) for category in tuples)
# No assert now. I have typed too much, I need some coffee :(
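For completeness, here is a sketch of the assert skipped above. To keep it self-contained, the block repeats the functions from this answer; the expected dict is simply what the earlier asserts imply for this input, and the names sample and result are only illustrative.

```python
def parse_numbers(numbers_string):
    return [int(n.strip()) for n in numbers_string.split(',')]

def parse_language(language_string):
    language, numbers_string = language_string.split(':')
    return language.strip(), parse_numbers(numbers_string)

def split_languages(languages):
    return [language.strip() for language in languages.split(';')]

def parse_languages(languages):
    return dict(
        parse_language(language)
        for language in split_languages(languages)
    )

def parse_category(category_tuple):
    category, languages = category_tuple
    return category, parse_languages(languages)

def parse_tuples(tuples):
    return dict(parse_category(category) for category in tuples)

# The assert that was skipped for parse_tuples()
sample = [
    ('person', 'English : 1, 2; Dutch : 5, 7'),
    ('home', 'English : 8, 9; Dutch: 11, 12'),
]
result = parse_tuples(sample)
assert result == {
    'person': {'English': [1, 2], 'Dutch': [5, 7]},
    'home': {'English': [8, 9], 'Dutch': [11, 12]},
}
```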
But where would we put all these functions? In one of the most pythonic things ever:
In my case, I saved it all in a file named langparse.py. Now I can import it and call our parse_tuples() function:
>>> import langparse
>>> tuplist = [('person', u'English : 1, 2, 3 ; Dutch : 5, 6, 7'), ('home', u'English : 8, 9, 10; Dutch: 11, 12, 13')]
>>> langparse.parse_tuples(tuplist)
{'person': {u'Dutch': [5, 6, 7], u'English': [1, 2, 3]}, 'home': {u'Dutch': [11, 12, 13], u'English': [8, 9, 10]}}
Calling it in the terminal is of course just a test, but the sky is the limit. Since it is in a module, I can use all my functions elsewhere. Modules are so pythonic that the last line in the Zen of Python is a homage to them:
Namespaces are one honking great idea -- let's do more of those!
Of course, that line is not only about them, but modules are surely one of the most important namespaces in Python.
"Well, well," I can hear you wondering, "this is all cool and good but I just want to write a little (yet somewhat complicated) script! What should I do now, write another file just to call my module?" Not at all, my friend! You just need to use...
The __name__ == "__main__" idiom
In Python, you can retrieve the name of a module from the __name__ variable. It will always be the name of the module... with one exception: if the module was run directly (instead of imported), then the value of __name__ will be "__main__". For example, if I create such a simple module:
$ echo 'print(__name__)' > mymod.py
...and import it, the output will be its name:
$ python -c 'import mymod'
mymod
However, if I execute mymod.py directly, the output is __main__:
$ python mymod.py
__main__
So, we can add the following lines at the end of our module. Now the code will always execute if the module is run directly, but never if the module is imported:
if __name__ == "__main__":
    tuplist = [
        ('person', u'English : 1, 2, 3 ; Dutch : 5, 6, 7'),
        ('home', u'English : 8, 9, 10; Dutch: 11, 12, 13')
    ]
    # print(...) with a single argument behaves the same in Python 2 and 3
    print(parse_tuples(tuplist))
Let us see. Here is the result on my machine:
$ python langparse.py
{'person': {u'Dutch': [5, 6, 7], u'English': [1, 2, 3]}, 'home': {u'Dutch': [11, 12, 13], u'English': [8, 9, 10]}}
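A common variation, sketched here as an assumption rather than as part of the original script, is to put the script logic in a function (named main() below by convention) so that other code and tests can call it too. The message string is a placeholder for the real work.

```python
def main():
    # Hypothetical entry point: in langparse.py this is where tuplist
    # would be built and the result of parse_tuples() printed.
    message = 'running as a script'
    print(message)
    return message


if __name__ == "__main__":
    # Runs only when the file is executed directly, not when imported.
    main()
```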
Conclusion
Python has a lot of cool things that make programming an awesome task. Those include useful methods, modules, lists, tuples, dicts, list comprehensions (as well as generator expressions and dict generators!), unpacking... Learning all these idioms will make everything easier. However, if you have to use one and only one thing, please use functions and modules. Even if you create too many of them, it is easier to merge them than to split.
And, after all, if you have doubts about what to do, just remember our supreme guide, the Zen of Python. Yes, this is kinda humorous, but it is for real, too.
The full script can be seen in Pastebin.