In the Python Data Science Handbook the following example is given (the penultimate line is the one which I don't understand, as indicated):

import pandas as pd
import numpy as np
import seaborn as sns
sns.set()

#Downloaded from: https://raw.githubusercontent.com/jakevdp/data-CDCbirths/master/births.csv

births = pd.read_csv('births.csv')
births['decades'] = (births['year'] // 10) * 10

# Robust sigma-clipping operation - ignore this
quartiles = np.percentile(births['births'], [25, 50, 75])
mu = quartiles[1]
sig = 0.74 * (quartiles[2] - quartiles[0])
births = births.query('(births > @mu - 5 * @sig) & (births < @mu + 5 * @sig)')

births['day'] = births['day'].astype(int)

births.index = pd.to_datetime(10000 * births.year +
                             100 * births.month +
                             births.day, format='%Y%m%d')

births_by_date = births.pivot_table('births', [births.index.month, births.index.day])

#Help on the loop below
births_by_date.index = [pd.datetime(2012, month, day)
                       for (month, day) in births_by_date.index]

print(births_by_date.index)

I don't understand the construction of the births_by_date.index in the for loop. I understand that the loop is getting applied to the pivot table, but I've never seen what looks like the output array put before the loop.

Can someone explain how this works, or direct me to an appropriate explanation please?

I have tried: How do I save results of a "for" loop into a single variable?

numerous tutorials such as this one: https://www.learnpython.org/en/Loops

various other questions, but I can't find anything similar.

Preston
  • It's a _list comprehension_ if that's what you're unsure about. – roganjosh Oct 23 '17 at 14:11
  • @roganjosh thanks for that. If you want to put an answer up, I'll close this off and get googling. – Preston Oct 23 '17 at 14:14
  • Is that enough of a basis to answer the question for you? It's the syntax in `[pd.datetime(2012, month, day) for (month, day) in births_by_date.index]` that confused you? I don't want to answer if it sets you on a random Google trail. – roganjosh Oct 23 '17 at 14:18
  • @roganjosh yes, it's just the syntax, and the fact that I didn't really know what to google, so I was struggling to find an explanation. Having googled list comprehension I'm definitely on the right track, thanks for the help – Preston Oct 23 '17 at 14:23

2 Answers

It's called a "list comprehension" which you can read about here among other sources. The comprehension is evaluated first, and the resulting list is then assigned back to the index of the DataFrame, in effect giving each (month, day) pair in the index a year (2012). It's equivalent to:

some_list = []
for month, day in births_by_date.index:
    some_list.append(pd.datetime(2012, month, day))

births_by_date.index = some_list
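
The equivalence can be checked without births.csv. This is only a minimal sketch: `pairs` is a hypothetical stand-in for `births_by_date.index` (whose entries are `(month, day)` tuples), and the standard-library `datetime.datetime` is used in place of `pd.datetime`, which was an alias for the same class and, as far as I know, is no longer exposed by recent pandas releases.

```python
from datetime import datetime

# Hypothetical stand-in for births_by_date.index: (month, day) tuples.
pairs = [(1, 1), (2, 29), (12, 31)]

# Explicit loop: start with an empty list and append one item per pass.
looped = []
for month, day in pairs:
    looped.append(datetime(2012, month, day))

# List comprehension: the same loop, with the appended expression written first.
comprehended = [datetime(2012, month, day) for (month, day) in pairs]

# Both approaches build identical lists.
print(looped == comprehended)
```

2012 is a leap year, which is why the `(2, 29)` entry is a valid date here.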
roganjosh

It's a list comprehension, as already mentioned: a concise syntax for looping over a list (or any other iterable) and building a new list from the transformed elements.

A simple example to double the elements of a list:

items = [1, 2, 3, 4]
doubled_items = [2*item for item in items]
# doubled_items is [2, 4, 6, 8]

This is essentially the same as:

items = [1, 2, 3, 4]
doubled_items = []

for item in items:
    doubled_items.append(2*item)
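
The same pattern works when each element is a tuple, as in the question's `for (month, day) in births_by_date.index`. The sketch below uses a made-up `pairs` list rather than the real index, and also shows the optional trailing `if` clause, which the question's code does not use but which often appears in comprehensions.

```python
# Hypothetical (month, day) tuples, standing in for the pivot table's index.
pairs = [(1, 15), (2, 29), (12, 25)]

# Each tuple is unpacked into month and day before the expression is evaluated.
labels = [f"{month:02d}-{day:02d}" for (month, day) in pairs]
# labels is ['01-15', '02-29', '12-25']

# An optional condition keeps only the elements that pass the test.
early_days = [day for (month, day) in pairs if month <= 2]
# early_days is [15, 29]
```

The general shape is `[expression for target in iterable if condition]`, where the `if` part may be left out.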
Amjad