
I am doing a business case on retrieving stock information. The teacher uses the code below to create DataFrames with stock information.

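# Imports the snippet relies on (pandas-datareader provides DataReader)
from datetime import datetime
from pandas_datareader.data import DataReader

# The tech stocks we'll use for this analysis
tech_list = ['AAPL', 'GOOG', 'MSFT', 'AMZN']

# Set up end and start times for the data grab (one year back)
end = datetime.now()
# Note: this raises ValueError if run on February 29
start = datetime(end.year - 1, end.month, end.day)

# For loop for grabbing Yahoo Finance data and storing each result as a DataFrame
for stock in tech_list:
    # Create a global variable named after the ticker, e.g. AAPL
    globals()[stock] = DataReader(stock, 'yahoo', start, end)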

He uses globals() to create four DataFrames, one per tech stock. In the question linked below I read that you can also use a dictionary to achieve the same goal.

pandas set names of dataframes in loop
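A minimal sketch of the dictionary approach. It assumes a hypothetical helper `load_prices(ticker)` standing in for the `DataReader(stock, 'yahoo', start, end)` call, so the example runs offline; the helper returns placeholder numbers, not real quotes.

```python
import pandas as pd

tech_list = ['AAPL', 'GOOG', 'MSFT', 'AMZN']


def load_prices(ticker):
    # Hypothetical stand-in for DataReader(ticker, 'yahoo', start, end);
    # returns a tiny placeholder DataFrame, not real market data.
    return pd.DataFrame({'Ticker': [ticker] * 3, 'Close': [1.0, 2.0, 3.0]})


# One dictionary entry per ticker instead of one global variable per ticker
stocks = {ticker: load_prices(ticker) for ticker in tech_list}

print(list(stocks))          # ['AAPL', 'GOOG', 'MSFT', 'AMZN']
print(stocks['AAPL'].shape)  # (3, 2)
```

Instead of a global variable `AAPL` you write `stocks['AAPL']`, and you can loop over `stocks.items()` to process every ticker without knowing the variable names in advance.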

My question: I do not understand this line of code in the answer there:

frames = {i:dat for i, dat in data.groupby('Sport')}

Can someone explain?

Barmar
Antonio
  • `globals()` is a dictionary – Z4-tier Dec 17 '20 at 17:32
  • `frames` is a dictionary that is being built using a *comprehension*. The call `data.groupby()` produces pairs of values, called `i` and `dat`, and the notation `{i:dat for i, dat in ...}` is building a new dictionary out of all such pairs, using `i` as the key and `dat` as the value. – Z4-tier Dec 17 '20 at 17:35
  • 1
    You could also write it as `frames = dict(data.groupby('Sport'))` – Barmar Dec 17 '20 at 17:36
  • @Z4-tier There's probably a canonical dup that explains what a dictionary comprehension is, but I don't have it in my list. You got one? – Barmar Dec 17 '20 at 17:39
  • @Barmar I can't find one. Lots of questions that involve dictionary comprehensions, but I can't find something that *just* explains what it is. I'll post an answer and maybe someone else will come by with the link. – Z4-tier Dec 17 '20 at 17:53
  • @Barmar I just use the list comprehension canonical one, the accepted answer explains them – juanpa.arrivillaga Dec 17 '20 at 18:18

1 Answer

In this case, frames is a dictionary that is being built using a dictionary comprehension. Iterating over data.groupby('Sport') yields one pair of values per group (the group's key and the sub-DataFrame for that group), which are called i and dat in the comprehension, and the notation {i:dat for i, dat in ...} builds a new dictionary out of all such pairs, using i as the key and dat as the value. The result is stored in frames.
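A minimal runnable sketch of the line from the question. It assumes a small made-up DataFrame `data` with the 'Sport' and 'random_data' columns mentioned later in the thread (the original example data isn't reproduced here), and builds `frames` once with the comprehension and once with an ordinary for loop to show they do the same thing.

```python
import pandas as pd

# Made-up example data with the two columns mentioned in the thread
data = pd.DataFrame({
    'Sport': ['Tennis', 'Golf', 'Tennis', 'Golf', 'Rugby'],
    'random_data': [1, 2, 3, 4, 5],
})

# Dictionary comprehension from the question
frames = {i: dat for i, dat in data.groupby('Sport')}

# The same thing written as an explicit loop
frames_loop = {}
for i, dat in data.groupby('Sport'):
    frames_loop[i] = dat  # i is the sport name, dat is that sport's rows

print(list(frames))                              # ['Golf', 'Rugby', 'Tennis']
print(frames['Tennis']['random_data'].tolist())  # [1, 3]
```

The keys come out sorted because groupby sorts group keys by default.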

The general syntax is (for the case where the iterator yields two-element items):

{key: value for key, value in iterator}
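The same pattern works with any iterable of two-element items, not only groupby. A sketch with plain lists and zip(); the names and values are made up for illustration.

```python
# A list of (key, value) tuples; the value expression can transform the value
pairs = [('a', 1), ('b', 2), ('c', 3)]
squares = {key: value ** 2 for key, value in pairs}
print(squares)  # {'a': 1, 'b': 4, 'c': 9}

# zip() is another common source of (key, value) pairs
names = ['x', 'y']
values = [10, 20]
lookup = {key: value for key, value in zip(names, values)}
print(lookup)  # {'x': 10, 'y': 20}
```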

The answers to this question do a good job explaining what an iterator is in Python. Usually (but not always), when used in a dictionary comprehension, each call to the iterator's __next__() method returns a two-element item. The element used as the key must be hashable so that it can be stored in the dictionary; the value can be any object.
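A short sketch of both points: next() hands back one pair at a time, and only the key has to be hashable. The tickers and the list key are made-up values chosen to trigger the error.

```python
pairs = [('AAPL', 1), ('GOOG', 2)]

# Each call to next() on the iterator hands back one 2-element tuple
it = iter(pairs)
first = next(it)
print(first)  # ('AAPL', 1)

# A list is unhashable, so using one as the key fails
raised = False
try:
    bad = {key: value for key, value in [(['a'], 1)]}
except TypeError as exc:
    raised = True
    print('TypeError:', exc)

# Values have no such restriction: here the list ends up as the value
ok = {value: key for key, value in [(['a'], 1)]}
print(ok)  # {1: ['a']}
```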

The iterator doesn't necessarily need to yield two elements (although that is a common pattern). This works:

print(dict([(i, chr(65+i)) for i in range(4)]))
{0: 'A', 1: 'B', 2: 'C', 3: 'D'}

and also shows that dictionary comprehensions are really just special syntax built on the same mechanics as list comprehensions and the dict() constructor, which is what the comment by @Barmar is doing:

frames = dict(data.groupby('Sport'))

In this case, each item produced by data.groupby() does need to be a two-element pair, and the order matters (key first, then value), because it is shorthand for (roughly) this:

dict([(key, value) for key, value in data.groupby('Sport')])
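A small check that the forms above build the same dictionary, using made-up data. One hedge: the sketch wraps the GroupBy object in list() before handing it to dict(). dict() treats any argument that has a .keys attribute as a mapping, and in the pandas versions I know of a GroupBy object stores its grouping key in a non-callable .keys attribute, so passing it to dict() directly may raise a TypeError; the try/except below just reports what happens on your installation.

```python
import pandas as pd

# Made-up example data
data = pd.DataFrame({'Sport': ['Golf', 'Tennis', 'Golf'], 'random_data': [7, 8, 9]})

# 1. Dictionary comprehension
via_comprehension = {key: value for key, value in data.groupby('Sport')}

# 2. dict() over an explicit list comprehension of (key, value) pairs
via_list_comp = dict([(key, value) for key, value in data.groupby('Sport')])

# 3. dict() over the groupby, materialized as a list of pairs first
via_dict_call = dict(list(data.groupby('Sport')))

# Passing the GroupBy object straight to dict() may fail (see note above)
try:
    dict(data.groupby('Sport'))
except TypeError as exc:
    print('dict(groupby) failed:', exc)

# All three dictionaries hold the same sub-DataFrames under the same keys
for key in via_comprehension:
    assert via_comprehension[key].equals(via_list_comp[key])
    assert via_comprehension[key].equals(via_dict_call[key])
print(list(via_dict_call))  # ['Golf', 'Tennis']
```

`dict(tuple(data.groupby('Sport')))` works the same way as the list() version.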
Z4-tier
  • Thanks for the reply! Now I understand the logic behind a dictionary comprehension. Just one last doubt. The dataframe 'frame' in the example has 2 columns ('Sport' & 'random_data'). What if there were another column 'player', so that the dataframe 'frame' has 3 columns? If I wanted to create a dictionary comprehension to select only the columns 'Sport' and 'player', what would the syntax be? I tried the following but it does not work: 'frames = {i:dat for i in data.groupby('Sport'), dat in data['player'] }' – Antonio Dec 19 '20 at 13:11
  • You could do something like this: `{sport: {player: player_stats for player, player_stats in sport_stats.groupby('player')} for sport, sport_stats in data.groupby('Sport')}`. This uses a nested dictionary comprehension: since `data.groupby('Sport')` yields pairs that look like `(sport, DataFrame)`, the inner comprehension takes each of those DataFrames (which I called sport_stats) and groups it again, this time by player, so you get a nested dictionary of the form `{sport: {player: player_DataFrame}}`. This is a good example of why it is good to use meaningful names and not single letters :) – Z4-tier Dec 19 '20 at 15:59
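A runnable sketch of the nested comprehension from the last comment, plus one way to keep only the 'Sport' and 'player' columns that the follow-up asked about. The data is made up, and the column-selection line is an illustration added here, not part of the original answer.

```python
import pandas as pd

# Made-up data with a third 'player' column, as in the follow-up question
data = pd.DataFrame({
    'Sport': ['Golf', 'Golf', 'Tennis', 'Tennis'],
    'player': ['Ann', 'Bob', 'Ann', 'Ann'],
    'random_data': [1, 2, 3, 4],
})

# Outer comprehension: one entry per sport
# Inner comprehension: one entry per player within that sport
nested = {
    sport: {player: player_stats for player, player_stats in sport_stats.groupby('player')}
    for sport, sport_stats in data.groupby('Sport')
}

# Keeping only some columns: select them on each group's DataFrame
only_two_columns = {sport: group[['Sport', 'player']] for sport, group in data.groupby('Sport')}

print(nested['Tennis']['Ann']['random_data'].tolist())  # [3, 4]
print(list(only_two_columns['Golf'].columns))            # ['Sport', 'player']
```

Selecting the columns once up front, e.g. `data[['Sport', 'player']].groupby('Sport')`, gives the same per-sport DataFrames.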