
Quick summary of my problem: ghGby is a dictionary whose keys each map to a DataFrame groupby. I would like to split each groupby into its own set of groups, one for each day in the groupby's DAY column.

groupDays = {}
groupDayIndexer = []
for x in ghGby.keys():
    for y in ghGby[x].DAY.unique().keys():
        if x in groupDays:
            groupDays[x].append(ghGby[x].get_group(y))
        else:
            groupDays[x] = ghGby[x].get_group(y)

Data looks like this:

ghGby[keyA]:

day|transaction|etc.
51 | ......... | ...
51 | ......... | ...
63 | ......... | ...
63 | ......... | ...
63 | ......... | ...
94 | ......... | ...

.get_group(y) returns each set of days as an individual object just fine, but when I append them to groupDays I only get one day of the groupby. What I expected was every day, like this:

print(groupDays['keyA'])
{keyA: [day51GroupBy, day63GroupBy, day94GroupBy]}

More background information:

The original dataset looks like this, just with many thousands of household_keys. My objective is to be able to access a subset of this large dataset by specifying my desired day and desired household key. As these are transactions, the same key can have multiple entries on the same day.

household_key   DAY  PRODUCT_ID
1929            4    1004906
1929            4    1004906
1929            95   1004906
1929            202  1004906
1929            207  1004906    

my desired output:

print(groupDays['household_key1929'])
[ghGby['household_key1929'].get_group(day4), ghGby['household_key1929'].get_group(day52), ghGby['household_key1929'].get_group(day95), ghGby['household_key1929'].get_group(day202)]

I would like to do this so that I can access my data easier, like this:

display(groupDays['household_key1929'][0])

household_key   DAY  PRODUCT_ID
1929            4    1004906
1929            4    1004906

Here I am accessing the first element of the list of days associated with household key 1929, which in this case is day 4.
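A minimal sketch of the list-based layout described above, using the five sample rows shown earlier. The way `ghGby` is built here is an assumption (one groupby on `DAY` per household key), since its construction is not shown. One likely reason the original loop keeps only one day is that its `else` branch stores a bare DataFrame instead of a list, so the later `.append` calls never build up a list of days. Collecting the sub-frames into a list from the start avoids that:

```python
import pandas as pd

df = pd.DataFrame({
    "household_key": [1929, 1929, 1929, 1929, 1929],
    "DAY": [4, 4, 95, 202, 207],
    "PRODUCT_ID": [1004906] * 5,
})

# Hypothetical stand-in for ghGby: one groupby on DAY per household key.
ghGby = {
    f"household_key{key}": frame.groupby("DAY")
    for key, frame in df.groupby("household_key")
}

groupDays = {}
for x, gby in ghGby.items():
    # Iterating a groupby yields (group key, sub-frame) pairs,
    # so this builds one list entry per distinct day.
    groupDays[x] = [day_frame for _, day_frame in gby]

# First entry of the list: all rows for the earliest day of this household.
print(groupDays["household_key1929"][0])
```

With this layout the days are addressed by position rather than by day number, so `[0]` is simply whichever day sorts first for that household.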

Bigboss01
  • please, could you please show us the original dataset? and the desired output as well. Thanks! – Samir Hinojosa Jun 09 '21 at 08:40
  • @SamirHinojosa hello Samir- just updated with some more information, hope it makes things clearer – Bigboss01 Jun 09 '21 at 09:03
  • Could you please show us the desired output as a dataset? Thanks! – Samir Hinojosa Jun 09 '21 at 09:08
  • @SamirHinojosa no problem, just added. – Bigboss01 Jun 09 '21 at 09:20
  • I don't know what the data in the dict's values is... but I guess you need to work with [`explode()`](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.explode.html). You can see a [`related answer`](https://stackoverflow.com/questions/53218931/how-to-unnest-explode-a-column-in-a-pandas-dataframe), and after that make a `groupby()` based on your needs – Samir Hinojosa Jun 09 '21 at 09:28

1 Answer


simple nested dictionary:

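# dataList is presumably the list of keys of groupbyDict (one groupby per key)
dic2make = {}
for indexer in dataList:
    dic2make[indexer] = {}
    # unique() on a grouped column returns a Series indexed by group key,
    # so .keys() yields the group keys (the days, assuming the groupby is on DAY)
    for element in groupbyDict[indexer].DAY.unique().keys():
        dic2make[indexer][element] = groupbyDict[indexer].get_group(element)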

A very simple solution to a simple question. I am a beginner, so my description of the problem was confusing. Anyway, this is what I was looking for. If this doesn't make sense to you but you think you need to understand what's happening, feel free to leave a comment.
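For anyone who wants to try this, here is a self-contained sketch of the same nested-dictionary idea on the sample rows from the question. How `groupbyDict` and `dataList` are built is an assumption made for illustration (a groupby on `DAY` per household key, and the list of those keys). It also reads the days from the groupby's `groups` mapping, which is a swapped-in alternative to `DAY.unique().keys()`:

```python
import pandas as pd

df = pd.DataFrame({
    "household_key": [1929, 1929, 1929, 1929, 1929],
    "DAY": [4, 4, 95, 202, 207],
    "PRODUCT_ID": [1004906] * 5,
})

# Assumed shape of groupbyDict: household key -> groupby on DAY.
groupbyDict = {
    key: frame.groupby("DAY") for key, frame in df.groupby("household_key")
}
dataList = list(groupbyDict)  # assumed: the household keys to process

dic2make = {}
for indexer in dataList:
    gby = groupbyDict[indexer]
    # gby.groups maps each DAY value to its row labels, so its keys are the days.
    dic2make[indexer] = {day: gby.get_group(day) for day in gby.groups}

# Look up all transactions of household 1929 on day 4.
print(dic2make[1929][4])
```

Keying the inner dictionary by day means a lookup takes the household key and the day directly, instead of relying on list position.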

Bigboss01