4

I am trying to convert a simple pandas dataframe into a nested JSON file based on the answer I found here: pandas groupby to nested json

My grouped dataframe looks like this:

                  firstname lastname  orgname         phone        mobile  email
teamname members                                                           
1        0            John      Doe     Anon  916-555-1234          none   john.doe@wildlife.net 
         1            Jane      Doe     Anon  916-555-4321  916-555-7890   jane.doe@wildlife.net
2        0          Mickey    Moose  Moosers  916-555-0000  916-555-1111   mickey.moose@wildlife.net
         1           Minny    Moose  Moosers  916-555-2222          none   minny.moose@wildlife.net  

My code is:

data = pandas.read_excel(inputExcel, sheetname = 'Sheet1', encoding = 'utf8')
grouped = data.groupby(['teamname', 'members']).first()

results = defaultdict(lambda: defaultdict(dict))

for index, value in grouped.itertuples():
    for i, key in enumerate(index):
        if i ==0:
            nested = results[key]
        elif i == len(index) -1:
            nested[key] = value
        else:
            nested = nested[key]

print json.dumps(results, indent = 4)

I get the following error on the first "for" loop. What causes this error in this circumstance and what would it take to fix it to output the nested json?

    for index, value in grouped.itertuples():
ValueError: too many values to unpack
Community
  • 1
  • 1
spaine
  • 443
  • 1
  • 5
  • 17

2 Answers2

2

When using itertuples(), the index is included as part of the tuple, so the for index, value in grouped.itertuples(): doesn't really make sense. In fact, itertuples() uses namedtuple with Index being one of the names.

Consider the following setup:

data = {'A': list('aabbc'), 'B': [0, 1, 0, 1, 0], 'C': list('vwxyz'), 'D': range(5,10)}
df = pd.DataFrame(data).set_index(['A', 'B'])

Yielding the following DataFrame:

     C  D
A B      
a 0  v  5
  1  w  6
b 0  x  7
  1  y  8
c 0  z  9

Then printing each tuple in df.itertuples() yields:

Pandas(Index=('a', 0), C='v', D=5)
Pandas(Index=('a', 1), C='w', D=6)
Pandas(Index=('b', 0), C='x', D=7)
Pandas(Index=('b', 1), C='y', D=8)
Pandas(Index=('c', 0), C='z', D=9)

So, what you'll probably want to do is something like the code below, with value being replaced by t[1:]:

for t in grouped.itertuples():
    for i, key in enumerate(t.Index):
        ...

If you want to access components of the namedtuple, you can access things positionally, or by name. So, in the case of your DataFrame, t[1] and t.firstname should be equivalent. Just remember that t[0] is the index, so your first column starts at 1.

root
  • 32,715
  • 6
  • 74
  • 87
  • If I run those 2 lines, I get: for i, key in enumerate(t.index): TypeError: 'builtin_function_or_method' object is not iterable – spaine Jun 14 '16 at 19:10
  • 1
    Index should be capitalized: `enumerate(t.Index)`. – root Jun 14 '16 at 19:27
  • That gives me a different error: for i, key in enumerate(t.Index): AttributeError: 'tuple' object has no attribute 'Index' – spaine Jun 14 '16 at 19:36
  • 1
    What version of `pandas` are you using (see `pd.__version__`)? If it's an older version, `itertuples` may not be implemented as `namedtuple` yet, so trying to access it by name might not work. Trying calling it positionally: `enumerate(t[0])`. If that doesn't work, try looking at a few of the tuples that are generated by `itertuples` to see what it's actually producing for you, and adjust your code accordingly. – root Jun 14 '16 at 19:47
  • I was using pandas 0.16.1 so I upgraded to 0.18.1 and it seems to have worked (after I updated references to "index" to "t.Index" and "values" to "t". Your help got me to 99% of what I need. Thanks! – spaine Jun 14 '16 at 21:04
  • 1
    This is my final code `data = pandas.read_excel(inputExcel, sheetname = 'SCAT Teams', encoding = 'utf8') grouped = data.groupby(['teamname', 'members']).first() print grouped results = defaultdict(lambda: defaultdict(dict)) for t in grouped.itertuples(): for i, key in enumerate(t.Index): if i ==0: nested = results[key] elif i == len(t.Index) -1: nested[key] = t else: nested = nested[key] jsonOutput = json.dumps(results, indent = 4)` – spaine Jun 14 '16 at 21:15
  • Would you be willing to chat to explain how this bit of code works? – spaine Jun 15 '16 at 17:35
0

As I understand itertuples, it will return a tuple with the first value being the index and the remaining values being all of the columns. You only have for index, value in grouped.itertuples() which means it's trying to unpack all of the columns into a single variable, which won't work. The groupby probably comes into play as well but it should still contain all the values within the result which means you still have too many columns being unpacked.

bravosierra99
  • 1,331
  • 11
  • 23