5

Hi I have a pandas df similar to below

information         record
name                apple
size                {'weight':{'gram':300,'oz':10.5},'description':{'height':10,'width':15}}
country             America
partiesrelated      [{'nameOfFarmer':'John Smith'},{'farmerID':'A0001'}]

and I want to transform the df to another df like this

information                  record
name                         apple
size_weight_gram             300
size_weight_oz               10.5
size_description_height      10
size_description_width       15 
country                      America
partiesrelated_nameOfFarmer  John Smith
partiesrelated_farmerID      A0001

In this case, the dictionary will be parse into single lines where size_weight_gram and contain the value.

the code for df

df = pd.DataFrame({'information': ['name', 'size', 'country', 'partiesrealated'], 
                   'record': ['apple', {'weight':{'gram':300,'oz':10.5},'description':{'height':10,'width':15}}, 'America', [{'nameOfFarmer':'John Smith'},{'farmerID':'A0001'}]]})
df = df.set_index('information')
rafaelc
  • 57,686
  • 15
  • 58
  • 82
Platalea Minor
  • 877
  • 2
  • 9
  • 22
  • Possible duplicate of [How to iterate over rows in a DataFrame in Pandas?](https://stackoverflow.com/questions/16476924/how-to-iterate-over-rows-in-a-dataframe-in-pandas) – Joost Aug 01 '18 at 03:30
  • 2
    Nope, @Joost it a different question. Your post is to ask about iterating over rows. Mine is parsing new rows based on the original single row. new column names is given based on the dictionary keys and value – Platalea Minor Aug 01 '18 at 03:38
  • You are right, but your initial phrasing was vague enough that it didnt focus enough on your particular problem that it looked like a much simpler question. Also, it would be appreciated if you show what you have already tried. Anyway you edited your answer sucht that it is much clearer now what you are asking. – Joost Aug 01 '18 at 05:42

1 Answers1

2

IIUC, you can define a recursive function to unnest your sequences/dicts until you have a list of key, value that may both serve as a valid input for pd.DataFrame constructor and be formatted as the way you described.

Take a look at this solution:

import itertools
import collections

ch = lambda ite: list(itertools.chain.from_iterable(ite))

def isseq(obj):
    if isinstance(obj, str): return False
    return isinstance(obj, collections.abc.Sequence)

def unnest(k, v):
    if isseq(v): return ch([unnest(k, v_) for v_ in v])
    if isinstance(v, dict): return ch([unnest("_".join([k, k_]), v_) for k_, v_ in v.items()])
    return k,v

def pairwise(i):
    _a = iter(i)
    return list(zip(_a, _a))

a = ch([(unnest(k, v)) for k, v in zip(d['information'], d['record'])])
pd.DataFrame(pairwise(a))

    0                                 1
0   name                              apple
1   size_weight_gram                  300
2   size_weight_oz                    10.5
3   size_description_height           10
4   size_description_width            15
5   country                           America
6   partiesrealated_nameOfFarmer      John Smith
7   partiesrealated_farmerID          A0001

Due to the recursive nature of the solution, the algorithm would unnest up to any depth you might have. For example:

d={
  'information': [
    'row1',
    'row2',
    'row3',
    'row4'
  ],
  'record': [
    'val1',
    {
      'val2': {
        'a': 300,
        'b': [
          {
            "b1": 10.5
          },
          {
            "b2": 2
          }
        ]
      },
      'val3': {
        'a': 10,
        'b': 15
      }
    },
    'val4',
    [
      {
        'val5': [
          {
            'a': {
              'c': [
                {
                  'd': {
                    'e': [
                      {
                        'f': 1
                      },
                      {
                        'g': 3
                      }
                    ]
                  }
                }
              ]
            }
          }
        ]
      },
      {
        'b': 'bar'
      }
    ]
  ]
}



    0                    1
0   row1                 val1
1   row2_val2_a          300
2   row2_val2_b_b1       10.5
3   row2_val2_b_b2       2
4   row2_val3_a          10
5   row2_val3_b          15
6   row3                 val4
7   row4_val5_a_c_d_e_f  1
8   row4_val5_a_c_d_e_g  3
9   row4_b               bar
rafaelc
  • 57,686
  • 15
  • 58
  • 82