1
data_df = pandas.read_csv('details.csv')
data_df = data_df.replace('Null', np.nan)
df = data_df.groupby(['country', 'branch']).count()
df = df.drop('sales', axis=1)  
df = df.reset_index()
print df

I would like to convert result of a Data Frame(df) to user defined json format that i mentioned below. After printing the result(df) i will get the result in the form

country     branch      no_of_employee     total_salary    count_DOB   count_email
  x            a            30                 2500000        20            25
  x            b            20                 350000         15            20
  y            c            30                 4500000        30            30
  z            d            40                 5500000        40            40
  z            e            10                 1000000        10            10
  z            f            15                 1500000        15            15

i would like to convert this to Json.My desired format is

x
   {
      a
        {
              no.of employees:30
              total salary:2500000
              count_email:25
         }
       b
         {
              no.of employees:20
              total salary:350000
              count_email:25

           }
     }

   y
     {

        c
         {
              no.of employees:30
              total salary:4500000
              count_email:30

           }
      }
   z
     {
       d
         {
              no.of employees:40
              total salary:550000
              count_email:40
         }
       e
         {
              no.of employees:10
              total salary:100000
              count_email:15

         }
        f
         {
              no.of employees:15
              total salary:1500000
              count_email:15

         }
    }

Please notice that i don't want all the fields in the data Frame Result in Json(eg:count_DOB)

Edwin Baby
  • 279
  • 1
  • 4
  • 11

1 Answers1

2

You can use groupby with apply to_dict and last to_json:

  country branch  no_of_employee  total_salary  count_DOB  count_email
0       x      a              30       2500000         20           25
1       x      b              20        350000         15           20
2       y      c              30       4500000         30           30
3       z      d              40       5500000         40           40
4       z      e              10       1000000         10           10
5       z      f              15       1500000         15           15

g = df.groupby('country')[["branch", "no_of_employee","total_salary", "count_email"]]
                              .apply(lambda x: x.set_index('branch').to_dict(orient='index'))
print g.to_json()
{
    "x": {
        "a": {
            "total_salary": 2500000,
            "no_of_employee": 30,
            "count_email": 25
        },
        "b": {
            "total_salary": 350000,
            "no_of_employee": 20,
            "count_email": 20
        }
    },
    "y": {
        "c": {
            "total_salary": 4500000,
            "no_of_employee": 30,
            "count_email": 30
        }
    },
    "z": {
        "e": {
            "total_salary": 1000000,
            "no_of_employee": 10,
            "count_email": 10
        },
        "d": {
            "total_salary": 5500000,
            "no_of_employee": 40,
            "count_email": 40
        },
        "f": {
            "total_salary": 1500000,
            "no_of_employee": 15,
            "count_email": 15
        }
    }
}

I try print g.to_dict(), but JSON was invalid (check it here).

jezrael
  • 822,522
  • 95
  • 1,334
  • 1,252
  • 4
    while running this code i got TypeError: to_dict() got an unexpected keyword argument 'orient' – Edwin Baby Feb 18 '16 at 07:40
  • It doesnt work with this sample of with your real data? – jezrael Feb 18 '16 at 07:42
  • Please chcek it with this `df` - `df = pd.DataFrame({'count_email': {0: 25, 1: 20, 2: 30, 3: 40, 4: 10, 5: 15}, 'country': {0: 'x', 1: 'x', 2: 'y', 3: 'z', 4: 'z', 5: 'z'}, 'count_DOB': {0: 20, 1: 15, 2: 30, 3: 40, 4: 10, 5: 15}, 'branch': {0: 'a', 1: 'b', 2: 'c', 3: 'd', 4: 'e', 5: 'f'}, 'total_salary': {0: 2500000, 1: 350000, 2: 4500000, 3: 5500000, 4: 1000000, 5: 1500000}, 'no_of_employee': {0: 30, 1: 20, 2: 30, 3: 40, 4: 10, 5: 15}}) `. Still same error? If yes, what version of `pandas` use - `print pd.__version__`? – jezrael Feb 18 '16 at 07:53
  • now everything works perfectly . what i have done is un install the pandas and install another version.now ,everything works perfectly – Edwin Baby Feb 18 '16 at 09:24
  • thank you very much. my current pandas version is 0.17.1. I tried with json Lint and the JSON is valid – Edwin Baby Feb 18 '16 at 09:26