0

I am new to python and pandas have a csv file with

.----.---------.-------.-------------------.-------------------.-------------------.-------------------.
| id | country | state | cold_stress_score | cold_stress_level | heat_stress_score | heat_stress_level |
:----+---------+-------+-------------------+-------------------+-------------------+-------------------:
|  1 | USA     | NJ    |             0.003 | low               |             0.673 | moderate          |
:----+---------+-------+-------------------+-------------------+-------------------+-------------------:
|  2 | USA     | NJ    |             0.001 | high              |               0.2 | high              |
:----+---------+-------+-------------------+-------------------+-------------------+-------------------:
|  3 | USA     | NJ    |             0.004 | moderate          |               0.3 | low               |
:----+---------+-------+-------------------+-------------------+-------------------+-------------------:
|  4 | USA     | NY    |             0.005 | moderate          |               0.4 | moderate          |
:----+---------+-------+-------------------+-------------------+-------------------+-------------------:
|  5 | USA     | NY    |             0.006 | high              |               0.5 | high              |
:----+---------+-------+-------------------+-------------------+-------------------+-------------------:
|  6 | USA     | NY    |             0.009 | low               |               0.6 | low               |
'----'---------'-------'-------------------'-------------------'-------------------'-------------------'

and i wanted to convert this into nested way of json

expected json

  {
  "id":1,
  "country": "USA",
  "state": "NJ",
  "cold_stress":{
    "cold_stress_score" : 0.003,
    "cold_stress_level": "low",
  },
  "heat_stress":{
    "heat_stress_score" : 0.0673,
    "heat_stress_level": "moderate",

  }

}

I tried this solution Convert Pandas Dataframe to nested JSON

j = (df.groupby(['id','country','state'], as_index=False)
             .apply(lambda x: x[['cold_stress_score','cold_stress_level']].to_dict('r'))
             .reset_index()
             .rename(columns={0:'cold_stress'})
             .to_json(orient='records'))

I wanted to add heat Stress to json the above code returning

  "id":1,
  "country": "USA",
  "state": "NJ",
  "cold_stress":{
    "cold_stress_score" : 0.003,
    "cold_stress_level": "low",
  }
}

how can I able to add heat_stress my csv is too big and am looking for dynamic value populating in above like cold stress

Serge Ballesta
  • 143,923
  • 11
  • 122
  • 252

2 Answers2

1

If you do little or no processing like here, pandas is both overkill and too complex. My advice is to stick to the csv and json modules from the standard library.

Code could be (more or less):

with open(inputfile) as fdin, open (outputfile, "w") as fdout:
    rd = csv.DictReader(fdin)
    js = [{'id': int(row['id']), 'country': row['country'], 'state': row['state'],
           'cold_stress': {'cold_stress_code': row['cold_stress_code'],
                           'cold_stress_level': row['cold_stress_level']},  
           'heat_stress': {'heat_stress_code': row['heat_stress_code'],
                           'heat_stress_level': row['heat_stress_level']}, 
           } for row in rd]
    json.dump(js, fdout, indent=2)
Serge Ballesta
  • 143,923
  • 11
  • 122
  • 252
  • i have to do some filters over data like select based on id or state or country , i thought pandas would be best to do more operations and filtering – Subba Taniparti Apr 30 '19 at 13:43
0

Have you tried something like

# create first grouping (leave heat columns same)
j = (df.groupby(['id','country','state', 'heat_stress_score', 'heat_stress_level'], as_index=False)
             .apply(lambda x: x[['cold_stress_score','cold_stress_level']].to_dict('r'))
             .reset_index()
             .rename(columns={0:'cold_stress'}))
# care about heat grouping
j = (j.groupby(['id','country','state', 'cold_stress'], as_index=False)
             .apply(lambda x: x[['heat_stress_score','heat_stress_level']].to_dict('r'))
             .reset_index()
             .rename(columns={0:'heat_stress'})
             .to_json(orient='records'))

Sparky05
  • 4,692
  • 1
  • 10
  • 27
  • No its not working it only returning `tide-Data` second one is not coming – Subba Taniparti Apr 29 '19 at 20:04
  • I just checked the documentation of `rename` [see here](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.rename.html) and adjusted the answer. – Sparky05 Apr 29 '19 at 20:06
  • Seems its not working, the code snip i shared is just example i need to add heat_stress as second one but the exaple works for one which is tide-data , i wanted to add tide-data1 with different col names – Subba Taniparti Apr 29 '19 at 20:15
  • Modified the answer to reflect your changes to your OP. If you would supply a minimal example (including creation of 10 rows of your data frame), it would be a lot easier to help you. – Sparky05 Apr 30 '19 at 07:44
  • after first groupby it acting as list after adding second one it returning error TypeError: unhashable type: 'list' – Subba Taniparti Apr 30 '19 at 13:41
  • Can you correct the Answer Still I cant able to solve it Same error TypeError: unhashable type: 'list' – Subba Taniparti May 22 '19 at 20:25