2

Please let me know how to convert below format json file to data frame.

json file data:

{
"series_id":"STEO",
"f":"A",
"data":[["2018",5.8400705041],["2017",3.5671511014],["2016",2.3014617486],["2015",2.4989178082],["2014",2.2089452055]]
}

I have tried below code:

sourcePath = r'D:\source\STEO.txt'
data = pd.read_json(sourcePath, lines=True)

I need below output from above json:

series_id   f   date    value
STEO        A   2018    5.840070504
STEO        A   2017    3.567151101
STEO        A   2016    2.301461749
STEO        A   2015    2.498917808
STEO        A   2014    2.208945206
Learnings
  • 2,780
  • 9
  • 35
  • 55

2 Answers2

1

One way may be as following:

Reading Json:

import pandas as pd
df = pd.read_json('input.txt')
print(df)

Output:

                   data  f series_id
0  [2018, 5.8400705041]  A      STEO
1  [2017, 3.5671511014]  A      STEO
2  [2016, 2.3014617486]  A      STEO
3  [2015, 2.4989178082]  A      STEO
4  [2014, 2.2089452055]  A      STEO

Splitting

# splitting into multiple columns for list
# https://stackoverflow.com/a/35491399/5916727
df[['Date','Value']] = pd.DataFrame([item for item in df.data])
# removing initial data column now
del df['data']
print(df)

Output:

   f series_id  Date     Value
0  A      STEO  2018  5.840071
1  A      STEO  2017  3.567151
2  A      STEO  2016  2.301462
3  A      STEO  2015  2.498918
4  A      STEO  2014  2.208945
niraj
  • 17,498
  • 4
  • 33
  • 48
  • Traceback (most recent call last): File "D:\Sunil_Work\psnl\python\temPY\15Jul17_fred3.py", line 10, in data[['Date','Value']] = pd.DataFrame([item for item in data.data]) File "C:\Python36\lib\site-packages\pandas\core\frame.py", line 314, in __init__ arrays, columns = _to_arrays(data, columns, dtype=dtype) TypeError: object of type 'float' has no len() – Learnings Jul 16 '17 at 11:58
  • @faithon I think there is something wrong with your data. What happens when your data has only float instead of having `list` of `Date and Value`? You probably will get same error. `"data":[["2018",5.8400705041],2.3014617486]`. You can see `2.3014617486` is not `list` instead is `float`. – niraj Jul 16 '17 at 13:30
1

You can use read_json asnd then pop for drop column data and create new columns by DataFrame constructor with converting to values:

df = pd.read_json('file.json')
df[['date','value']] = pd.DataFrame(df.pop('data').values.tolist())
#if necessary convert to int
df['date'] = df['date'].astype(int)
print (df)
   f series_id  date     value
0  A      STEO  2018  5.840071
1  A      STEO  2017  3.567151
2  A      STEO  2016  2.301462
3  A      STEO  2015  2.498918
4  A      STEO  2014  2.208945

Another solution:

You can use json_normalize, then rename columns and if necessary reorder them by reindex_axis:

from pandas.io.json import json_normalize 
import json

with open('file.json') as data_file:    
    d = json.load(data_file)  

d_cols = {0:'date', 1:'value'}
names_cols = ['series_id','f','date','value']
df = json_normalize(d, 'data', ['f', 'series_id']) \
       .rename(columns=d_cols) \
       .reindex_axis(names_cols, axis=1)
df['date'] = df['date'].astype(int)
print (df)
  series_id  f  date     value
0      STEO  A  2018  5.840071
1      STEO  A  2017  3.567151
2      STEO  A  2016  2.301462
3      STEO  A  2015  2.498918
4      STEO  A  2014  2.208945
jezrael
  • 822,522
  • 95
  • 1,334
  • 1,252
  • Thanks, but its giving error: Traceback (most recent call last): File "D:\Sunil_Work\psnl\python\temPY\15Jul17_fred3.py", line 10, in data[['Date','Value']] = pd.DataFrame([item for item in data.data]) File "C:\Python36\lib\site-packages\pandas\core\frame.py", line 314, in init arrays, columns = _to_arrays(data, columns, dtype=dtype) TypeError: object of type 'float' has no len() – – Learnings Jul 16 '17 at 12:08
  • Hmmm and solution with json_normalize? How does it work? – jezrael Jul 16 '17 at 13:32
  • But if want use first solution, I think you need replace NaN by empty list. So need `df = pd.read_json('file.json') df['data'] = df['data'].combine_first(pd.Series([[]], index=df.index)) df[['date','value']] = pd.DataFrame(df.pop('data').values.tolist())` – jezrael Jul 16 '17 at 13:51