Background
I'm importing a JSON file with a schema like the following:
{
"000001": {
"USA": {
"requests": 1234,
"RTT": 40
},
"Japan": {
"requests": 10,
"RTT": 200
}
},
"000002": {
"USA": {
"requests": 4321,
"RTT": 50
},
"Japan": {
"requests": 10,
"RTT": 150
}
}
}
After importing, I have two things I wish to do with this data:
- Perform a lot of analysis and math, which NumPy would be great at
- Dump the data to a database
Upon dumping it to the database, I expect a schema like the following:
+--------+---------+----------+-------+
| time* | country*| requests | RTT |
+--------+---------+----------+-------+
| 000001 | USA | 1234 | 40ms |
| 000001 | Japan | 10 | 200ms |
| 000002 | USA | 4321 | 50ms |
| 000002 | Japan | 20 | 150ms |
| ... | ... | ... | ... |
Problem
Currently to import the data, I'm doing the following:
>>>import numpy as np
>>>import flatdict as fd
>>>np.array(fd.FlatDict(json_string))
array([[000001, u'USA', u'requests', 1234],
[000001, u'USA', u'RTT', 40],
[000001, u'Japan', u'requests', 10],
...,
[000002, u'USA', u'RTT', 50],
[000002, u'Japan', u'requests', 20],
[000002, u'Japan', u'RTT', 150]],
dtype='<U31')
The problem here is that this creates a unique row for each value, rather than for each key. I'm willing to write my own function to replace flatdict
if necessary, but the question is: can NumPy have tuples / ararys / objects for values? Ideally I would want an output like the following:
>>>np.array(data)
array([[000001, u'USA', (1234, 40)],
[000001, u'Japan', (10, 200)],
[000002, u'USA', (4321, 50)],
[000002, u'Japan', (20, 150)]],
dtype='...')
However I'd be willing to accept:
>>>np.array(data)
array([[000001, u'USA', 1234, 40],
[000001, u'Japan', 10, 200],
[000002, u'USA', 4321, 50],
[000002, u'Japan', 20, 150]],
dtype='...')
Alternate Acceptable Solution
For step one (performing all of the analysis) I can use the NumPy array I have now. Something like array[:,"USA","requests"]
returns the number of requests in USA for all time.
It's for step two that I need to get it into a different format. If there's a way to take:
array([[000001, u'USA', u'requests', 1234],
[000001, u'USA', u'RTT', 40],
[000001, u'Japan', u'requests', 10],
...,
[000002, u'USA', u'RTT', 50],
[000002, u'Japan', u'requests', 20],
[000002, u'Japan', u'RTT', 150]],
dtype='<U31')
and get it into the form:
[[000001, u'USA', 1234, 40],
[000001, u'Japan', 10, 200],
[000002, u'USA', 4321, 50],
[000002, u'Japan', 20, 150]]
for the sake of dumping it into a database, that will be fine, too