I have a large CSV file that we will be using to import assets into our asset management database. Here is a smaller example for the CSV data.
Serial number,Movement type,Posting date
2LMXK1,101,1/5/15 9:00
2LMXK1,102,1/5/15 9:30
2LMXK1,201,1/5/15 10:30
2LMXK1,202,1/5/15 13:00
2LMXK1,301,1/5/15 14:00
JEMLP3,101,1/6/15 9:00
JEMLP3,102,1/7/15 10:00
JEMLP3,201,1/7/15 13:30
JEMLP3,202,1/7/15 15:30
JEMLP3,203,1/7/15 17:30
BR83GP,101,1/5/15 9:00
BR83GP,102,1/5/15 13:00
BR83GP,201,1/6/15 9:00
BR83GP,202,1/7/15 15:30
BR83GP,301,1/5/15 13:00
BR83GP,201,1/6/15 9:00
BR83GP,301,1/9/15 15:30
I need to parse this CSV and return a JSON object of only the latest movement type for each unique serial number.
I have a script successfully achieves: * Parse the CSV * Sort by Date and group by Serial number, get the latest date * Convert panda dataframe to JSON object (but missing serialnumber)
ISSUE: The "serial number" column is omitted when converting the dataframe to a JSON object. I think the problem is the fact that "serial number" is used as the groupby value. I tried the builtin to_json but that did not return the data in the correct format.
The data frame contains the correct filtered data that I need as we can see in print(last_movement).
How can I create a JSON object and obtain all of the columns in the dataframe?
#Import python modules
import ujson as json
import pandas as pd
import numpy as np
#CSV parse to panda dataframe
pdata = pd.read_csv('import.csv', skipinitialspace=True, parse_dates=[2])
#Sort by posting date to get rows by latest posting date/time only
last_movement = pdata.sort_values('Posting date').groupby('Serial number').last()
print(last_movement)
# RETURNS
# We know the dataframe is correct
#
# Movement type Posting date
#Serial number
#2LMXK1 301 2015-01-05 14:00:00
#BR83GP 301 2015-01-09 15:30:00
#JEMLP3 203 2015-01-07 17:30:00
out = last_movement.to_json()
print(out)
#RETURNS a JSON object that is aggregated by serial number
# {"Movement type":{"2LMXK1":301,"BR83GP":301,"JEMLP3":203},"Posting date":{"2LMXK1":1420466400000,"BR83GP":1420817400000,"JEMLP3":1420651800000}}
Here is the output when I tried a custom function to iterate the values and convert the dataframe to JSON object. Although this is a little bit better, it still does not have the serial number. It appears as though the "groupby" aggregation is causing some issues with the serial number column. Perhaps I need to somehow "ungroup" the resulting dataframe so I have my filtered data and can convert it to a JSON object.
#Convert panda dataframe to json object
def tojson(df):
d = [
dict([
(colname, row[i])
for i,colname in enumerate(df.columns)
])
for row in df.values
]
return json.dumps(d)
out = tojson(last_movement)
print(out)
# RETURNS
# MISSING SERIAL NUMBER
# [{"Posting date":1420466400,"Movement type":301},{"Posting date":1420817400,"Movement type":301},{"Posting date":1420651800,"Movement type":203}]