
I have a list of dictionaries like this:

    [{'accounts': [{'account_number': 'xx6747',
                    'bank_name': 'Fargo Bank',
                    'bankid': 85,
                    'created_date_y': Timestamp('2015-05-20 08:08:33'),
                    'item_account_id': 20086385.0,
                    'item_id': 17764259,
                    'user_id': 18},
                   {'account_number': '*819',
                    'bank_name': 'Central Alabama - Bank',
                    'bankid': 88,
                    'created_date_y': Timestamp('2015-05-20 08:08:33'),
                    'item_account_id': 20637127.0,
                    'item_id': 17761624,
                    'user_id': 18},
                   {'account_number': '*056',
                    'bank_name': 'Central Alabama - Bank',
                    'bankid': 88,
                    'created_date_y': Timestamp('2015-05-20 08:08:33'),
                    'item_account_id': 20087284.0,
                    'item_id': 17761624,
                    'user_id': 18}],
      'created_date_x': Timestamp('2014-03-11 10:48:42'),
      'id': 18,
      'password': 'NjgrUW9URGZz09',
      'status': 1,
      'uid': 97560,
      'username': 'pepass@yahoo.com'},
     {'accounts': [{'account_number': 791,
                    'bank_name': 'Chase - Bank',
                    'bankid': 855,
                    'created_date_y': Timestamp('2015-05-20 08:03:37'),
                    'item_account_id': 20090104.0,
                    'item_id': 17764260,
                    'user_id': 20},
                   {'account_number': 883,
                    'bank_name': 'Chase - Bank',
                    'bankid': 852,
                    'created_date_y': Timestamp('2015-05-20 08:03:37'),
                    'item_account_id': 20090105.0,
                    'item_id': 17764260,
                    'user_id': 20}],
      'created_date_x': Timestamp('2014-03-11 13:42:20'),
      'id': 20,
      'password': 'bmJlbTN5dFo5QT09',
      'status': 1,
      'uid': 99408,
      'username': 'hor@sanstore.com'}]

The list contains other items as well. The created_date_x and created_date_y fields are pandas Timestamps. I want to convert this list into a nested dictionary, something like this:

    {'id': 18,
     'password': 'NjgrUW9URGZz09',
     'status': 1,
     'uid': 97560,
     'username': 'pepass@yahoo.com',
     'created_date_x': Timestamp('2014-03-11 10:48:42'),
     'accounts': [{'account_number': 'xx6747',
                   'bank_name': 'Fargo Bank',
                   'bankid': 85,
                   'created_date_y': Timestamp('2015-05-20 08:08:33'),
                   'item_account_id': 20086385.0,
                   'item_id': 17764259,
                   'user_id': 18},
                  {'account_number': '*819',
                   'bank_name': 'Central Alabama - Bank',
                   'bankid': 88,
                   'created_date_y': Timestamp('2015-05-20 08:08:33'),
                   'item_account_id': 20637127.0,
                   'item_id': 17761624,
                   'user_id': 18},
                  {'account_number': '*056',
                   'bank_name': 'Central Alabama - Bank',
                   'bankid': 88,
                   'created_date_y': Timestamp('2015-05-20 08:08:33'),
                   'item_account_id': 20087284.0,
                   'item_id': 17761624,
                   'user_id': 18}]}

After converting it into a dictionary, I want to serialize it to JSON. The problem is that the Timestamp fields raise an error, since Timestamp is not JSON serializable. The code I have so far, which does not work as I want, is:

    import pandas as pd

    # importing files
    df1 = pd.read_excel("C:\\Users\\Desktop\\yod_user.xlsx")
    df2 = pd.read_excel("C:\\Users\\Desktop\\yod_bank.xlsx")

    df1['created_date'] = df1['created_date'].astype(str)
    df2['created_date'] = df2['created_date'].astype(str)

    df1.dtypes
    #id               int64
    #username        object
    #password        object
    #uid              int64
    #created_date    object
    #status           int64
    #dtype: object

    df2.dtypes
    #user_id             int64
    #bankid              int64
    #account_number     object
    #item_id             int64
    #item_account_id   float64
    #created_date       object
    #bank_name          object
    #dtype: object

    df_merge = pd.merge(df1, df2, left_on='id', right_on='user_id', how='inner')
    df_merge.dtypes
    #id                 object
    #username           object
    #password           object
    #uid                object
    #created_date_x     object
    #status             object
    #user_id            object
    #bankid             object
    #account_number     object
    #item_id            object
    #item_account_id    object
    #created_date_y     object
    #bank_name          object
    #dtype: object

    j = df_merge.groupby(['id', 'username', 'password', 'uid', 'created_date_x', 'status'], as_index=False)\
            .apply(lambda x: x[['account_number', 'user_id', 'bankid', 'item_id', 'item_account_id', 'created_date_y', 'bank_name']].to_dict('records'))\
            .reset_index()\
            .rename(columns={0: 'accounts'})

    j.dtypes
    #id                 int64
    #username          object
    #password          object
    #uid                int64
    #created_date_x    object
    #status             int64
    #accounts          object
    #dtype: object

    jdict = j.to_dict('records')

    import json
    json.dumps(jdict)  # This gives an error: TypeError: Object of type 'Timestamp' is not JSON serializable

If the initial list can be converted directly into JSON, that will also do. Please help.

Karan Gupta

2 Answers


Using astype() will work. In your code, you're not actually changing the dtype of your date columns.

You have:

    df1['created_date'].astype(str)

You need:

    df1['created_date'] = df1['created_date'].astype(str)

Example:

    import json
    import pandas as pd

    dates = pd.date_range("2000", "2002", freq="A")
    data = {"created_date": dates, "value": range(len(dates))}
    df = pd.DataFrame(data)

    df
    #   created_date  value
    # 0   2000-12-31      0
    # 1   2001-12-31      1

    df["created_date"].dtype
    # datetime64[ns]

    df["created_date"] = df["created_date"].astype(str)

    df["created_date"].dtype
    # object

    json.dumps(df.to_dict('records'))
    # '[{"created_date": "2000-12-31", "value": 0},
    #   {"created_date": "2001-12-31", "value": 1}]'
andrew_reece
  • I did it, but it still gives the same error when I try to dump it as JSON. How do I check the datatype of it before converting to JSON? – Karan Gupta Nov 14 '17 at 05:56
  • Check with `.dtype`. I added an example. – andrew_reece Nov 14 '17 at 06:01
  • `df1['created_date'] = df1['created_date'].astype(str) df1['created_date'].dtype dtype('O')`. It gives me this but it does not convert to json. – Karan Gupta Nov 14 '17 at 06:04
  • It's not clear what you mean here. Are you saying you're getting the same error about `Timestamp` that you were originally? If so, you're missing the conversion of some other date or time field in your data. Use `df.dtypes` to figure out what's a date, and make sure they're all converted. (Or just do `df.astype(str)`.) – andrew_reece Nov 14 '17 at 06:13
  • Yes, the same error of timestamp not being json serializable. I have all dtypes as int64, object or float. I should convert all into str? – Karan Gupta Nov 14 '17 at 06:16
  • If you're going to end up with a JSON string eventually, then yes, go ahead and convert all to strings. But that's not really addressing the fact that you're missing something in your code still. It doesn't make sense that you'd be getting an error about `Timestamp` if `j.dtypes` has no `Timestamp`. You have additional bugs in your code. I'd recommend updating your original post with your corrected code and printouts of `dtypes` after each new `df` you make. (Fix formatting as well, please.) Then it may be clearer where you're still having a problem. – andrew_reece Nov 14 '17 at 06:22
  • I updated the question. Pardon me with the formatting as I don't know how to do it here. I did the best I could. – Karan Gupta Nov 14 '17 at 06:36
  • I converted all into strings and it worked. But still when I dump into JSON it gives an error: `IOPub data rate exceeded. The notebook server will temporarily stop sending output to the client in order to avoid crashing it. To change this limit, set the config variable `--NotebookApp.iopub_data_rate_limit`.` What am I supposed to do with this? – Karan Gupta Nov 14 '17 at 06:50
  • Glad to hear you have it working. Must be a big file. Your command is trying to show it all in your display frame. That's a separate issue, solved a few different ways here: https://stackoverflow.com/q/43288550/2799941 – andrew_reece Nov 14 '17 at 06:56
  • I had got this error before also. I couldn't solve it. I tried looking for it at several places but all in vain. I'll look into the link you shared. Thanks anyways :D. – Karan Gupta Nov 14 '17 at 06:58

A simple solution to this problem would be to avoid using Timestamp entirely. Instead of calling Timestamp, you could use the built-in datetime module and create a function that returns the actual numeric timestamp. For example:

    from datetime import datetime
    from pytz import utc

    def timestamp_from_str(date_str):
        # parse the string, attach UTC, and return a numeric (POSIX) timestamp
        date = datetime.strptime(date_str, '%Y-%m-%d %H:%M:%S')
        return date.replace(tzinfo=utc).timestamp()
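
Applied to the frames from the question, a sketch might look like the following (the column name created_date is taken from the post, and it assumes the values have whole seconds so they match the format string):

    # hypothetical usage: stringify the Timestamp columns, then map to numeric timestamps
    df1['created_date'] = df1['created_date'].astype(str).apply(timestamp_from_str)
    df2['created_date'] = df2['created_date'].astype(str).apply(timestamp_from_str)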

Note that the .timestamp() method is not available in Python 2.7, in which case you will have to calculate the timestamp manually. This is a simple process and has already been described in depth by another question; a rough sketch follows.
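
Under the same assumptions as above (UTC-aware values, whole seconds), the manual calculation is: subtract the Unix epoch and take the total seconds of the resulting timedelta.

    from datetime import datetime
    from pytz import utc

    EPOCH = datetime(1970, 1, 1, tzinfo=utc)

    def timestamp_py27(date_str):
        # Python 2.7 stand-in for .timestamp(): seconds since the Unix epoch
        date = datetime.strptime(date_str, '%Y-%m-%d %H:%M:%S').replace(tzinfo=utc)
        return (date - EPOCH).total_seconds()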

user144153
  • `AttributeError: module 'datetime' has no attribute 'strptime'`. I am using python 3 – Karan Gupta Nov 14 '17 at 06:12
  • Did you use `import datetime` or `from datetime import datetime`? The first will throw an error, but the second will work. – user144153 Nov 14 '17 at 06:15
  • Yep, I did it: defined the function and passed `df1['created_date']`. Still got the error. – Karan Gupta Nov 14 '17 at 06:18
  • Sorry - there was a typo with the tzinfo. For both myself and the colleague to my immediate left, copying and pasting the above definition works for the function call `timestamp_from_str('2015-05-20 08:08:33')`. – user144153 Nov 14 '17 at 06:24
  • It does not work bro. Same error. I am going to edit the question with outputs of dtypes and others. – Karan Gupta Nov 14 '17 at 06:27
  • The attribute error you're quoting is well documented and is specifically caused by the wrong import statement, as discussed [here](https://stackoverflow.com/questions/19480028/attributeerror-module-object-has-no-attribute-strptime) – user144153 Nov 14 '17 at 06:54
  • I did it by converting everything to string. I have had the following error before also. Any idea how to fix this: `IOPub data rate exceeded. The notebook server will temporarily stop sending output to the client in order to avoid crashing it. To change this limit, set the config variable --NotebookApp.iopub_data_rate_limit`. – Karan Gupta Nov 14 '17 at 06:56