
I have a pandas DataFrame with thousands of rows and a few columns. I am getting an error when trying to convert it to a JSON file.

This is the code to convert:

sessionAttendance.to_json('SessionAttendance.json')

This is the error I'm getting:

OverflowError: Maximum recursion level reached


                             _id       wondeID  session               updatedAt
0       123456789101112131415161  AA1234567891        AM 2019-06-21 08:05:50.845
1       123456789101112131415162  AA1234567892        AM 2019-06-21 08:05:50.845
2       123456789101112131415163  AA1234567893        AM 2019-06-21 08:05:50.845
3       123456789101112131415164  AA1234567894        AM 2019-06-21 08:05:50.845


[234195 rows x 4 columns]
FlyingTeller
  • If you provided a [mcve] with a few lines of data, maybe we could try to reproduce... – Serge Ballesta Mar 02 '20 at 13:52
  • It will not work. Eric (and I) suspect cyclic dependencies. We need all the columns, and we need to know what they really contain. For example, if one column contains a list (not the string representation of a list, but a true list), you should say so, and explain how it is built. – Serge Ballesta Mar 02 '20 at 14:04
  • @SergeBallesta They are all objects –  Mar 02 '20 at 14:42
  • Hmm, building a JSON file from 234195 rows seems resource-consuming. How much memory does your system have, and what are your OS and Python versions? – Serge Ballesta Mar 02 '20 at 14:44
  • Python 3.7, 16 GB RAM, 64-bit OS, x64-based processor –  Mar 02 '20 at 14:45
  • I cannot reproduce on my Windows 10 box... What happens if you drastically reduce the number of rows: `sessionAttendance.iloc[:100].to_json('SessionAttendance.json')` (only the first 100 lines)? – Serge Ballesta Mar 02 '20 at 14:53
  • @SergeBallesta I got this error - OverflowError: Overlong 2 byte UTF-8 sequence detected when encoding string –  Mar 02 '20 at 14:54
  • Could the `_id` column be Mongo related? I have just found https://stackoverflow.com/a/14567504/3545273. What gives `sessionAttendance.iloc[:100].to_json('SessionAttendance.json', default_handler=str)`? – Serge Ballesta Mar 02 '20 at 15:06
  • Yes it must be Mongo related, as that is where I'm getting the data from - sessionAttendance.iloc[:100].to_json('SessionAttendance.json', default_handler=str) - didn't produce any errors –  Mar 02 '20 at 15:08
  • No error is good, but *didn't produce anything* is weird. It should have at least produced a file somewhere... – Serge Ballesta Mar 02 '20 at 15:10
  • @SergeBallesta Yes - sorry it did produce a file, so I think you fixed it. Thanks so much –  Mar 02 '20 at 15:13

3 Answers


It seems to be related to the way Mongo formats its `_id` fields, which are not correctly handled by pandas' JSON serializer. A workaround is to set `default_handler=str`, which forces the serializer to fall back to a string representation for any unsupported type:

sessionAttendance.to_json('SessionAttendance.json', default_handler=str)
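A minimal sketch of why this helps. `FakeObjectId` here is a hypothetical stand-in for Mongo's `bson.ObjectId`, used only so the example runs without MongoDB installed; any value the serializer does not recognize gets passed to `default_handler`, so `str()` is used as a fallback:

```python
import pandas as pd

# FakeObjectId is a hypothetical stand-in for bson.ObjectId,
# so this sketch runs without MongoDB installed.
class FakeObjectId:
    def __init__(self, value):
        self.value = value

    def __str__(self):
        return self.value

df = pd.DataFrame({
    '_id': [FakeObjectId('123456789101112131415161')],
    'session': ['AM'],
})

# With default_handler=str, the serializer falls back to str()
# for types it does not know how to encode.
json_text = df.to_json(default_handler=str)
```

Passing a file path instead of capturing the return value writes the same JSON to disk, as in the question.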

Disclaimer: credit should be given to that other SO post

Serge Ballesta

I hit the same error when using Ray.

The data looks like this:

{'id': 100, 'tags': ['tag1', 'tag2']}
{'id': 100, 'tags': ['tag1', 'tag2']}

The error occurred when I called ds.write_json('path-path').

I fixed it by converting the nested values to tuples (or lists) in the map call:


import ray

ds = ray.data.read_json('source.jsonl')

def process_records(records):
    # records is a dict of column arrays; replace each nested
    # array of tags with a plain tuple so the JSON writer can
    # serialize it without recursing into array objects
    for i in range(len(records['tags'])):
        records['tags'][i] = tuple(records['tags'][i])

    return records

...
ds = ds.map_batches(process_records)
ds.select_columns(['tags']).write_json('cleaned/', orient='records', lines=True, force_ascii=False, default_handler=str)
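The batch transform itself can be checked without Ray, since map_batches hands the function a dict of column values (using numpy arrays here is an assumption about the batch format):

```python
import numpy as np

# Standalone sketch of the transform: iterate by position and
# replace each nested array of tags with a plain tuple, which
# JSON writers can serialize.
def process_records(records):
    for i in range(len(records['tags'])):
        records['tags'][i] = tuple(records['tags'][i])
    return records

batch = {
    'id': np.array([100, 101]),
    'tags': [np.array(['tag1', 'tag2']), np.array(['tag3'])],
}
out = process_records(batch)
```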

zzzz zzzz
import sys
sys.setrecursionlimit(1500) # this number can be any limit

If it's a plain table, the code above should fix it. If your pandas DataFrame has columns containing objects, you might need to make sure there are no cyclic references inside those objects.

https://github.com/pandas-dev/pandas/issues/4873

It could be related to the issue linked above. To get past it, first convert your datetime column to a string:

df['updatedAt'] = df['updatedAt'].dt.strftime('%Y-%m-%d %H:%M:%S')

Then converting it to JSON should work.
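As a runnable sketch, with a frame shaped like the one in the question (the row values are made up):

```python
import pandas as pd

# Made-up rows shaped like the question's data.
df = pd.DataFrame({
    'session': ['AM', 'AM'],
    'updatedAt': pd.to_datetime(['2019-06-21 08:05:50.845',
                                 '2019-06-21 08:05:50.845']),
})

# Render the datetimes as plain strings before serializing.
df['updatedAt'] = df['updatedAt'].dt.strftime('%Y-%m-%d %H:%M:%S')
json_text = df.to_json(orient='records')
```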

Eric Yang