6

I have an 11 GB JSON file that I'm unable to load into pandas. The file I'm using is the metadata file from http://jmcauley.ucsd.edu/data/amazon/.

According to the source, the metadata includes descriptions, price, sales-rank, brand info, and co-purchasing links.

It has the following pattern:

{ "asin": "0000031852", "title": "Girls Ballet Tutu Zebra Hot Pink", "price": 3.17, "imUrl": "http://ecx.images-amazon.com/images/I/51fAmVkTbyL._SY300_.jpg", "related": { "also_bought": ["B00JHONN1S", "B002BZX8Z6", "B00D2K1M3O", "0000031909", "B00613WDTQ", "B00D0WDS9A", "B00D0GCI8S", "0000031895", "B003AVKOP2", "B003AVEU6G", "B003IEDM9Q", "B002R0FA24", "B00D23MC6W", "B00D2K0PA0", "B00538F5OK", "B00CEV86I6", "B002R0FABA", "B00D10CLVW", "B003AVNY6I", "B002GZGI4E", "B001T9NUFS", "B002R0F7FE", "B00E1YRI4C", "B008UBQZKU", "B00D103F8U", "B007R2RM8W"], "also_viewed": ["B002BZX8Z6", "B00JHONN1S", "B008F0SU0Y", "B00D23MC6W", "B00AFDOPDA", "B00E1YRI4C", "B002GZGI4E", "B003AVKOP2", "B00D9C1WBM", "B00CEV8366", "B00CEUX0D8", "B0079ME3KU", "B00CEUWY8K", "B004FOEEHC", "0000031895", "B00BC4GY9Y", "B003XRKA7A", "B00K18LKX2", "B00EM7KAG6", "B00AMQ17JA", "B00D9C32NI", "B002C3Y6WG", "B00JLL4L5Y", "B003AVNY6I", "B008UBQZKU", "B00D0WDS9A", "B00613WDTQ", "B00538F5OK", "B005C4Y4F6", "B004LHZ1NY", "B00CPHX76U", "B00CEUWUZC", "B00IJVASUE", "B00GOR07RE", "B00J2GTM0W", "B00JHNSNSM", "B003IEDM9Q", "B00CYBU84G", "B008VV8NSQ", "B00CYBULSO", "B00I2UHSZA", "B005F50FXC", "B007LCQI3S", "B00DP68AVW", "B009RXWNSI", "B003AVEU6G", "B00HSOJB9M", "B00EHAGZNA", "B0046W9T8C", "B00E79VW6Q", "B00D10CLVW", "B00B0AVO54", "B00E95LC8Q", "B00GOR92SO", "B007ZN5Y56", "B00AL2569W", "B00B608000", "B008F0SMUC", "B00BFXLZ8M"], "bought_together": ["B002BZX8Z6"] }, "salesRank": {"Toys & Games": 211836}, "brand": "Coxlures", "categories": [["Sports & Outdoors", "Other Sports", "Dance"]] }

On the other hand, the first 10 rows contain the following data:

["{'asin': '0001048791', 'salesRank': {'Books': 6334800}, 'imUrl': 'http://ecx.images-amazon.com/images/I/51MKP0T4DBL.jpg', 'categories': [['Books']], 'title': 'The Crucible: Performed by Stuart Pankin, Jerome Dempsey & Cast'}\n", "{'asin': '0000143561', 'categories': [['Movies & TV', 'Movies']], 'description': '3Pack DVD set - Italian Classics, Parties and Holidays.', 'title': 'Everyday Italian (with Giada de Laurentiis), Volume 1 (3 Pack): Italian Classics, Parties, Holidays', 'price': 12.99, 'salesRank': {'Movies & TV': 376041}, 'imUrl': 'http://g-ecx.images-amazon.com/images/G/01/x-site/icons/no-img-sm._CB192198896_.gif', 'related': {'also_viewed': ['B0036FO6SI', 'B000KL8ODE', '000014357X', 'B0037718RC', 'B002I5GNVU', 'B000RBU4BM'], 'buy_after_viewing': ['B0036FO6SI', 'B000KL8ODE', '000014357X', 'B0037718RC']}}\n", "{'asin': '0000037214', 'related': {'also_viewed': ['B00JO8II76', 'B00DGN4R1Q', 'B00E1YRI4C']}, 'title': 'Purple Sequin Tiny Dancer Tutu Ballet Dance Fairy Princess Costume Accessory', 'price': 6.99, 'salesRank': {'Clothing': 1233557}, 'imUrl': 'http://ecx.images-amazon.com/images/I/31mCncNuAZL.jpg', 'brand': 'Big Dreams', 'categories': [['Clothing, Shoes & Jewelry', 'Girls'], ['Clothing, Shoes & Jewelry', 'Novelty, Costumes & More', 'Costumes & Accessories', 'More Accessories', 'Kids & Baby']]}\n", "{'asin': '0000032069', 'title': 'Adult Ballet Tutu Cheetah Pink', 'price': 7.89, 'imUrl': 'http://ecx.images-amazon.com/images/I/51EzU6quNML._SX342_.jpg', 'related': {'also_bought': ['0000032050', 'B00D0DJAEG', '0000032042', 'B00D0F450I', 'B00D2JTMS2', 'B00D0FDUAY', 'B00D2JSRFQ', '0000032034', 'B00D0D5F6S', 'B00D2JRWWA', 'B00D0FIIJM', 'B00D0FCQQI', 'B00EXVN9PU', 'B0041EOTJO', 'B004PYEE8G', 'B001GTKPDQ', 'B00EON0SJ2', 'B005HMHOQ4', 'B002XZMGGQ'], 'also_viewed': ['B00D0F450I', '0000032050', 'B00D2JTMS2', '0000032042', 'B004PYEE8G', 'B00JHNSNSM', 'B00D0DJAEG', 'B00D2JSRFQ', 'B00D0FCQQI', 'B00D2JRWWA', 'B003AVNY6I', 'B0071KR2LC', 'B00GOR07RE', 'B00D0FIIJM', 
'B005F50FXC', 'B0079MCIMU', 'B00D0FDUAY', 'B00H3RYN3I', 'B005C4Y4F6', 'B007IEFT84', 'B00D0D5F6S', 'B002BZX8Z6', 'B00JHONN1S', 'B008F0SU0Y', 'B00FNNFXAG', 'B007R2RM8W', 'B007VM3AMK', 'B00C0PLENA', 'B00BJGG6VG', 'B00E1YRI4C', 'B00IIK61WA', 'B009UC638W', 'B00KZN6RVI', 'B00CSFEENY', 'B002GZGI4E', 'B00HSOJJ94', 'B00LIPP4VG', 'B009RXWNSI', 'B00E87F196', 'B005HMHOQY', 'B00J6S9MSS', '0000032034', 'B00CJQGNJK', 'B008FCA0F0', 'B0056LG7GY', 'B00DPQWCZ2', 'B00I3PV0US', 'B00KZN6IVW', 'B0054TBWKO', 'B00I2S01I8', 'B00BXF12P8', 'B00GVHU678', 'B005NWENGC', 'B003AVKOP2', 'B00JK8MQ4Q', 'B00FZIMVQS', 'B008BB19VE', 'B00GTEXPOE', 'B009WPT2RQ', 'B00E37SBBG'], 'bought_together': ['0000032050', 'B00D0DJAEG', '0000032042', 'B00D0F450I']}, 'brand': 'BubuBibi', 'categories': [['Sports & Outdoors', 'Other Sports', 'Dance', 'Clothing', 'Girls', 'Skirts']]}\n", "{'asin': '0000031909', 'related': {'also_bought': ['B002BZX8Z6', 'B00JHONN1S', '0000031895', 'B00D2K1M3O', '0000031852', 'B00D0WDS9A', 'B00D10CLVW', 'B00D103F8U', 'B003AVEU6G', 'B00D2K0PA0', 'B002GZGI4E', 'B00D0ZF44Y', 'B008F0SMUC', 'B00D0GCI8S', 'B008F0SU0Y', 'B002YSCPZY', '0448408775', 'B002R0FABA', 'B008GHWNWC', 'B002R0FA24', 'B001GTKPEK', 'B006XA7KZO', 'B001GZUQ9S', 'B00613VNL0', 'B003IEDM9Q', 'B003LTOZK8', 'B003AVNY6I', 'B008UBQZKU', 'B001AQD8VQ', 'B003ILA0L2', 'B00AFDOPDA', 'B002R0F7FE'], 'also_viewed': ['B002BZX8Z6', 'B00JHONN1S', 'B008F0SU0Y', 'B00E1YRI4C', 'B00AFDOPDA', 'B002GZGI4E', 'B00CEUWY8K', 'B003IEDM9Q', 'B00HSOJB9M', '0000031895', 'B003AVKOP2', 'B005C4Y4F6', 'B008F0SMUC', 'B00362QGW0', 'B008UD01L2', 'B00FAZ5ZE6', 'B008F0SY6O', 'B00DPLLQR2', 'B00CEUWUZC', 'B004PYEE8G', 'B003AVNY6I', 'B00CEUX0D8', 'B00JHNSNSM', 'B00D10CLVW', 'B00D23MC6W', 'B007XAI53E', 'B008X6CBS2', 'B004PEI45U', 'B005HMHOQ4', 'B002C3Y6WG', 'B00HSC8O74', 'B008BMGHM4', 'B00CEUWTFS', 'B00FNNFXAG', 'B00CYBU84G', 'B00D9C32NI', 'B0046W9T8C', 'B008UBG5IW', 'B001YHX45G', 'B00CEV8366', 'B00I2UHSZA', 'B009RXWNSI', 'B008FCA0F0', 'B001GTKPEK', 'B004TU1VPU', 
'B00CBPIO7S', 'B00CHHXJ0M', 'B00538F5OK', 'B005F50FXC', 'B00CEUX4QQ', 'B003XRKA7A', '0000031852', 'B002C3R5XI', 'B00C6Q1Z6E'], 'bought_together': ['B002BZX8Z6']}, 'title': 'Girls Ballet Tutu Neon Pink', 'price': 7.0, 'salesRank': {'Toys & Games': 201847}, 'imUrl': 'http://ecx.images-amazon.com/images/I/41xBoP0FVzL._SY300_.jpg', 'brand': 'Unknown', 'categories': [['Sports & Outdoors', 'Other Sports', 'Dance']], 'description': 'High quality 3 layer ballet tutu. 12 inches in length'}\n", "{'asin': '0000032034', 'title': 'Adult Ballet Tutu Yellow', 'price': 7.87, 'imUrl': 'http://ecx.images-amazon.com/images/I/21GNUNIa1CL.jpg', 'related': {'also_bought': ['B00D2JSRFQ', '0000032042', '0000032050', 'B00D2JTMS2', 'B00D0FDUAY', 'B00D0FIIJM', 'B00D2JRWWA', 'B00D0F450I', 'B00D0FCQQI', 'B00H3RYN3I', 'B002I55DT8', 'B00498HUQ6', 'B001YZCF1M', 'B00FNNFXAG', 'B00EON0SJ2', 'B000J09OV2', 'B0048WRX5G', 'B00I2EOG92', 'B003UM99FC', 'B00D0DJAEG', '0000032069', 'B00I2S01I8', 'B003AVKOP2', 'B003CPDAUW', 'B005HMHOQ4', 'B00JHONN1S', 'B00GOR07RE', 'B007TMMVJA', 'B00DPPRW2G', 'B0089ND408', 'B0046W9T8C', 'B005HMHOQE', 'B00EOOR812', 'B00CLZWXYI', 'B008AU29UQ', 'B00BNRKT6E', 'B004YHFSIO', 'B00EB5WN9Q', 'B008UBQZKU', 'B00D0D5F6S', 'B004PYEE8G', 'B00FQU9ZUA', 'B008AABRPO', 'B007BZ5CUA', 'B00I5SCG7E', 'B0036LOTNO', 'B009WPT2SA', 'B009QVCTTY', 'B00KZN5J8U', 'B008B81LN8', 'B00E1YRI4C', 'B004SVOVSE', 'B002I4ZJ1Q', 'B005AZMN3C', 'B00BBQFGWO', 'B009QVQZ3K', 'B005C4Y4F6', 'B008CLA6HG', 'B0085D9V1S', 'B000M55BDY', 'B00FE9DIHO', 'B009QVWIQ8', 'B00LIPP114', 'B001YHX45G', 'B00BN6S8WC', 'B009MDB6FE', 'B007AK1KTS', 'B00J6LZ16M', 'B002FRPE9I', 'B002RHLXKU', 'B006F404KQ', 'B00362OQQI', 'B003UNHJ4Y', 'B00D10CLVW', 'B002BZX8Z6', 'B0041EOTJO', 'B00F3KZUPC', 'B0055A1F4A', 'B0035BGVYU', 'B000P18LZ0', 'B007F2H4PU', 'B004XHVUE6', 'B00KF54D6W', 'B0097B1D8G', 'B00840TWES', 'B0050GAHKC', 'B00I9JSUO2', 'B003HCYEQY', 'B0075CNY7M', 'B00AFDOPDA', 'B008FCA0F0', 'B000IRG356', 'B00DSVBR7S', 'B00DYIQ8E2', 'B0041BVA80', 
'B009M7FWBE'], 'also_viewed': ['B00D2JSRFQ', '0000032050', 'B00JHNSNSM', '0000032042', 'B00D2JTMS2', 'B003AVKOP2', 'B004YHFSIO', 'B00GOR07RE', 'B00D0FDUAY', 'B004PYEE8G', 'B00D0FCQQI', 'B009WPT2SA', 'B003AVNY6I', 'B00EON0SJ2', 'B005C4Y4F6', 'B00D0FIIJM', 'B00FNNFXAG', 'B00D0F450I', 'B00D2JRWWA', 'B008F0SU0Y', 'B00JHONN1S', 'B00FE9DIHO', 'B0071KR2LC', 'B00H3RYN3I', 'B00IIK61WA', 'B00D0DJAEG', 'B00KZN6RVI', 'B0054TBWKO', 'B00GEDG8D0', 'B00JMX4CCS', 'B00K18LJ6U', 'B0079MCIMU', 'B005HMHOQY', 'B009RXWNSI', 'B007XAI4LW', 'B007IEFTO8', 'B00E1YRI4C', 'B007R2RM8W', 'B002BZX8Z6', 'B00IIK61UW', 'B008F0SMUC', 'B00KF54D6W', 'B00E1Q66BG', 'B003WFSLBA', 'B00IJVASUE', 'B00DPPRW2G', 'B00HAVJ48G', 'B002C3Y6WG', 'B00I5RLL4Y', 'B003AVEU6G', 'B00E95LC8Q', 'B005F50FXC', 'B002U03H1M', 'B00E87F196', 'B008A7TFK6', 'B00HSOJB9M', 'B008A7TFGK', 'B00DPYOB2Q', '0375851682', 'B00CSFEENY'], 'bought_together': ['B00D2JSRFQ', '0000032050', '0000032042', 'B00D2JTMS2']}, 'brand': 'BubuBibi', 'categories': [['Sports & Outdoors', 'Other Sports', 'Dance', 'Clothing', 'Girls', 'Skirts']]}\n", '{\'asin\': \'0000589012\', \'title\': "Why Don\'t They Just Quit? 
DVD Roundtable Discussion: What Families and Friends need to Know About Addiction and Recovery", \'price\': 15.95, \'imUrl\': \'http://ecx.images-amazon.com/images/I/519%2B1kseM3L._SY300_.jpg\', \'related\': {\'also_bought\': [\'B000Z3N1HQ\', \'0578045427\', \'B007VI5AQ8\', \'B003AC98V2\', \'B004V4RW8O\', \'B000I0QL7I\', \'B000J10F8C\', \'B0007CEXYK\', \'B000ERVK4Y\', \'B000XSKDBA\', \'B002UNMWTC\', \'B00008MTXI\', \'B007TSV4GK\', \'B0052ADP6Y\', \'B00EUENWIY\', \'B003YKYX9M\', \'B004RD3YFE\', \'B007Y9F6RW\', \'B00004UEDQ\', \'B0039Y774Q\', \'B0006IIKRG\', \'B00JAGF9HE\', \'6305162026\', \'6305692572\', \'B001D7T460\', \'B0018QOIWG\', \'B002Y7ZELW\', \'B0045HCJ08\', \'0830907394\', \'B000LAZDPG\', \'B00A2H9QN8\', \'B001O5CLXY\', \'B000JBXXYK\', \'B003B3NGS6\', \'B0037SR3N4\', \'B00641Y2ZS\', \'0470903953\', \'0977977315\', \'B00049QQHI\', \'B000E6ESU8\', \'0470402741\', \'061565732X\', \'0615763146\', \'B000VZPTH8\', \'B003JO6OPO\', \'B00787BTEO\', \'B004R1Q7YQ\', \'B001GG6GKK\', \'B0015VQAZM\', \'1592854869\', \'B000QRIL08\', \'B000GQLA8O\', \'B000MPM3TE\', \'0979021804\', \'1608823407\', \'159285821X\', \'B00005Q4CS\', \'B0000549B1\', \'6305594333\', \'B00AFEXRME\', \'B004FN25AG\', \'0830906363\', \'0470402768\', \'1118414756\', \'B009SV4O2M\', \'1481106694\', \'1572306254\', \'B0013MOLPO\', \'B00009Y3QI\', \'B003NMOL2U\', \'B001AKBI8C\', \'0981708803\', \'1572306394\', \'B00B9LNPA6\', \'B005BYBZEK\', \'B004D7SBMU\', \'B00CQMADIO\', \'0470405511\', \'B00CHEHHT4\', \'B000ESUWY2\', \'0792838068\', \'B00AWE09Z0\', \'B00E4XZZEK\', \'0830914870\', \'B00GFZLEF4\', \'083090459X\', \'1402218443\', \'1893007170\', \'1893277046\', \'B005CKI7H6\', \'B0001LQL6K\', \'B000067S10\', \'0890425558\', \'B00114KYC8\', \'1466221224\', \'0943158508\', \'B00A7ID5BG\', \'0671765582\', \'B000B8IH10\', \'1568381395\'], \'buy_after_viewing\': [\'B003AC98V2\', \'B007VI5AQ8\', \'B000ERVK4Y\', \'B0007CEXYK\']}, \'salesRank\': {\'Movies & TV\': 1084845}, \'categories\': [[\'Movies & TV\', 
\'Movies\']]}\n', "{'asin': '0001048775', 'description': 'William Shakespeare is widely regarded as the greatest playwright the world has seen. He produced an astonishing amount of work; 37 plays, 154 sonnets, and 5 poems. He died on 23rd April 1616, aged 52, and was buried in the Holy Trinity Church, Stratford.', 'title': 'Measure for Measure: Complete & Unabridged', 'imUrl': 'http://ecx.images-amazon.com/images/I/5166EBHDQYL.jpg', 'salesRank': {'Books': 13243226}, 'categories': [['Books']]}\n", "{'asin': '0000031852', 'related': {'also_bought': ['B00JHONN1S', 'B002BZX8Z6', 'B00D2K1M3O', '0000031909', 'B00613WDTQ', 'B00D0WDS9A', 'B00D0GCI8S', '0000031895', 'B003AVKOP2', 'B003AVEU6G', 'B003IEDM9Q', 'B002R0FA24', 'B00D23MC6W', 'B00D2K0PA0', 'B00538F5OK', 'B00CEV86I6', 'B002R0FABA', 'B00D10CLVW', 'B003AVNY6I', 'B002GZGI4E', 'B001T9NUFS', 'B002R0F7FE', 'B00E1YRI4C', 'B008UBQZKU', 'B00D103F8U', 'B007R2RM8W'], 'also_viewed': ['B002BZX8Z6', 'B00JHONN1S', 'B008F0SU0Y', 'B00D23MC6W', 'B00AFDOPDA', 'B00E1YRI4C', 'B002GZGI4E', 'B003AVKOP2', 'B00D9C1WBM', 'B00CEV8366', 'B00CEUX0D8', 'B0079ME3KU', 'B00CEUWY8K', 'B004FOEEHC', '0000031895', 'B00BC4GY9Y', 'B003XRKA7A', 'B00K18LKX2', 'B00EM7KAG6', 'B00AMQ17JA', 'B00D9C32NI', 'B002C3Y6WG', 'B00JLL4L5Y', 'B003AVNY6I', 'B008UBQZKU', 'B00D0WDS9A', 'B00613WDTQ', 'B00538F5OK', 'B005C4Y4F6', 'B004LHZ1NY', 'B00CPHX76U', 'B00CEUWUZC', 'B00IJVASUE', 'B00GOR07RE', 'B00J2GTM0W', 'B00JHNSNSM', 'B003IEDM9Q', 'B00CYBU84G', 'B008VV8NSQ', 'B00CYBULSO', 'B00I2UHSZA', 'B005F50FXC', 'B007LCQI3S', 'B00DP68AVW', 'B009RXWNSI', 'B003AVEU6G', 'B00HSOJB9M', 'B00EHAGZNA', 'B0046W9T8C', 'B00E79VW6Q', 'B00D10CLVW', 'B00B0AVO54', 'B00E95LC8Q', 'B00GOR92SO', 'B007ZN5Y56', 'B00AL2569W', 'B00B608000', 'B008F0SMUC', 'B00BFXLZ8M'], 'bought_together': ['B002BZX8Z6']}, 'title': 'Girls Ballet Tutu Zebra Hot Pink', 'price': 3.17, 'salesRank': {'Toys & Games': 211836}, 'imUrl': 'http://ecx.images-amazon.com/images/I/51fAmVkTbyL._SY300_.jpg', 'brand': 'Coxlures', 
'categories': [['Sports & Outdoors', 'Other Sports', 'Dance']], 'description': 'TUtu'}\n", '{\'asin\': \'0001048236\', \'categories\': [[\'Books\']], \'description\': ""One thing is certain, Sherlockians, put aside your Baring-GouldAnnotated, your Folio SocietyIllustrated-for the time being, the Oxford is the edition to curl up with on a winter\'s night"--The Chicago Tribune"An incomparable gift book; or, should you find it impossible to surrender up such treasures, the best of gifts to oneself"--USA Today"To the true Sherlockian, this will be a treasure; to otherwise diverted detective story fans, it is a rich lode for discovery"--Denver Post"The complete and authentic adventures of the legendary detective--expertly edited and annotated by a team of Holmes scholars....in a handsome, boxed set....A lovely gift"--The Christian Science Monitor"Here in nine volumes...are all the adventures of Holmes and Watson. Each book has an introduction, something new and fascinating for even the most devoted Holmesians plus a series of intelligent notes at the back of each volume."--Oxford Times"TheOxford Sherlock Holmes, a new edition of the stories, is a splendid piece of publishing. Nine compact volumes, beautifully produced, each with a stimulating introduction; clear type, accurate texts, a handy chronology, a helpful bibliography. And, most valuable of all, explanatory notes running to 50 pages or more per volume." --John Gross, writing inSunday Telegraph--This text refers to thePaperbackedition.", \'title\': \'The Sherlock Holmes Audio Collection\', \'price\': 9.26, \'salesRank\': {\'Books\': 8973864}, \'imUrl\': \'http://ecx.images-amazon.com/images/I/51DH145C5JL.jpg\', \'related\': {\'also_viewed\': [\'1442300191\', \'9626349786\', \'1602837155\', \'1598879162\', \'1400115159\', \'1478396202\', \'1408426250\', \'B007PM2A4A\', \'1609980603\'], \'buy_after_viewing\': [\'0312089457\']}}\n']

How can I write a Python script to load the JSON file into pandas?

Anshul Goyal
jason
  • Possible duplicate of ["Large data" work flows using pandas](https://stackoverflow.com/questions/14262433/large-data-work-flows-using-pandas) – Simon O'Doherty Dec 08 '17 at 06:18
  • 10+ GB of JSON are pretty ugly since the format requires a stateful parser and has to keep stuff in memory. You will basically need a ton of memory. On the other hand the format is not fully clear. I could interpret it as "one JSON object per line" which would allow to parse every line separately. – Klaus D. Dec 08 '17 at 06:39
  • Fortunately, I don't have to work with files that huge, but I've heard that `pandas.HDFStore.put` may resolve your problem. – Tom Wojcik Dec 10 '17 at 19:40
  • Why not incorporate mysql and pandas https://stackoverflow.com/questions/15278375/importing-larger-sql-files-into-mysql –  Dec 11 '17 at 04:43
  • The first ten rows don't seem to be valid json (strings with `'` rather than `"`), is this the raw first ten lines or did you do something to get this? Could you include the raw first lines? – Andy Hayden Dec 11 '17 at 07:02
  • This looks like a network operation. Do you want to save and work with the file or take it as a piece over the network? If your file does not have a byte summary, **it is impossible**. – dsgdfg Dec 13 '17 at 21:29

3 Answers

11

I'm not sure I understand the problem. Is it memory consumption or a file format issue? What error did pd.read_json() raise?

If it's a file format issue (e.g. it raised ValueError: Trailing data):

Based on the source link you posted, the file is NOT an ordinary JSON file (see the "Reading the data" section there). It's a line-delimited JSON file (each line is a JSON object), a widely used format for JSON streaming.

To read line-delimited JSON files, pass lines=True to pd.read_json():

import pandas as pd

df = pd.read_json('path/to/file', lines=True)

Note that this requires pandas 0.19.0 or later.
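As a rough illustration of the lines=True call, the sketch below writes two made-up records to a temporary file and reads them back. The file name and the records are hypothetical stand-ins for the real metadata file. Passing dtype={"asin": str} is a precaution on my part, on the assumption that the default type inference could turn digit-only IDs such as 0000031852 into numbers.

```python
import json
import os
import tempfile

import pandas as pd

# Two made-up records in the same shape as the metadata (hypothetical sample)
records = [
    {"asin": "0000031852", "title": "Girls Ballet Tutu Zebra Hot Pink", "price": 3.17},
    {"asin": "0001048791", "title": "The Crucible"},
]

# Write one JSON object per line, i.e. line-delimited JSON
path = os.path.join(tempfile.mkdtemp(), "sample_metadata.json")
with open(path, "w") as fh:
    for rec in records:
        fh.write(json.dumps(rec) + "\n")

# lines=True parses each line as a separate JSON object;
# dtype keeps the asin column as text so leading zeros are not lost
df = pd.read_json(path, lines=True, dtype={"asin": str})
print(df)
```

Records that lack a key (here, the second one has no price) simply get a missing value in that column.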

Qusai Alothman
  • How can I load first 10,000 json rows using this? – jason Dec 15 '17 at 00:19
  • The Q is that it was taking a long time or hanging my system when I ran the query read_json. I wanted to sneak peek into the data and then start modelling. I had no idea how I could approach it. – jason Dec 15 '17 at 00:20
  • @jason You can load the first 10,000 rows by passing another parameter: `chunksize=10000` . In that case, the `pd.read_json()` will return a reader rather than a DataFrame, which you can iterate to get the data in "chunks" of size 10,000 (each chunk is a new DataFrame). See [here](http://pandas.pydata.org/pandas-docs/stable/io.html#io-jsonl) for an example. Note that for this one you need Pandas ver. 0.21.0+ . – Qusai Alothman Dec 15 '17 at 16:17
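A minimal sketch of the chunksize approach from the comment above, using a small in-memory sample instead of the real file. The sample records and variable names are made up for illustration; for the real file you would pass its path and something like chunksize=10000.

```python
import io

import pandas as pd

# Hypothetical line-delimited sample standing in for the 11 GB file
sample = io.StringIO(
    '{"asin": "B00JHONN1S", "price": 3.17}\n'
    '{"asin": "B002BZX8Z6", "price": 12.99}\n'
    '{"asin": "B00D2K1M3O", "price": 6.99}\n'
)

# With chunksize set, read_json returns an iterable reader, not a DataFrame
reader = pd.read_json(sample, lines=True, chunksize=2)

# Take only the first chunk for a quick look; nothing beyond it is read yet
first_chunk = next(iter(reader))
print(first_chunk)

# Continuing to iterate over the reader yields the remaining chunks in turn
remaining_rows = sum(len(chunk) for chunk in reader)
print(remaining_rows)
```

Stopping after the first chunk is what makes this useful for a sneak peek: only as many lines as the chunk needs are read from the source.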
2

For your original data, you can import it directly into a pandas DataFrame once each record has been parsed into a dict:

json_data = ['{"asin": "0000031852", "categories": [["Sports & Outdoors", "Other Sports", "Dance"]], "title": "Girls Ballet Tutu Zebra Hot Pink", "price": 3.17, "salesRank": {"Toys & Games": 211836}, "imUrl": "http://ecx.images-amazon.com/images/I/51fAmVkTbyL._SY300_.jpg", "brand": "Coxlures", "related": {"also_bought": ["B00JHONN1S", "B002BZX8Z6", "B00D2K1M3O", "0000031909", "B00613WDTQ", "B00D0WDS9A", "B00D0GCI8S", "0000031895", "B003AVKOP2", "B003AVEU6G", "B003IEDM9Q", "B002R0FA24", "B00D23MC6W", "B00D2K0PA0", "B00538F5OK", "B00CEV86I6", "B002R0FABA", "B00D10CLVW", "B003AVNY6I", "B002GZGI4E", "B001T9NUFS", "B002R0F7FE", "B00E1YRI4C", "B008UBQZKU", "B00D103F8U", "B007R2RM8W"], "also_viewed": ["B002BZX8Z6", "B00JHONN1S", "B008F0SU0Y", "B00D23MC6W", "B00AFDOPDA", "B00E1YRI4C", "B002GZGI4E", "B003AVKOP2", "B00D9C1WBM", "B00CEV8366", "B00CEUX0D8", "B0079ME3KU", "B00CEUWY8K", "B004FOEEHC", "0000031895", "B00BC4GY9Y", "B003XRKA7A", "B00K18LKX2", "B00EM7KAG6", "B00AMQ17JA", "B00D9C32NI", "B002C3Y6WG", "B00JLL4L5Y", "B003AVNY6I", "B008UBQZKU", "B00D0WDS9A", "B00613WDTQ", "B00538F5OK", "B005C4Y4F6", "B004LHZ1NY", "B00CPHX76U", "B00CEUWUZC", "B00IJVASUE", "B00GOR07RE", "B00J2GTM0W", "B00JHNSNSM", "B003IEDM9Q", "B00CYBU84G", "B008VV8NSQ", "B00CYBULSO", "B00I2UHSZA", "B005F50FXC", "B007LCQI3S", "B00DP68AVW", "B009RXWNSI", "B003AVEU6G", "B00HSOJB9M", "B00EHAGZNA", "B0046W9T8C", "B00E79VW6Q", "B00D10CLVW", "B00B0AVO54", "B00E95LC8Q", "B00GOR92SO", "B007ZN5Y56", "B00AL2569W", "B00B608000", "B008F0SMUC", "B00BFXLZ8M"], "bought_together": ["B002BZX8Z6"]}}']

import json

data = [json.loads(d) for d in json_data]

# data now looks like below, as shared in post
data = [{ "asin": "0000031852", "title": "Girls Ballet Tutu Zebra Hot Pink", "price": 3.17, "imUrl": "http://ecx.images-amazon.com/images/I/51fAmVkTbyL._SY300_.jpg", "related": { "also_bought": ["B00JHONN1S", "B002BZX8Z6", "B00D2K1M3O", "0000031909", "B00613WDTQ", "B00D0WDS9A", "B00D0GCI8S", "0000031895", "B003AVKOP2", "B003AVEU6G", "B003IEDM9Q", "B002R0FA24", "B00D23MC6W", "B00D2K0PA0", "B00538F5OK", "B00CEV86I6", "B002R0FABA", "B00D10CLVW", "B003AVNY6I", "B002GZGI4E", "B001T9NUFS", "B002R0F7FE", "B00E1YRI4C", "B008UBQZKU", "B00D103F8U", "B007R2RM8W"], "also_viewed": ["B002BZX8Z6", "B00JHONN1S", "B008F0SU0Y", "B00D23MC6W", "B00AFDOPDA", "B00E1YRI4C", "B002GZGI4E", "B003AVKOP2", "B00D9C1WBM", "B00CEV8366", "B00CEUX0D8", "B0079ME3KU", "B00CEUWY8K", "B004FOEEHC", "0000031895", "B00BC4GY9Y", "B003XRKA7A", "B00K18LKX2", "B00EM7KAG6", "B00AMQ17JA", "B00D9C32NI", "B002C3Y6WG", "B00JLL4L5Y", "B003AVNY6I", "B008UBQZKU", "B00D0WDS9A", "B00613WDTQ", "B00538F5OK", "B005C4Y4F6", "B004LHZ1NY", "B00CPHX76U", "B00CEUWUZC", "B00IJVASUE", "B00GOR07RE", "B00J2GTM0W", "B00JHNSNSM", "B003IEDM9Q", "B00CYBU84G", "B008VV8NSQ", "B00CYBULSO", "B00I2UHSZA", "B005F50FXC", "B007LCQI3S", "B00DP68AVW", "B009RXWNSI", "B003AVEU6G", "B00HSOJB9M", "B00EHAGZNA", "B0046W9T8C", "B00E79VW6Q", "B00D10CLVW", "B00B0AVO54", "B00E95LC8Q", "B00GOR92SO", "B007ZN5Y56", "B00AL2569W", "B00B608000", "B008F0SMUC", "B00BFXLZ8M"], "bought_together": ["B002BZX8Z6"] }, "salesRank": {"Toys & Games": 211836}, "brand": "Coxlures", "categories": [["Sports & Outdoors", "Other Sports", "Dance"]] }]

import pandas

pandas.DataFrame(data)

However, the data in your extended example is not JSON but the string representation of Python dicts. So you will need ast.literal_eval to get the actual data:

# First three rows from the question; the remaining rows are handled the same way
data = [
    "{'asin': '0001048791', 'salesRank': {'Books': 6334800}, 'imUrl': 'http://ecx.images-amazon.com/images/I/51MKP0T4DBL.jpg', 'categories': [['Books']], 'title': 'The Crucible: Performed by Stuart Pankin, Jerome Dempsey & Cast'}\n",
    "{'asin': '0000143561', 'categories': [['Movies & TV', 'Movies']], 'description': '3Pack DVD set - Italian Classics, Parties and Holidays.', 'title': 'Everyday Italian (with Giada de Laurentiis), Volume 1 (3 Pack): Italian Classics, Parties, Holidays', 'price': 12.99, 'salesRank': {'Movies & TV': 376041}, 'imUrl': 'http://g-ecx.images-amazon.com/images/G/01/x-site/icons/no-img-sm._CB192198896_.gif', 'related': {'also_viewed': ['B0036FO6SI', 'B000KL8ODE', '000014357X', 'B0037718RC', 'B002I5GNVU', 'B000RBU4BM'], 'buy_after_viewing': ['B0036FO6SI', 'B000KL8ODE', '000014357X', 'B0037718RC']}}\n",
    "{'asin': '0000037214', 'related': {'also_viewed': ['B00JO8II76', 'B00DGN4R1Q', 'B00E1YRI4C']}, 'title': 'Purple Sequin Tiny Dancer Tutu Ballet Dance Fairy Princess Costume Accessory', 'price': 6.99, 'salesRank': {'Clothing': 1233557}, 'imUrl': 'http://ecx.images-amazon.com/images/I/31mCncNuAZL.jpg', 'brand': 'Big Dreams', 'categories': [['Clothing, Shoes & Jewelry', 'Girls'], ['Clothing, Shoes & Jewelry', 'Novelty, Costumes & More', 'Costumes & Accessories', 'More Accessories', 'Kids & Baby']]}\n",
]

from ast import literal_eval

new_data = [literal_eval(x) for x in data]

import pandas

pandas.DataFrame(new_data)
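
For a file too large to hold in a list, the same idea can be applied lazily, one line at a time. The following is only a sketch under assumptions: parse_record and iter_records are hypothetical helper names, the sample lines are made up, and for the real file you would pass an open file object instead of the list.

```python
import json
from ast import literal_eval
from itertools import islice

import pandas as pd


def parse_record(line):
    """Parse one line as JSON, falling back to a Python dict literal."""
    try:
        return json.loads(line)
    except json.JSONDecodeError:
        # Single-quoted records are not valid JSON, so evaluate them safely
        return literal_eval(line)


def iter_records(lines):
    """Yield one parsed record per non-empty line, without loading them all."""
    for line in lines:
        line = line.strip()
        if line:
            yield parse_record(line)


# Made-up sample mixing both styles seen in the question
sample_lines = [
    "{'asin': '0001048791', 'title': 'The Crucible'}\n",
    '{"asin": "0000031852", "price": 3.17}\n',
    "{'asin': '0000037214', 'price': 6.99}\n",
]

# Only the first two records are parsed; the rest are never touched
df = pd.DataFrame(list(islice(iter_records(sample_lines), 2)))
print(df)
```

Because the records are parsed by Python rather than by pd.read_json, string values such as the asin IDs are kept exactly as they appear in the file.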
Anshul Goyal
0

Sometimes it's better to first decide which kind of processing you need:

  1. Memory intensive: do further operations depend on loading the entire JSON?

  2. Runtime data generation: do we only need to emit the required object(s) from the JSON one by one?

If the operations are memory intensive, it is still not necessary to keep the entire file in memory. Use an intermediate cache (e.g. an LRU cache), which limits memory usage to what is actually required.

If runtime data generation is enough, then ijson is a perfect solution.

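Example (source: https://pypi.python.org/pypi/ijson):

import ijson
from urllib.request import urlopen

# Open the source as a stream instead of reading it all into memory
f = urlopen('http://.../')
# Lazily yield each element of the array found at earth.europe
objects = ijson.items(f, 'earth.europe.item')
cities = (o for o in objects if o['type'] == 'city')
for city in cities:
    do_something_with(city)  # placeholder for your own processing

It makes no difference whether the source is a local file or a URL.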

dhirajforyou