
I am participating in the Yelp Dataset Challenge and I'm using RethinkDB to store the JSON documents for each of the different datasets.

I have the following script:

import rethinkdb as r
import json, os

RDB_HOST = os.environ.get('RDB_HOST') or 'localhost'
RDB_PORT = os.environ.get('RDB_PORT') or 28015
DB = 'test'

connection = r.connect(host=RDB_HOST, port=RDB_PORT, db=DB)

query = r.table('yelp_user').filter({"name":"Arthur"}).run(connection)
print(query)

But when I run it from the terminal in a virtualenv, I get output like this:

<rethinkdb.net.DefaultCursor object at 0x102c22250> (streaming):
[{'yelping_since': '2014-03', 'votes': {'cool': 1, 'useful': 2, 'funny': 1}, 'review_count': 5, 'id': '08eb0b0d-2633-4ec4-93fe-817a496d4b52', 'user_id': 'ZuDUSyT4bE6sx-1MzYd2Kg', 'compliments': {}, 'friends': [], 'average_stars': 5, 'type': 'user', 'elite': [], 'name': 'Arthur', 'fans': 0}, ...]

I know I can use pprint to pretty-print output, but the bigger issue, which I don't understand how to resolve, is printing all of the results in a sensible way instead of having the output end with "...".

Any suggestions?

Arthur Collé

2 Answers


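run returns an iterable cursor, not a list, so printing it only shows the cursor's repr with a truncated preview that ends in "...". Iterate over it to get all the rows: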

query = r.table('yelp_user').filter({"name":"Arthur"})
for row in query.run(connection):
    print(row)
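To make each row readable rather than one long line, every document can be pretty-printed as it is pulled from the cursor. This is a minimal sketch, assuming the documents are JSON-serializable dicts like the sample in the question; `format_rows` is a hypothetical helper name, and a plain list stands in for the cursor so the snippet runs without a RethinkDB server (`pprint.pprint(row)` inside the loop would work too).

```python
import json
from typing import Any, Dict, Iterable, Iterator


def format_rows(rows: Iterable[Dict[str, Any]]) -> Iterator[str]:
    """Yield each document as indented JSON.

    Accepts any iterable of dicts, including the cursor returned by
    run(connection), and iterates over it only once.
    """
    for row in rows:
        # sort_keys gives a stable field order; ensure_ascii=False keeps
        # non-ASCII text readable instead of \u escapes.
        yield json.dumps(row, indent=2, sort_keys=True, ensure_ascii=False)


# Hypothetical stand-in for a cursor: fields trimmed from the question's output.
sample = [
    {"name": "Arthur", "review_count": 5,
     "votes": {"cool": 1, "useful": 2, "funny": 1}},
]

# With a live connection this would be:
#   for text in format_rows(query.run(connection)):
#       print(text)
for text in format_rows(sample):
    print(text)
```

Because `format_rows` is a generator, it prints documents as they stream in instead of loading the whole result set first.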
Etienne Laurin
  • What is a cursor, though? Is it a built-in Python type? Where should I look to teach myself how to understand and manipulate these types? – Arthur Collé Apr 20 '15 at 03:57
  • These cursors are part of the rethinkdb module. The API is documented here: http://www.rethinkdb.com/api/python/ . In particular, see `next`, `for`, `list` and `close`. – Etienne Laurin Apr 20 '15 at 23:58
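Because the cursor follows Python's iterator protocol, the tools mentioned in the comment (`next`, `for`, `list`) work on it directly, and `close` releases it early. The sketch below uses a hypothetical `StubCursor` class as a stand-in for `rethinkdb.net.DefaultCursor` so it runs without a server, with two made-up documents; with a real cursor you would pass the result of `run(connection)` instead. Like a real cursor, the stub can be consumed only once.

```python
from contextlib import closing


class StubCursor:
    """Hypothetical stand-in for a RethinkDB cursor: iterable once, closable."""

    def __init__(self, rows):
        self._rows = iter(rows)
        self.closed = False

    def __iter__(self):
        return self

    def __next__(self):
        if self.closed:
            raise StopIteration
        return next(self._rows)

    def close(self):
        self.closed = True


# Made-up documents for illustration only.
rows = [{"name": "Arthur", "fans": 0}, {"name": "Arthur", "fans": 3}]

# next(): pull a single document.
cursor = StubCursor(rows)
first = next(cursor)

# list(): drain whatever is left into an ordinary Python list.
rest = list(cursor)

# for + close(): contextlib.closing calls close() when the block exits,
# even if the loop stops early with break or raises an exception.
found = None
with closing(StubCursor(rows)) as cursor:
    for row in cursor:
        if row["fans"] > 0:
            found = row
            break

print(first, rest, found, cursor.closed)
```

`contextlib.closing` works with any object that has a `close()` method, so the same pattern applies to a real cursor.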

Another way is to convert the rethinkdb.net.DefaultCursor (or Cursor) into a pandas DataFrame.

As the documentation shows (https://rethinkdb.com/api/python/to_array), a cursor can be converted into a list, and that list can then be turned into a DataFrame:

import pandas as pd

pd.DataFrame(list(r.db('YOUR-DB').table('YOUR-TABLE').run(connection)))

Although this departs from the schemaless NoSQL model, since pandas is built around structured, tabular data, it is still a good way to visualize the data.
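One caveat with passing the list straight to `pd.DataFrame` is that nested fields such as `votes` end up as a single column whose cells are dicts. A sketch assuming pandas 1.0 or later, where `pd.json_normalize` flattens nested keys into dotted column names; the list literal stands in for `list(cursor)` and reuses fields from the question's sample document.

```python
import pandas as pd

# Stand-in for list(cursor); fields trimmed from the question's sample row.
docs = [
    {"name": "Arthur", "review_count": 5, "average_stars": 5,
     "votes": {"cool": 1, "useful": 2, "funny": 1}},
]

# Plain constructor: "votes" stays one column whose cells are dicts.
flat = pd.DataFrame(docs)

# json_normalize: nested keys become columns such as "votes.cool".
normalized = pd.json_normalize(docs)

print(normalized[["name", "votes.cool", "votes.useful", "votes.funny"]])
```

Flattening is usually the more convenient shape for sorting or filtering on nested counts, while the plain constructor keeps each document's structure intact.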