I want to query a Couchbase bucket from R and store the results in a data frame.

I went through this blog post and tried to replicate the steps on my own cluster with a custom query, but the following error appeared in the Couchbase logs:

Invalid post received: {mochiweb_request,
                       [#Port<0.5548256>,'POST',"/query/service/",
                        {1,1},
                        {6,
                         {"host",
                          {'Host',
                              "[removed]:8091"},
                          {"accept-encoding",
                           {'Accept-Encoding',"gzip, deflate"},
                           {"accept",
                            {'Accept',
                                "application/json, text/xml, application/xml, */*"},
                            nil,nil},
                           {"content-type",
                            {'Content-Type',
                                "application/x-www-form-urlencoded;charset=UTF-8"},
                            {"content-length",
                             {'Content-Length',"59"},
                             nil,nil},
                            nil}},
                          {"user-agent",
                           {'User-Agent',
                               "libcurl/7.54.0 r-curl/2.6 httr/1.2.1"},
                           nil,nil}}}]}

Then I tried to use the reticulate package in R to query Couchbase through the Python SDK.

Python Code:

from couchbase.n1ql import N1QLQuery
from couchbase.bucket import Bucket
import pandas as pd

host = '[host_name]:8091'
bucket = 'my-bucket'
cb = Bucket('couchbase://' + host + '/' + bucket)
query = N1QLQuery('Select * from `my-bucket`')

df = pd.DataFrame()

for row in cb.n1ql_query(query):
    df = df.append(row, ignore_index=True)

The code above works fine and fills the pandas data frame df with the expected values, one row per query result.
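As a side note on the loop above: appending to a data frame one row at a time copies the frame on every iteration, so a common alternative is to collect the rows into a list first and build the frame once. The sketch below shows only the collection step with the standard library. `fake_n1ql_rows` is a hypothetical stand-in for `cb.n1ql_query(query)`, and the assumption that each `SELECT *` row arrives wrapped under the bucket name is mine, not something confirmed for this cluster.

```python
def fake_n1ql_rows():
    """Hypothetical stand-in for cb.n1ql_query(query): yields one dict per row."""
    yield {"my-bucket": {"id": 1, "name": "a"}}
    yield {"my-bucket": {"id": 2, "name": "b"}}


def collect_rows(result, keyspace):
    """Materialize all rows of an iterable query result as a list of dicts.

    Assumption: a row may be wrapped as {keyspace: {...}}; if the key is
    absent, the row is kept unchanged.
    """
    return [row.get(keyspace, row) for row in result]


records = collect_rows(fake_n1ql_rows(), "my-bucket")
```

With pandas available, `pd.DataFrame(records)` would then build the whole frame in a single call instead of once per row.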

Below is my unsuccessful attempt to translate the Python code above to R using the reticulate package.

R Code:

library(reticulate)

reticulate::use_condaenv("my-env", "/usr/local/anaconda3/bin/conda")

Bucket <- reticulate::import("couchbase.bucket")$Bucket
N1QLQuery <- reticulate::import("couchbase.n1ql")$N1QLQuery
pd <- reticulate::import("pandas", "pd")

host <- '[host_name]:8091'
bucket <- 'my-bucket'

cb <- Bucket(paste0('couchbase://', host, '/', bucket))
query <- N1QLQuery('Select * from `my-bucket`')

Up to this point everything works fine.

Now, how can I translate the Python for loop into R so that the query results are appended to a data frame?

for row in cb.n1ql_query(query):
    df = df.append(row, ignore_index=True)

I tried reticulate::iterate(), but it throws an error, most likely because I'm not using the function correctly.

> reticulate::iterate(cb$n1ql_query(query), print)
Error in reticulate::iterate(cb$n1ql_query(query), print) : 
iterate function called with non-iterator argument
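One possible reading of this error, offered as an assumption and not a confirmed diagnosis: the object returned by `n1ql_query` may be a Python iterable (it defines `__iter__`) without being an iterator itself (it has no `__next__`). A Python `for` loop calls `iter()` implicitly, which would explain why the loop works in Python while a function that expects an iterator rejects the same object. The sketch below illustrates the distinction; `FakeRequest` and `materialize` are hypothetical names, not part of the Couchbase SDK.

```python
class FakeRequest:
    """Hypothetical stand-in for an N1QLRequest-like object.

    It is iterable (defines __iter__) but is not an iterator (no __next__).
    """

    def __init__(self, rows):
        self._rows = rows

    def __iter__(self):
        # A generator method: every call returns a fresh iterator over the rows.
        for row in self._rows:
            yield row


def materialize(result):
    """Return all rows of an iterable as a plain list."""
    return list(result)


request = FakeRequest([{"id": 1}, {"id": 2}])
is_iterator = hasattr(request, "__next__")          # the object itself
iter_is_iterator = hasattr(iter(request), "__next__")  # what iter() returns
rows = materialize(request)
```

If that assumption holds, calling Python's built-in `iter()` or `list()` on the result before handing it to R might be enough; reticulate exposes those built-ins through `reticulate::import_builtins()`.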

As a last resort I could use the rPython package to call the Python script directly, but even that doesn't look straightforward.

Any working solution is fine; I don't mind how we get to the R data frame.

Help is much appreciated :)

Niket
  • What does `cb$n1ql_query(query)` return? – arvi1000 Dec 06 '17 at 18:06
  • a `couchbase.n1ql.N1QLRequest` object – Niket Dec 06 '17 at 23:39
  • Right, but how is that represented in the R environment? As a list? If you post a meaningful excerpt, it may become clear how to iterate through it in R natively (or otherwise convert it to a `data.frame`) – arvi1000 Dec 07 '17 at 05:28
  • It's not a list. When you assign it to a variable `x`, `x <- cb$n1ql_query(query)`, it creates exactly the same object as in python. For example, you can `x$get_single_result()` and it returns a list with all the columns and values of the first document in that bucket – Niket Dec 07 '17 at 08:52
  • https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example/5965451 – arvi1000 Dec 07 '17 at 15:22
