Is there a general way of passing a data frame with arbitrary columns (integer/factor, numeric, character data) from R to C and back? Pointers to close enough examples would be greatly appreciated.

Thanks.

RT

user151410

3 Answers

A data.frame is a list, so along the lines of

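#include <Rdefines.h>

/* Return a character vector naming the storage type of each column. */
SEXP df_fun(SEXP df)
{
    int i, len = Rf_length(df);   /* a data.frame is a list of columns */
    SEXP result;
    PROTECT(result = NEW_CHARACTER(len));
    for (i = 0; i < len; ++i)
        switch(TYPEOF(VECTOR_ELT(df, i))) {
        case INTSXP:              /* integer columns, including factors */
            SET_STRING_ELT(result, i, mkChar("integer"));
            break;
        case REALSXP:             /* double-precision numeric columns */
            SET_STRING_ELT(result, i, mkChar("numeric"));
            break;
        default:                  /* character, logical, list, ... */
            SET_STRING_ELT(result, i, mkChar("other"));
            break;
        }
    UNPROTECT(1);                 /* balances the single PROTECT above */
    return result;
}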

and then after R CMD SHLIB df_fun.c

> dyn.load("df_fun.so")
> df=data.frame(x=1:5, y=letters[1:5], z=pi, stringsAsFactors=FALSE)
> .Call("df_fun", df)
[1] "integer" "other"   "numeric"

Use GET_CLASS, GET_ATTR and other macros in Rdefines.h (or their equivalent functions, like getAttrib) to discover other information about the data frame. Note though that a data.frame has an API that can differ from its structure. So for instance the R function row.names can return something different from the value stored in the row.names attribute. I think most .Call functions operate on atomic vectors, keeping the manipulation of more complicated objects at the R level.

Martin Morgan
  • This is exactly what I was looking for. Thanks Martin. What other attributes of a data frame (nrows, rownames, colnames, is.factor etc.) can I query when passing a DF and set when returning a DF ? RT – user151410 Jul 12 '11 at 16:19
  • I added a few sentences about this to the end of the answer – Martin Morgan Jul 12 '11 at 16:42

Here's a link to an example using C++ and the inline package, by Dirk Eddelbuettel:

IRTFM
  • Thanks for the quick response. C++ is not an option as I am restricted to c by design. – user151410 Jul 12 '11 at 00:58
  • Why the restriction on C? For R, you are bound to use gcc which means you can use g++ just as easily. Easier APIs trump harder APIs every time. Oh, and the example pointed to by DWin can as easily return a data.frame too. – Dirk Eddelbuettel Jul 12 '11 at 13:23
  • Restriction was a euphemism for my own ignorance! But the reward is worth the effort. Thanks for the package. RT – user151410 Jul 12 '11 at 16:17

A data.frame is a list whose class attribute is "data.frame".

Here is an example of creating a data.frame from the R sources (src/library/stats/src/model.c):

/* Turn the data "list" into a "data.frame" */
/* so that subsetting methods will work. */

PROTECT(tmp = mkString("data.frame"));
setAttrib(data, R_ClassSymbol, tmp);
UNPROTECT(1);

You can build a data.frame by hand at the R level this way:

l <- list(1:5)
attr(l, "class") <- "data.frame"
attr(l, "names") <- "Column 1"
attr(l, "row.names") <- paste("Row", 1:5)
DenisKolodin