
I am using the dcast function to reshape a data frame in R. Because my data frame is large, I converted it into an ffdf, but then I cannot use dcast on it. Are there any alternatives? Below is the example I used for a small data frame, and what I want to do with the ffdf:

    hhpsample <- read.csv("C:/Users/PK5016573/Desktop/hdsample.csv")
    View(hhpsample)

    library(reshape2)
    hd <- dcast(hhpsample, MemberID ~ Year + Specialty + ProcedureGroup + Vendor + PlaceSvc + PCP + PrimaryConditionGroup + CharlsonIndex)

This works, but the same call on the ffdf version does not:

    library(ff)
    hhp <- read.csv.ffdf(file = "C:/Users/PK5016573/Desktop/hdsample.csv")

    hd <- dcast(hhp, MemberID ~ Year + Specialty + ProcedureGroup + Vendor + PlaceSvc + PCP + PrimaryConditionGroup + CharlsonIndex)

This gives me an error. Please help.

Thanks in advance, Pavan Kancharala

Naga Pavan
  • Please provide a reproducible example. – akrun Dec 17 '14 at 12:08
  • Hi akrun, please download the data from http://www.heritagehealthprize.com/c/hhp/data. After downloading, sort it in Excel and keep the data for only two MemberIDs; try the first example on that. Then take all the data and try the second code, and you will see the error. – Naga Pavan Dec 17 '14 at 12:29
  • Is it `HHP_release1`? – akrun Dec 17 '14 at 12:42
  • Yes, the Claims dataset from HHP_release3. – Naga Pavan Dec 17 '14 at 12:44
  • Use ffdfdply from the ffbase package and apply dcast.data.table inside FUN. A similar example, which uses reshape inside FUN, is shown here: http://stackoverflow.com/questions/21472459/functions-for-creating-and-reshaping-big-data-in-r-using-the-ff-package/21478168#21478168 –  Dec 17 '14 at 21:04
  • Hi @jwijffels, I tried it but it is not working. Can you give a detailed explanation, please? Thanks :) – Naga Pavan Dec 18 '14 at 07:27
  • The objective of Stack Overflow is that you provide a reproducible example and that others can help you where you are stuck. Not the other way around. –  Dec 18 '14 at 14:05
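The suggestion above (split the data by MemberID with ffdfdply and reshape each chunk inside FUN) can be illustrated on an ordinary in-memory data frame. The sketch below is base R only and uses a hypothetical toy table, not the HHP data; it swaps in `table()` for dcast to count rows per MemberID × Year_PrimaryConditionGroup cell, and `reshape_chunk` is a hypothetical name. It shows two conditions a chunked reshape relies on: all rows of one member must land in the same chunk, and every chunk must produce the same columns so the pieces can be row-bound. That second point is why the cast key is built as a factor on the full data before splitting.

```r
# Hypothetical toy claims table (NOT the real HHP data): one row per claim.
claims <- data.frame(
  MemberID = c(101, 101, 102, 103, 103, 103, 104),
  Year = c("Y1", "Y2", "Y1", "Y1", "Y1", "Y3", "Y2"),
  PrimaryConditionGroup = c("ARTHSPIN", "ARTHSPIN", "GIBLEED", "ARTHSPIN",
                            "ARTHSPIN", "GIBLEED", "GIBLEED"),
  stringsAsFactors = FALSE
)

# Build the cast key once on the FULL data, so every chunk shares the same
# factor levels and therefore produces the same output columns.
claims$key <- factor(paste(claims$Year, claims$PrimaryConditionGroup, sep = "_"))
claims$MemberID <- factor(claims$MemberID)

# Reshape one chunk: count claims per MemberID x key
# (the same numbers as dcast with fun.aggregate = sum over a column of 1s).
reshape_chunk <- function(chunk) {
  chunk$MemberID <- droplevels(chunk$MemberID)  # rows only for this chunk's members
  counts <- table(chunk$MemberID, chunk$key)     # key keeps all global levels
  wide <- as.data.frame.matrix(counts)
  out <- cbind(MemberID = rownames(wide), wide)
  rownames(out) <- NULL
  out
}

# Assign whole members to chunks so no member is split across two chunks.
members <- levels(claims$MemberID)
chunk_of_member <- ceiling(seq_along(members) / 2)  # 2 members per chunk (toy size)
names(chunk_of_member) <- members
chunk_id <- chunk_of_member[as.character(claims$MemberID)]

# Split, reshape each chunk, then combine; rbind works because columns match.
pieces <- lapply(split(claims, chunk_id), reshape_chunk)
result <- do.call(rbind, pieces)
rownames(result) <- NULL
print(result)
```

With ffdfdply the splitting and combining are done by the package; the sketch assumes the per-chunk results likewise need matching columns to be combined, as they do for `rbind` here.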

1 Answer


I found an answer to this question, but it may not work for data with many factor levels.

    # Reshape function applied to each chunk of the data:
    # reshapes by Year and PrimaryConditionGroup
    library(reshape2)
    library(ffbase)  # also loads ff

    reshapefunction <- function(x) {
      # x is one chunk of hhp, passed in as an ordinary data.frame
      dcast(x, MemberID ~ Year + PrimaryConditionGroup,
            value.var = "rep.x..each...2668990.",
            fun.aggregate = sum)
    }

    # Apply reshapefunction chunk by chunk, splitting on MemberID;
    # BATCHBYTES controls the size of the chunks processed at a time
    PrimaryConditionGroup <- ffdfdply(x = hhp, split = hhp$MemberID,
                                      FUN = reshapefunction,
                                      BATCHBYTES = 100000000, trace = TRUE)

    View(PrimaryConditionGroup)

All the data was taken from the Kaggle competition. I added one more column, "rep.x..each...2668990.", which contains 1 in every row and is used for the aggregation.
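One way to read the caveat about many factor levels: the wide result gets one column per observed combination of the variables on the right-hand side of the formula, so its size is roughly (number of members) × (number of combinations), and with eight cast variables, as in the question's formula, the combinations can multiply quickly. Below is a minimal base-R sketch for estimating this before casting, assuming a data frame that fits in memory (for example a sample read with read.csv). The toy table and the helper `estimate_wide_size` are hypothetical and not part of reshape2 or ffbase.

```r
# Hypothetical toy table in the same long format (NOT the real HHP data).
claims <- data.frame(
  MemberID = c(101, 101, 102, 103, 103, 103, 104),
  Year = c("Y1", "Y2", "Y1", "Y1", "Y1", "Y3", "Y2"),
  PrimaryConditionGroup = c("ARTHSPIN", "ARTHSPIN", "GIBLEED", "ARTHSPIN",
                            "ARTHSPIN", "GIBLEED", "GIBLEED"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: rough size of the table that
# dcast(df, id ~ var1 + var2 + ...) would produce with its default drop = TRUE.
estimate_wide_size <- function(df, id_var, cast_vars) {
  # combinations that actually occur -> one output column each
  observed <- nrow(unique(df[cast_vars]))
  # every combination of levels -> the column count if nothing is dropped
  upper <- prod(vapply(df[cast_vars], function(v) length(unique(v)), integer(1)))
  n_rows <- length(unique(df[[id_var]]))  # one output row per ID
  list(rows = n_rows,
       columns = observed,                 # excluding the ID column
       upper_bound = upper,
       # 8 bytes per double cell; ignores R's per-object overhead
       approx_mb = n_rows * (observed + 1) * 8 / 1024^2)
}

size <- estimate_wide_size(claims, "MemberID", c("Year", "PrimaryConditionGroup"))
str(size)
```

The upper bound matters if every chunk is forced to have the same columns (for example with fixed factor levels and dcast's `drop = FALSE`), because then every combination of levels becomes a column, whether or not it occurs.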

marc_s
Naga Pavan