I am working on an assignment where I am supposed to read in a csv file, then send it to a function to be converted to a class object. I managed to read in the csv file and convert it to an object by doing this:

make_LD <- function(x){
  structure(list(id = c(x$id), visit = c(x$visit),
                 room = c(x$room), value = c(x$value), timepoint = c(x$timepoint)), class = "LongitudinalData")
}

A reproducible version of the input CSV file is:

data <- structure(list(id = c(14L, 14L, 14L, 14L, 14L, 14L, 14L, 14L, 
14L, 14L, 14L, 14L, 14L, 14L, 14L, 14L, 14L, 14L, 14L, 14L), 
    visit = c(0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 
    0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L), room = c("bedroom", "bedroom", 
    "bedroom", "bedroom", "bedroom", "bedroom", "bedroom", "bedroom", 
    "bedroom", "bedroom", "bedroom", "bedroom", "bedroom", "bedroom", 
    "bedroom", "bedroom", "bedroom", "bedroom", "bedroom", "bedroom"
    ), value = c(6, 6, 2.75, 2.75, 2.75, 2.75, 6, 6, 2.75, 2.75, 
    2.75, 2.75, 2.75, 2.75, 2.75, 2.75, 2.75, 2.75, 2.75, 2.75
    ), timepoint = 53:72), .Names = c("id", "visit", "room", 
"value", "timepoint"), class = "data.frame", row.names = c(NA, 
-20L))

How I am running this code:

## Read in the data
library(readr)
library(magrittr)
library(dplyr)
source("oop_code_2.R")
## Load any other packages that you may need to execute your code

data <- read_csv("data/MIE.csv")
x <- make_LD(data)
out <- subject(x, 14)

Once I have done this, I use the object variable and send it to a generic function:

subject <- function(x, id) UseMethod("subject")
subject.LongitudinalData <- function(x, subj){
  subj_exist <- x %>%
    group_by_(x$id) %>%
    filter(x$id == subj)
  return(subj_exist)
}

When I run the code, it produces an error:

Error in UseMethod("group_by_") : 
  no applicable method for 'group_by_' applied to an object of class "LongitudinalData"

I have noticed that the CSV file is read in as columns, whereas after I convert the data to an object it is organized into fields.

What am I doing wrong? Thanks!

Edited/Added:

In case it helps: when I run this code on the data read from the CSV, it works without error, as shown below.

> datatest1 <- data %>%
+ group_by(id, visit, room) %>%
+ select(id, visit, room , value) %>%
+ filter(id == 14) %>%
+ summarise(valmean = mean(value))
> print(datatest1)
# A tibble: 6 x 4
# Groups:   id, visit [?]
     id visit         room   valmean
  <int> <int>        <chr>     <dbl>
1    14     0      bedroom  4.786592
2    14     0  living room  2.750000
3    14     1      bedroom  3.401442
4    14     1 family  room  8.426549
5    14     2      bedroom 18.583635
6    14     2  living room 22.550694

When done on the LongitudinalData object, it throws an error:

> datatest2 <- x %>%
+ group_by(id, visit, room) %>%
+ select(id, visit, room , value) %>%
+ filter(id == 14) %>%
+ summarise(valmean = mean(values))
Error in UseMethod("group_by_") : 
  no applicable method for 'group_by_' applied to an object of class "LongitudinalData"

It could also be caused by the way the data is formatted. Here's an example of how the data looks before and after it is converted to the LongitudinalData object.

> head(data)
# A tibble: 6 x 5
     id visit    room value timepoint
  <int> <int>   <chr> <dbl>     <int>
1    14     0 bedroom  6.00        53
2    14     0 bedroom  6.00        54
3    14     0 bedroom  2.75        55
4    14     0 bedroom  2.75        56
5    14     0 bedroom  2.75        57
6    14     0 bedroom  2.75        58
> head(x)
$id
   [1] 14 14 14 14 14 14 14 14 14 14 14 14 14 14 14 14 14 14 14 14 14 14 14 14 14 14 14 14 14 14 14 14 14 14 14 14 14 14 14
  [40] 14 14 14 14 14 14 14 14 14 14 14 14 14 14 14 14 14 14 14 14 14 14 14 14 14 14 14 14 14 14 14 14 14 14 14 14 14 14 14
  [79] 14 14 14 14 14 14 14 14 14 14 14 14 14 14 14 14 14 14 14 14 14 14 14 14 14 14 14 14 14 14 14 14 14 14 14 14 14 14 14
$visit
   [1] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
  [60] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
 [119] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0

Here's a link to the data: data

Len Greski
afterafx
  • What exactly are you trying to do with `group_by_(x$id)`? It's not clear what you are trying to do so it's not clear how you are doing it wrong. You probably don't want the underscored version of the function and you probably want to leave out the `x$` parts. – MrFlick Jan 16 '18 at 21:31
  • My goal is to group all the entries with the same id and eventually get the mean of their values. – afterafx Jan 16 '18 at 21:45
  • It's easier to help if you provide a [reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) with sample input and desired output. – MrFlick Jan 16 '18 at 21:46
  • Added an example, if that helps. – afterafx Jan 16 '18 at 21:51
  • Please add a reproducible example where you create the data frame or provide a link to a file that we can download ourselves, @afterafx – Kamil Jan 16 '18 at 22:16
  • Sharing a `head()` is not that helpful. Try a `dput()`. But it seems like you want your object to act like a data.frame? If that's the case, you need to change its class. Use `class = c("LongitudinalData", "data.frame")` in your `make_LD` (also show how/when you are calling this function) – MrFlick Jan 16 '18 at 22:16
  • I tried showing the `dput()` of the data, but it was too large and the page became unresponsive, so I have included a link to download the data. I also included how I am calling the function `make_LD`. – afterafx Jan 16 '18 at 22:53
  • Just edited the OP adding a `dput()` with first 20 rows of data. Original data is available in Roger Peng's [Biostat 776](https://github.com/rdpeng/Biostat776/tree/master/homework3) GitHub repository. – Len Greski Jan 16 '18 at 22:58

3 Answers

By setting the class attribute to "LongitudinalData", you're telling R to use only methods for .LongitudinalData. Just as R calls the subject.LongitudinalData method you defined when you execute subject(x, 14), it looks for group_by_.LongitudinalData when you call group_by_, but that method, of course, doesn't exist because you just invented the class.

However, R has a simple inheritance-like feature so that you can specify backup classes to try if there is no method for the main class.

From ?class:

When a generic function fun is applied to an object with class attribute c("first", "second"), the system searches for a function called fun.first and, if it finds it, applies it to the object. If no such function is found, a function called fun.second is tried. If no class name produces a suitable function, the function fun.default is used (if it exists). If there is no class attribute, the implicit class is tried, then the default method.
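The lookup order described in that passage can be seen with a small base R sketch. The generic `describe` and its methods are hypothetical names made up for this illustration; they are not part of the question or of dplyr.

```r
# Hypothetical generic and methods, used only to illustrate S3 dispatch order.
describe <- function(obj) UseMethod("describe")
describe.second <- function(obj) "method for 'second'"
describe.default <- function(obj) "default method"

# There is no describe.first, so dispatch falls through to describe.second.
obj <- structure(list(), class = c("first", "second"))
first_result <- describe(obj)

# With "first" as the only class, no class-specific method matches,
# so describe.default is used instead.
only_first <- structure(list(), class = "first")
second_result <- describe(only_first)

print(first_result)   # "method for 'second'"
print(second_result)  # "default method"
```

group_by_ has no default method, which is why the lookup in the question ends in an error rather than a fallback.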

Therefore, you can specify that your LongitudinalData object can also be treated like a data frame like so:

make_LD <- function(x){
  structure(list(id = c(x$id), visit = c(x$visit),
                 room = c(x$room), value = c(x$value), timepoint = c(x$timepoint)), 
  class = c("LongitudinalData", "data.frame"))
}

However, there is some additional structure to a data frame that is missing, so it's usually better to create new classes by building upon an existing object rather than creating one from scratch:

make_LD <- function (x) {
  class(x) <- c("LongitudinalData", class(x))
  x
}

Note that there are several additional issues with your subject.LongitudinalData method that need to be corrected before it will function. I suggest reading vignette("programming", package = "dplyr").
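As one possible direction, here is a minimal sketch of a working method. It assumes the class is prepended to an existing data frame, it uses base subsetting instead of dplyr so that it does not depend on a particular dplyr version, and `toy` is a made-up stand-in for MIE.csv. The error for a missing subject is my own addition, not something the assignment is known to require.

```r
# Assumption: the new class is prepended to the classes x already has.
make_LD <- function(x) {
  class(x) <- c("LongitudinalData", class(x))
  x
}

subject <- function(x, id) UseMethod("subject")

# The method's arguments match the generic's, and rows are selected with
# base subsetting rather than group_by_(x$id) and filter(x$id == subj).
subject.LongitudinalData <- function(x, id) {
  if (!(id %in% x$id)) {
    stop("Subject ", id, " is not in the data")
  }
  x[x$id == id, , drop = FALSE]
}

# Small made-up data set standing in for MIE.csv.
toy <- data.frame(
  id = c(14L, 14L, 20L),
  visit = c(0L, 1L, 0L),
  room = c("bedroom", "bedroom", "den"),
  value = c(6, 2.75, 3.5),
  timepoint = 53:55,
  stringsAsFactors = FALSE
)

out <- subject(make_LD(toy), 14)
print(class(out))
print(nrow(out))
```

Because the result still inherits from data.frame, functions written for data frames should continue to accept it.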

W. Murphy
  • If I change my function so that the object is also treated like a data frame, `class(x)` prints both LongitudinalData and data.frame, whereas my assignment calls for it to state only LongitudinalData. Is this still possible? – afterafx Jan 16 '18 at 23:49
  • Then you have two options: define new methods `group_by_.LongitudinalData` et cetera to teach each dplyr method how to work with your class, or use base functions instead of dplyr. It's a shame your class is teaching bad habits. – W. Murphy Jan 17 '18 at 01:49
  • It seems those are my only options. Your answer worked as far as getting rid of the error. – afterafx Jan 18 '18 at 01:37
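
The second option from the comments (base functions, with `class(x)` still reporting only "LongitudinalData") might look roughly like the sketch below. The generic `subject_means` and the data set `toy` are hypothetical names invented for this illustration, and grouping by visit and room is an assumption based on the summary shown in the question.

```r
# Constructor as in the question: a plain list whose only class is
# "LongitudinalData".
make_LD <- function(x) {
  structure(list(id = x$id, visit = x$visit, room = x$room,
                 value = x$value, timepoint = x$timepoint),
            class = "LongitudinalData")
}

# Hypothetical generic: mean value per visit and room for one subject.
subject_means <- function(x, id) UseMethod("subject_means")

subject_means.LongitudinalData <- function(x, id) {
  # Rebuild a data frame from the fields inside the method only.
  df <- as.data.frame(unclass(x), stringsAsFactors = FALSE)
  df <- df[df$id == id, , drop = FALSE]
  aggregate(value ~ visit + room, data = df, FUN = mean)
}

# Small made-up data set standing in for MIE.csv.
toy <- data.frame(
  id = c(14L, 14L, 14L, 20L),
  visit = c(0L, 0L, 1L, 0L),
  room = c("bedroom", "bedroom", "bedroom", "den"),
  value = c(6, 2.75, 3, 5),
  timepoint = 53:56,
  stringsAsFactors = FALSE
)

x <- make_LD(toy)
print(class(x))
means <- subject_means(x, 14)
print(means)
```

The data frame exists only inside the method, so the object handed to the user keeps the single class the assignment asks for.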

I read the CSV file you linked in your comment this way:

# Replace the path with the location of the CSV file on your computer
df <- read.csv("C:/Users/username/Desktop/_257dbf6be13177cd110e3ef91b34ff67_data/data/MIE.csv", header=TRUE, sep=",", stringsAsFactors=FALSE)

and I ran your code:

make_LD <- function(x){
  structure(list(id = c(x$id), visit = c(x$visit),
                 room = c(x$room), value = c(x$value), timepoint = c(x$timepoint)), class = "LongitudinalData")
}

subject <- function(x, id) UseMethod("subject")
subject.LongitudinalData <- function(x, subj){
  subj_exist <- x %>%
    group_by_(x$id) %>%
    filter(x$id == subj)
  return(subj_exist)
}

and I don't receive any error.

PitterJe
  • It works when using the data directly, but not when the data is converted to the LongitudinalData object. I included the code above that shows how I am using the data. – afterafx Jan 16 '18 at 22:55
  • @afterafx which package do you use for this? – PitterJe Jan 16 '18 at 22:56
  • using `library(dplyr)` – afterafx Jan 16 '18 at 23:00
  • @afterafx OK. And in the code for datatest2, where does x come from? Is it the data frame you get from reading the CSV? The first block of code does not define an x. – PitterJe Jan 16 '18 at 23:02
  • If you look at the third block of code, x is generated from: `x <- make_LD(data)`. data is what's read in from the csv file. – afterafx Jan 16 '18 at 23:05

I was also working on the same assignment, and I am using an S4 class.

Here is my solution:

library(dplyr)

setClass('longitudinalData', 
         representation = representation(
                        id = "numeric", 
                        visit = "numeric",
                        room = "character",
                        value = "numeric",
                        timepoint = 'numeric')
         )


data = data.frame( id = rbinom(1000, 10, .75),
                   visit = sample(1:3, 1000, replace = TRUE),
                   room = sample(letters[1:5], 1000, replace = TRUE),
                   value = rnorm(1000, 50, 10),
                   timepoint = abs(rnorm(1000))
)

make_LD = function(data){
  new("longitudinalData", 
      id = as.numeric(data$id),
      visit = as.numeric(data$visit), 
      room = as.character(data$room),
      value = as.numeric(data$value),
      timepoint =as.numeric(data$timepoint))
}

x = make_LD (data)
print(x)

setGeneric(name = 'subject', def = function(.Object, n=1){standardGeneric('subject')})

setMethod(f='subject', signature = 'longitudinalData',
          definition = function(.Object, n=1) {
            if(n %in% .Object@id){
            x = data.frame(as.factor(.Object@id), as.factor(.Object@visit), .Object@room, .Object@value, .Object@timepoint)
            names(x) = c( 'id', 'visit', 'room', 'value', 'timepoint')
            out = x[which(x$id == n),] %>% group_by(visit)
            return(out)
            } else { stop(paste("Subject", n, "is not available", sep = " "))}
          })


subject(x, n=4) %>% summary

Note that I have used data.frame inside the setMethod so that I can use the familiar (and effective) dplyr functions on it.

The output is the following:

> subject(x, n=4) %>% summary
       id     visit room      value         timepoint      
 4      :12   1:2   a:2   Min.   :25.04   Min.   :0.02548  
 2      : 0   2:4   b:2   1st Qu.:44.80   1st Qu.:0.20043  
 3      : 0   3:6   c:1   Median :50.42   Median :0.53025  
 5      : 0         d:3   Mean   :47.73   Mean   :0.71829  
 6      : 0         e:4   3rd Qu.:52.44   3rd Qu.:1.13632  
 7      : 0               Max.   :64.83   Max.   :1.88971  
 (Other): 0 

The id field in the output does not look good, but I think that can be fixed easily.
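
One likely cause is that id is converted to a factor before the rows are subset, and a subset factor keeps all of its original levels, so summary() lists the unused ones with a count of zero. A minimal sketch with a made-up factor (`ids` is a stand-in, not the real column):

```r
# Made-up factor standing in for the id column built inside the method.
ids <- as.factor(c(2, 4, 4, 7))
kept <- ids[ids == 4]

# Subsetting keeps every original level, so zero counts are reported.
print(summary(kept))

# droplevels() discards the levels that no longer occur.
trimmed <- droplevels(kept)
print(summary(trimmed))
```

Inside the method, wrapping the subset in droplevels(), or leaving id numeric instead of calling as.factor(), would be two ways to try this.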

Feel free to edit the answer in this regard.

Hope it helps!!

David