I have a MATLAB struct, containing a number of fields which together describe, say, 100 observations of a number of variables, as follows (MATLAB output):

mystruct = 

  fieldA: [100x1 double]
  fieldB: [100x1 double]
  fieldC: [100x1 double]
  fieldD: [100x1 char]
  fieldE: {100x1 cell}

I want to use R with this data, so I save the struct as a .mat file and import it using the R.matlab package. Because I'm new to R, the following is likely clumsy, but I can access individual fields just fine (R code):

> f = readMat('myfile.mat')
> data = f$mystruct
> data
  , , 1

      [,1]         
  fieldA Numeric,100  
  fieldB Numeric,100  
  fieldC Numeric,100  
  fieldD Character,100
  fieldE List,100   

> data = data[, , 1]
> df <- data.frame(fieldA = data$fieldA, fieldB = data$fieldB)

OK, so here is the question: how can I generalize the above so that a data frame is generated for an arbitrary number of fields in the original struct? For my 5-field example I can manually do it, but the next data set I have has many fields, and I don't want to enter them all.

As per this question, I tried rbind() and ldply(), which construct outrageously dimensioned data frames (401 obs of 1 variable and 401 obs of 105 variables respectively).

Matt Mizumi
  • It depends on what's in `data` (you could post the output of `str(data)`), but perhaps all you need is `df <- as.data.frame(data)` – Ista Jan 22 '15 at 03:16
  • summary of `data` now included; doing this gives a data frame with 100 obs of 104 variables (should be 100 obs of 5 variables) – Matt Mizumi Jan 22 '15 at 03:35
  • I suspect fieldE is converted to 100 variables of length 1 instead of 1 variable of length 100. Does `drop=c("singletonLists")` help? – koekenbakker Jan 22 '15 at 11:47
  • As an alternative, if your structure object isn't too large, have `Matlab` write it to a csv file, read that in, and resize/reshape as desired. – Carl Witthoft Jan 22 '15 at 12:52

1 Answer

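As it turns out, the MATLAB cell array (fieldE) was imported as a nested list, so as.data.frame() expanded its 100 elements into 100 separate columns (hence 104 variables instead of 5). Applying unlist to each field takes care of the problem: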

data = lapply(data, unlist, use.names=FALSE)
df <- as.data.frame(data) # now has correct number of obs and vars

Thanks @koekenbakker for the critical pointer to this!
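
For anyone who wants to try this without a .mat file, here is a minimal sketch that builds a stand-in for the imported list and applies the same unlist step. The shape of the fake data (100x1 matrices, and fieldE as one nested list per observation) is my assumption about what readMat() produces, not something confirmed by the package documentation, so check your own object with str() first. The length check and stringsAsFactors = FALSE are additions of mine, not part of the original fix.

```r
# Hypothetical stand-in for data[, , 1] as returned by readMat();
# the real structure may differ, so inspect yours with str() first.
n <- 100
data <- list(
  fieldA = matrix(rnorm(n), ncol = 1),
  fieldB = matrix(rnorm(n), ncol = 1),
  fieldC = matrix(rnorm(n), ncol = 1),
  fieldD = matrix(sample(letters, n, replace = TRUE), ncol = 1),
  # Assumed shape of an imported cell array: one nested list per observation
  fieldE = lapply(seq_len(n), function(i) list(matrix(paste0("id", i), 1, 1)))
)

# Flatten every field; atomic fields pass through unlist() unchanged
flat <- lapply(data, unlist, use.names = FALSE)

# Guard: every field should now hold exactly one value per observation
stopifnot(all(lengths(flat) == n))

# stringsAsFactors = FALSE is my addition, to keep text columns as character
df <- as.data.frame(flat, stringsAsFactors = FALSE)
dim(df)    # rows, then columns
names(df)  # one column per struct field
```

Because lapply() walks over whatever fields are present, nothing here depends on there being exactly five of them. If the stopifnot() line fails on real data, some field probably holds more than one value per observation and would need separate handling before it can become a column.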

Matt Mizumi