So I come from a background of Matlab and Python (and several others less related). I'm picking up R for a Coursera course.

I followed this SO answer in order to read in all my homework files into a list in a single line of code. My code looks like this:

# Get a list of files
files = list.files(path = dataDir, pattern = '\\.csv$')  # pattern is a regular expression, not a glob

# Import the file data
setwd(dataDir)
data = lapply(files, read.csv)

This all works just fine. However, I am getting an object back that I don't know how to access. I mentioned Matlab and Python before because I've attempted to access the data in all the ways I would in those languages.

Here's what summary outputs:

summary(data)
       Length Class      Mode
  [1,] 4      data.frame list
  [2,] 4      data.frame list
  [3,] 4      data.frame list

There are actually 352 of them, not just 3, but no one needs a listing of all 352. Here's what summary of an individual index outputs:

summary(data[200])
     Length Class      Mode
[1,] 4      data.frame list

So if I enter data[200] I get a listing of the first 2500 rows of data. But data[200, 100] returns an error, as do data[200][,100] and data[200][100,]. data[200][100] returns [[1]] NULL.

While I haven't fully considered what I will need to do for this homework, I'm sure it will involve calculating means/medians/maximums/etc. of all non-NA values in various data columns. This wasn't tough to do for the quizzes using something like mean(data[which(is.na(data[, 'Col1'])==F), 'Col6']).

So I imagine I could use a more hackish version of what I need where I simply load the 1 file I need at the time I need it, extract only the portion of the data frame I need right then, and loop over all the data files I need to process. However, I'd rather know how to access the data in the object R creates from the lapply line. I suspect this will make more complex analyses later on much easier to code.
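For reference, here is a minimal sketch of the kind of per-file summary described above, applied directly to the list instead of reloading one file at a time. The toy data frames and their values are invented stand-ins for the real files; only the column name Col6 comes from the example above.

```r
# Toy stand-in for the list of data frames returned by lapply(files, read.csv);
# the values are made up, and Col6 mirrors the column name used above.
data <- list(
  data.frame(Col1 = c(1, 2, 3), Col6 = c(2, NA, 4)),
  data.frame(Col1 = c(4, 5, 6), Col6 = c(NA, 10, 20))
)

# Mean of Col6 within each data frame, skipping NA values
col6_means <- sapply(data, function(df) mean(df$Col6, na.rm = TRUE))

# Pool Col6 from every data frame into one vector for an overall summary
all_col6 <- unlist(lapply(data, function(df) df$Col6))
overall_max <- max(all_col6, na.rm = TRUE)
```

Passing na.rm = TRUE is one way to skip missing values without building an index with which() first.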

Thanks

Gabe Spradlin
  • What happens if you do `data[[200]][100,]`? – C_Z_ Oct 16 '15 at 19:08
  • @CactusWoman That did the trick. Thank you. Can you make that an answer so I can accept it? – Gabe Spradlin Oct 16 '15 at 19:49
  • @gabe Just wanted to note that it looks like data is a list of data.frames. You access an item in a list directly by doing `data[[1]]`, which would access the first item of the list. Then you select within the data frame using the additional bracket `[100, ]`, as @CactusWoman says. – giraffehere Oct 16 '15 at 19:50
  • @giraffehere That's exactly the explanation I was hoping someone would provide as to why this works. I knew it was a noob question but got tired of looking for an answer after a couple of hours. Obviously, I wasn't asking the right question. – Gabe Spradlin Oct 16 '15 at 19:55
  • @gabe No worries. To be honest, it took me FAR too long to realize I could access lists with this notation. I really disliked lists beforehand. – giraffehere Oct 16 '15 at 20:01

1 Answer

When you subset a list, single square brackets [ return an object of the same class as the one you are subsetting. So data[200] returns a list of length 1 containing one data frame, because data is a list. Double square brackets [[ give you the actual object contained in the list (in this case, a data frame). Once you have a data frame, you can select its 100th row with [100,] (the first 100 rows would be [1:100,]), which is why the following works:

data[[200]][100,]
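
To make the difference concrete, here is a small sketch that can be run without the homework files. The two toy data frames, their column names, and their values are made up for illustration; only the bracket behavior is the point.

```r
# Toy stand-in for the list that lapply(files, read.csv) returns;
# column names and values are invented for illustration.
data <- list(
  data.frame(id = 1:3, value = c(10, 20, 30)),
  data.frame(id = 4:6, value = c(40, NA, 60))
)

one_elem_list <- data[2]   # single brackets: still a list, of length 1
df <- data[[2]]            # double brackets: the data frame itself

class(one_elem_list)       # "list"
class(df)                  # "data.frame"

row2 <- data[[2]][2, ]              # row 2 of the second data frame
col_value <- data[[2]][, "value"]   # one column as a plain vector
cell <- data[[2]][3, "value"]       # a single cell
first_two <- data[[2]][1:2, ]       # rows 1 through 2
```

This is also why data[200][100] gave [[1]] NULL in the question: data[200] is a one-element list, so asking for its 100th element finds nothing there.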
C_Z_