Please forgive me for how basic this question must be, but I cannot, for the life of me, coerce my dataset into a data frame. I'm new to R but have worked in other languages (VBA and Matlab).

When I read my data into R with ds <- read_excel("Sample Data.xlsx"), the result is a list, according to typeof(ds). I tried to coerce the list into a data frame with df <- as.data.frame(ds), but that doesn't work either. The sample dataset is simple (4 variables with 5 observations each) and is stored in an Excel spreadsheet. I'm working in RStudio and the only package I have loaded is readxl.

I've asked colleagues and searched quite a bit, but it may be that my question isn't phrased properly.

Edit: In response to comments, I checked the class of both df and ds. class(df) returns "data.frame" and class(ds) returns "tbl_df" "tbl" "data.frame".

However, even df is still behaving as a list. typeof(df[1]) returns "list", while typeof(df[[1]]) returns "double", as it should. Functions I need to use aren't working because of this.

cor.test(df[1], df[2]) # returns Error in cor.test.default(df[1], df[2]) : 'x' must be a numeric vector

However, the code below gives me what I need.

cor.test(df[[1]], df[[2]]) # returns an r = .29, among other stats
Kyle S
  • 2
    Objects of *class* `"data.frame"` are of *typeof* `"list"`. Try `class(ds)`. – Rui Barradas Jun 16 '20 at 19:38
  • 2
    `read_excel` from the `readxl` package, by default returns a `tibble`, if you are pulling in more than one sheet, you will get a list. By default, it should return only the first. You could try `sheet = 1`. Failing that, please provide us some of the output of `dput(ds)` as an [edit] to your question. – Ian Campbell Jun 16 '20 at 19:38
  • Also [relevant](https://stackoverflow.com/questions/1169456/the-difference-between-bracket-and-double-bracket-for-accessing-the-el). – Rui Barradas Jun 16 '20 at 20:57
  • Thank you for that reference, @RuiBarradas – Kyle S Jun 16 '20 at 21:14
  • ...and since we're on the topic of the extract operator, [this](https://stackoverflow.com/questions/42560090/what-is-the-meaning-of-the-dollar-sign-in-r-function/47497773#47497773) is also relevant. – Len Greski Jun 16 '20 at 21:15
  • Thank you, @LenGreski – Kyle S Jun 16 '20 at 21:28

2 Answers

2

I think you already have a data frame. The function read_excel() from the readxl package returns a tibble, which is a special kind of data frame. (If you don't provide a sheet name, it reads only the first sheet and still returns a tibble.)

A tibble is of type list, just like any other data frame. You can check this on the built-in data frame mtcars:

typeof(mtcars)

To get the class of your object, type class(ds); you'll see it is both a tibble and a data frame, so you can work with it as you would with any data frame.
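As a minimal sketch of the type/class distinction, here is the same check run on the built-in mtcars data frame (used only as a stand-in for your data):

```r
# typeof() reports how the object is stored internally;
# class() reports what kind of object R treats it as.
typeof(mtcars)                    # "list"
class(mtcars)                     # "data.frame"

# Both statements are true at once: a data frame is a list of columns.
is.list(mtcars)                   # TRUE
is.data.frame(mtcars)             # TRUE
inherits(mtcars, "data.frame")    # TRUE
```

So seeing "list" from typeof() does not mean the import failed.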

To refer to its rows or columns, type df[rows, columns]. In your case:

cor.test(df[ ,1], df[ ,2])
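One caveat worth checking on your own data: the df[ , j] form drops to a vector for a base data frame (such as the result of as.data.frame()), but a tibble keeps it as a one-column tibble. The sketch below uses mtcars as a stand-in for your df, and the tibble part only runs if the tibble package happens to be installed:

```r
df <- mtcars  # stand-in for a base data frame such as as.data.frame(ds)

# For a base data frame, selecting one column with [ , j] drops to a vector.
is.numeric(df[, 1])                    # TRUE
# drop = FALSE keeps the data frame wrapper, just like df[1].
is.data.frame(df[, 1, drop = FALSE])   # TRUE
is.data.frame(df[1])                   # TRUE

# A tibble never drops with [ , j]; use [[ ]] or $ to get the vector.
if (requireNamespace("tibble", quietly = TRUE)) {
  tb <- tibble::as_tibble(mtcars)
  stopifnot(is.data.frame(tb[, 1]), is.numeric(tb[[1]]))
}
```

If you pass the tibble ds directly, ds[[1]] or ds$name is the safer way to hand a column to cor.test().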
  • Thank you for your comment! However, please see my edit. I just can't get it to work the way it should. – Kyle S Jun 16 '20 at 19:55
  • @KyleS Yeah, you probably want to refer to columns, so type `cor.test(df[ ,1], df[ ,2])` as the first (omitted) number in square brackets is rows and the second is columns. –  Jun 16 '20 at 19:59
  • 2
    Thank you, Petr, that was exactly the issue. A colleague's code sent to me had dataset[1] and that worked just fine on his computer. Not quite sure what he did, but this works just fine. Thank you for your patience! – Kyle S Jun 16 '20 at 20:03
2

The problem listed in the question is due to the differences in behavior across the [ and [[ forms of the extract operator.

The [ form of the extract operator when used on a data frame returns another data frame, which is also a list.

str(mtcars[1])
'data.frame':   32 obs. of  1 variable:
 $ mpg: num  21 21 22.8 21.4 18.7 18.1 14.3 24.4 22.8 19.2 ...

The [[ form of the extract operator returns a vector.

str(mtcars[[1]])
 num [1:32] 21 21 22.8 21.4 18.7 18.1 14.3 24.4 22.8 19.2 ...
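As a rough illustration of why the single-bracket call in the question fails, this sketch checks what each form returns and captures the error rather than stopping. The variable names are just for illustration:

```r
# `[` keeps the data frame wrapper; `[[` pulls out the column itself.
one_col <- mtcars[1]      # data frame with a single column
mpg_vec <- mtcars[[1]]    # plain numeric vector

is.data.frame(one_col)    # TRUE
is.numeric(one_col)       # FALSE, which is why cor.test() rejects it
is.numeric(mpg_vec)       # TRUE

# Capture the error message instead of halting the script.
err <- tryCatch(
  cor.test(mtcars[1], mtcars[4]),
  error = function(e) conditionMessage(e)
)
print(err)
```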

Since stats::cor.test() takes vectors as input, one must use the [[ form of the extract operator, the data frame[,col] version of the [ operator, or the $ form. For example:

cor.test(mtcars[,1],mtcars[,4])
cor.test(mtcars[[1]],mtcars[[4]])
cor.test(mtcars$mpg,mtcars$hp)

...all of which return the same result:

> cor.test(mtcars$mpg,mtcars$hp)

    Pearson's product-moment correlation

data:  mtcars$mpg and mtcars$hp
t = -6.7424, df = 30, p-value = 1.788e-07
alternative hypothesis: true correlation is not equal to 0
95 percent confidence interval:
 -0.8852686 -0.5860994
sample estimates:
       cor 
-0.7761684
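If you would rather confirm the equivalence programmatically than compare printed output, a short sketch like this compares the estimates from the three forms (the r_* names are illustrative only):

```r
# Extract the correlation estimate from each of the three equivalent calls.
r_bracket <- cor.test(mtcars[, 1], mtcars[, 4])$estimate
r_double  <- cor.test(mtcars[[1]], mtcars[[4]])$estimate
r_dollar  <- cor.test(mtcars$mpg, mtcars$hp)$estimate

# All three inputs are the same numeric vectors, so the estimates agree.
isTRUE(all.equal(r_bracket, r_double))   # TRUE
isTRUE(all.equal(r_bracket, r_dollar))   # TRUE

round(unname(r_dollar), 4)               # -0.7762
```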

NOTE: some R functions can handle inputs of data frames instead of vectors, such as psych::corr.test().

> psych::corr.test(mtcars[1],mtcars[4])
Call:psych::corr.test(x = mtcars[1], y = mtcars[4])
Correlation matrix 
       hp
mpg -0.78
Sample Size 
[1] 32
Probability values  adjusted for multiple tests. 
    hp
mpg  0

 To see confidence intervals of the correlations, print with the short=FALSE option
Len Greski
  • Thank you so much for your answer. I need to understand more of the ins and outs of R, so this helps as well. – Kyle S Jun 16 '20 at 20:32
  • @KyleS - you're welcome. The extract operator is an extremely important component of R, and many beginning R users struggle with it. For more information on this topic, you can check out, [Forms of the Extract Operator](https://github.com/lgreski/datasciencectacontent/blob/master/markdown/rprog-extractOperator.md), which I wrote to support the Johns Hopkins University R programming course on Coursera. – Len Greski Jun 16 '20 at 21:12