0

I'm learning R and currently looking at ISLR's College.csv dataset (found here). I'm trying to set the first data column as the row names but none of the solutions I've found have worked:

college <- fread("College.csv")
rownames(college) <- college$V1
college <- college[, -1]
college

college <- fread("College.csv")
rownames(college) <- college[[1]]
college <- college[, -1]
college

college <- fread("College.csv")
rownames(college) <- college[,1]
college[,1] <- NULL
college

college <- fread("College.csv")
rownames(college) <- college[,1]
college <- college(, -1)
college

I've found a ton of advice on this issue on StackExchange, on other sites, and in the book I'm using and am confused about why none of it is working for me. I'd welcome any advice.

edit for more detail: I'd like to do this using fread or at least read_csv, and I'd like to do it without reassigning. If that can't be done without reassigning, I'd like to be explicitly told so because I don't trust myself in this matter.

JohnDoeVsJoeSchmoe
  • you can see duplicate: https://stackoverflow.com/questions/5555408/convert-the-values-in-a-column-into-row-names-in-an-existing-data-frame-in-r – YOLO Jan 03 '19 at 22:34
  • I saw that before posting. I'd like to do it without reassigning and his non-reassigning method is the third one of the four attempts I listed. – JohnDoeVsJoeSchmoe Jan 03 '19 at 22:37
  • I just went and tried his reassigning method for good measure, and got an `invalid row.names length` error. – JohnDoeVsJoeSchmoe Jan 03 '19 at 22:42

3 Answers

2

Import the csv file using:

college <- read.csv("path/to/file/College.csv", header = TRUE, row.names = 1)
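As a minimal sketch of what `row.names = 1` does, the snippet below writes a tiny stand-in file and reads it back. The file contents and the college names are made up for illustration; they only mimic the assumed layout of College.csv, where the first header field is empty.

```r
# Write a tiny stand-in for College.csv to a temporary file.
# The contents are illustrative only, not the real ISLR data.
tmp <- tempfile(fileext = ".csv")
writeLines(c(
  '"","Private","Apps"',
  '"College A","Yes",100',
  '"College B","No",200'
), tmp)

# row.names = 1 tells read.csv to use the first column as row names
# and to leave it out of the data columns.
college <- read.csv(tmp, header = TRUE, row.names = 1)

rownames(college)  # the college names
colnames(college)  # only the remaining data columns
```

Because the names are consumed at import time, no separate `rownames<-` step or column drop is needed afterwards.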
hmhensen
bob1
  • Thanks. Can you do it using fread or at least read_csv? I've heard that read.csv is much slower than the other two. – JohnDoeVsJoeSchmoe Jan 03 '19 at 22:31
  • 1
    Apparently not: `readr` (the package that provides `read_csv`) produces a `tibble`, and tibbles don't support row names, although they are faster and more memory efficient than data frames. `fread` doesn't support row names either, as it produces a `data.table`. You could perhaps read the file in using `data.frame(fread("College.csv"), row.names = 1)`, but I have no idea whether that would be efficient. – bob1 Jan 03 '19 at 22:53
  • Thank you so much for the detailed follow-up! So should I in general be using `tibble`s or `data.table`s instead of `data.frame`s? Are they best practice? – JohnDoeVsJoeSchmoe Jan 03 '19 at 22:59
1

fread is part of the data.table package, so what it returns is a data.table. The reason you can't assign row names is that data.tables cannot have row names. This is by design in the package. See https://cran.r-project.org/web/packages/data.table/data.table.pdf.

Try using base or dplyr and you shouldn't have any trouble.

Also, see Display row names in a data.table object.
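If you still want to read with fread, one possible workaround is to convert the result to a plain data.frame afterwards. The sketch below assumes the data.table package is installed and uses its `setDF()` function, which converts by reference and accepts a `rownames` argument. The file contents are made up for illustration.

```r
library(data.table)  # assumes the data.table package is installed

# Tiny stand-in file; the contents are illustrative, not the real data.
tmp <- tempfile(fileext = ".csv")
writeLines(c(
  ",Private,Apps",
  "College A,Yes,100",
  "College B,No,200"
), tmp)

# fread returns a data.table, which cannot carry row names.
college <- fread(tmp, header = TRUE)

# Convert to a data.frame by reference, using the first column as row names.
setDF(college, rownames = college[[1]])

# Drop the now-redundant first column.
college[[1]] <- NULL

rownames(college)
```

After `setDF()` the object is an ordinary data.frame, so data.table syntax no longer applies to it. If you want to stay in data.table, the usual alternative is to keep the names as a regular column.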

hmhensen
  • Oh this is very interesting, thank you. So are `data.frame`s in general a bad design choice? Should I be using `data.table`s for most use cases? – JohnDoeVsJoeSchmoe Jan 03 '19 at 22:58
  • 1
    @JohnDoeVsJoeSchmoe I prefer tidyverse, but data.table is not at all a bad choice. In fact, it is often the fastest option for data manipulation in R, beating both base and tidyverse under most conditions. It's just a matter of preference. – hmhensen Jan 03 '19 at 23:07
0

You can use this

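college <- read.csv("C:/Users/USER/Downloads/College.csv")
rownames(college) <- college[, 1]  # use the first column as row names
college <- college[, -1]           # then drop the now-redundant column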

Or this upon import

college <- read.csv("C:/Users/USER/Downloads/College.csv", header = TRUE, row.names = 1)
hmhensen
mokdur