3

I am using R to transpose a data frame from an excel file (mydata.xls). The original data frame looks like this:

ID         AB_1  AB_2  AB_3  AB_4  AB_5
Variable1  1     2     3     4     5
Variable2  1.5   2.8   4.5   5.6   7.8

This is what I would like to achieve:

ID    Variable1   Variable2
AB_1  1           1.5
AB_2  2           2.8
AB_3  3           4.5
AB_4  4           5.6
AB_5  5           7.8

Here is what I did, according to the response on a similar post in the past.

Library(XLConnect)
x=loadWorkbook("mydata.xls")
y=readWorksheet(x,"Summary")
z=setNames(data.frame(t(y[,-1])),y[,1])

However, here is what I got:

z    
      Variable1   Variable2
AB_1  1           1.5
AB_2  2           2.8
AB_3  3           4.5
AB_4  4           5.6
AB_5  5           7.8

Two problems are noticed: 1."ID" is missing. 2. when I checked the first column of the new data frame, the second column was returned (see below)

z[,1]
[1] 1 2 3 4 5

I wonder 1) what happened to the "ID" and that entire column? 2) How the issues could be fixed?

Community
  • 1
  • 1
xuan
  • 131
  • 1
  • 3
  • ID isn't a column, it begins as column names, so it becomes row names. It could be turned into a column. – Glen_b Dec 02 '12 at 14:05
  • 1
    To get row names as column you can do z$ID<-rownames(z) – Didzis Elferts Dec 02 '12 at 14:06
  • Not quite clear -- but I think the `AB_*` are the row names, and not a column of data. That's why you're confused. I bet you'll find `z[,3]` does not exist. – Carl Witthoft Dec 02 '12 at 16:31
  • Your object `y` already is a data.frame with `ID` as a column so maybe you'll just be happy with it? Either way, you really ought to read http://cran.r-project.org/doc/manuals/R-intro.pdf. – flodel Dec 02 '12 at 16:40

1 Answers1

2

If you want to overcome the default behavior of t.data.frame in using the col.names to form the row.names and instead create a column with the name "ID" then use the cbind.data.frame function:

cbind( ID=row.names(z), z)
#--------------
       ID Variable1 Variable2
AB_1 AB_1         1       1.5
AB_2 AB_2         2       2.8
AB_3 AB_3         3       4.5
AB_4 AB_4         4       5.6
AB_5 AB_5         5       7.8

There is no name for the row.names column in what is printed on the console, (or in the data.frame structure) so your expectations when coming for the "Excel-world" are not going to be completely satisfiable. Alternatively you could learn to think like a real useR and just use row.names(z).

IRTFM
  • 258,963
  • 21
  • 364
  • 487