0

I have a "wide" dataframe that has dates as column headings. Using the transform function, I can turn it into a "long" dataframe. I need the dates however (column headings) to be actual data points as a new variable. Let me illustrate the problem with a similar dataframe:

matrix <- matrix(round(runif(25)*10), nrow = 5, ncol = 5)
hospital_names <- c("Hosp A", "Hosp B", "Hosp C", "Hosp D", "Hosp E")
dates <- c("Jan 03", "Feb 03", "Mar 03", "Apr 03", "May 03")
df <- as.data.frame(matrix)
colnames(df) =  dates
df <- cbind(hospital_names, df)

This gives a dataframe that looks like this:

  hospital_names Jan 03 Feb 03 Mar 03 Apr 03 May 03 
 1 Hosp A          10      3      1      9      2        
 2 Hosp B           8      5      8      6      0          
 3 Hosp C           9      1      9      2      5         
 4 Hosp D           4      0      7      0      4         
 5 Hosp E           0      8      9      2      4

I now apply the transform function:

 df <- t(df)

And get this:

                [,1]     [,2]     [,3]     [,4]     [,5]    
 hospital_names "Hosp A" "Hosp B" "Hosp C" "Hosp D" "Hosp E"
 Jan 03         "10"     "8"      "9"      "4"      "0"    
 Feb 03         "3"      "5"      "1"      "0"      "8"    
 Mar 03         "1"      "8"      "9"      "7"      "9"  
 Apr 03         "9"      "6"      "2"      "0"      "2"  
 May 03         "2"      "0"      "5"      "4"      "4" 

So at this point, my dates are actual row names instead of data points. My question is, is there an easy fix? Can I convert the row names into a vector and then use cbind to add it as a variable?

nrussell
  • 18,382
  • 4
  • 47
  • 60

1 Answers1

0

Welcome to R. Using a package (like Reshape2 as suggested in the comments) is one way to go, but for the level of what you're doing here, I might guess you want a simpler solution from base R.

A couple of points:

  • it's generally a good idea not to use R function names for your variable names (eg. testmat <- matrix(...) instead of matrix <- matrix(...)) otherwise you may overwrite functions/variables.

  • that "t" means the matrix operation transpose, rather than transform and it just swaps your columns and rows.

That said, I think you want something like this.

matx <- matrix(round(runif(25)*10), nrow = 5, ncol = 5)
hospital_names <- c("Hosp A", "Hosp B", "Hosp C", "Hosp D", "Hosp E")
dates <- c("Jan 03", "Feb 03", "Mar 03", "Apr 03", "May 03")
df2 <- data.frame(dates=dates,t(matx))
colnames(df2) <- c("dates",hospital_names)

The output of which is:

   dates Hosp A Hosp B Hosp C Hosp D Hosp E
1 Jan 03      1      7      0      9      7
2 Feb 03      6      5      7      9      8
3 Mar 03     10     10      5      0      9
4 Apr 03      7      2      2      3      4
5 May 03      7      7      2      7      2

What happened was to leave the matrix as a (numerical) matrix, keep the names separate and then when defining the data frame, have the dates as the first column (named "dates", then add the transpose of the matrix (matx)). Finally, get all the right names in the right places by using concatenate to add the title "dates" in front of the other hospital names c("dates",hospital_names).

If you really want your "wide" format to go to a "long" one, that's something else and links are in the comments.

require(reshape2)
melt(df2)

with output

    dates variable value
1  Jan 03   Hosp A     1
2  Feb 03   Hosp A     6
3  Mar 03   Hosp A    10
4  Apr 03   Hosp A     7
5  May 03   Hosp A     7
6  Jan 03   Hosp B     7
7  Feb 03   Hosp B     5
8  Mar 03   Hosp B    10
9  Apr 03   Hosp B     2
10 May 03   Hosp B     7
11 Jan 03   Hosp C     0
12 Feb 03   Hosp C     7
13 Mar 03   Hosp C     5
14 Apr 03   Hosp C     2
15 May 03   Hosp C     2
16 Jan 03   Hosp D     9
17 Feb 03   Hosp D     9
18 Mar 03   Hosp D     0
19 Apr 03   Hosp D     3
20 May 03   Hosp D     7
21 Jan 03   Hosp E     7
22 Feb 03   Hosp E     8
23 Mar 03   Hosp E     9
24 Apr 03   Hosp E     4
25 May 03   Hosp E     2
JasonD
  • 142
  • 2
  • 10
  • Thanks very much indeed - this is extremely helpful. – user7429990 Jan 17 '17 at 13:55
  • One small problem that I have is that with my actual dataset I don't have a vector called "dates" (which I used in this example to create the column names). I just have column names that are set as dates (if that makes sense). Is there a way that I can extract that existing column names and create a vector that I can then use in the way that you described above? again, very appreciative of your help. – user7429990 Jan 17 '17 at 14:04
  • Sure. If your matrix is called "mydata" then colnames(mydata) will be a character vector of the names of each column. You may need to look at it briefly to figure out which are the ones of interest if there are other column headings you don't need (to get dates only for example). You can use your new vector in place of the "dates" one in your example. – JasonD Jan 17 '17 at 14:13