I have a CSV file that needs to be read into R, transposed (swapping rows for columns) and then processed.

Here is the form of the file (note that the columns actually extend to 2014):

Year,1970,1971,1972
Variable one,1,2,3
Variable two,11,22,33
Variable three,111,222,333

When I read it, the years are prefixed with 'X':

> rc <- read.csv("file.csv")
> rc
            Year X1970 X1971 X1972
1   Variable one     1     2     3
2   Variable two    11    22    33
3 Variable three   111   222   333

And when I transpose the data, everything is converted to strings:

> t(rc)
      [,1]           [,2]           [,3]            
Year  "Variable one" "Variable two" "Variable three"
X1970 "  1"          " 11"          "111"           
X1971 "  2"          " 22"          "222"           
X1972 "  3"          " 33"          "333"   

If I delete the row names from the CSV file, the years are still prefixed with 'X', but the transpose no longer converts the data to strings.

So how do I do this properly, so that the years are numeric and transposing does not create strings?

William Morris
  • Maybe [this answer](http://stackoverflow.com/a/15688406/2204410) will help you out. – Jaap Apr 04 '14 at 17:22

1 Answer

Just add check.names = FALSE to your read.csv call (but it's not a great idea, since you'll end up with syntactically invalid column names in this case):

X <- read.csv(text = "Year,1970,1971,1972
Variable one,1,2,3
Variable two,11,22,33
Variable three,111,222,333", check.names = FALSE)
X
#             Year 1970 1971 1972
# 1   Variable one    1    2    3
# 2   Variable two   11   22   33
# 3 Variable three  111  222  333

Regarding transposing your data, drop the string values first, and reintroduce them as the column names later:

tX <- t(X[-1])
colnames(tX) <- X[[1]]
tX
#       Variable one  Variable two  Variable three
# 1970             1            11             111
# 1971             2            22             222
# 1972             3            33             333
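If a data frame with a numeric year column is wanted rather than a matrix, one possible follow-up is sketched below. This is an illustration and not part of the original answer. The name `df` and the choice to call the new column `Year` are assumptions. The transposed matrix is converted with `as.data.frame()`, and the years are recovered from its row names.

```r
# Read the sample data, keeping the year headers as-is (no "X" prefix)
X <- read.csv(text = "Year,1970,1971,1972
Variable one,1,2,3
Variable two,11,22,33
Variable three,111,222,333", check.names = FALSE)

# Transpose only the numeric columns, then restore the variable names
tX <- t(X[-1])
colnames(tX) <- X[[1]]

# Convert the matrix to a data frame and turn the row names into a numeric column
df <- as.data.frame(tX)
df <- cbind(Year = as.integer(rownames(df)), df)
rownames(df) <- NULL

str(df)
```

The column names still contain spaces, so they have to be referenced with backticks or with `[[ ]]`, for example `df[["Variable one"]]`.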
A5C1D2H2I1M1N2O1R2T1
  • @WilliamMorris, it did not "disappear". Your "year" variable became the rownames of the `matrix` that resulted when we used `t()`. `row.names` does not have a "name" (other than "row.names" :-)). – A5C1D2H2I1M1N2O1R2T1 Apr 04 '14 at 17:29
  • You said that using check.names = FALSE results in syntactically invalid names. Is there a better way of achieving the final result? – William Morris Apr 04 '14 at 18:12
  • @WilliamMorris, do you want a `data.frame` or a `matrix` as your result? You could look into a `melt` + `dcast` approach. The main thing to remember with syntactically invalid names is that sometimes you'll have to use backticks (\`) or other quotes to refer to them. – A5C1D2H2I1M1N2O1R2T1 Apr 04 '14 at 18:30
  • I really wanted a frame so I ended up saving the transposed data to a csv, editing the file and reloading. Totally wrong approach I'm sure but it got the job done. Thanks for your help :-) – William Morris Apr 04 '14 at 19:30
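To illustrate the point made in the comments about syntactically invalid names, here is a small sketch using the same sample data. It is only an illustration. The variable names `by_backtick`, `by_string` and `total_1972` are made up for the example.

```r
# With check.names = FALSE the year columns keep their numeric-looking names
X <- read.csv(text = "Year,1970,1971,1972
Variable one,1,2,3
Variable two,11,22,33
Variable three,111,222,333", check.names = FALSE)

# X$1970 is a syntax error, so quote the name with backticks ...
by_backtick <- X$`1970`

# ... or pass it as a string to [[ ]]
by_string <- X[["1970"]]

# Both forms return the same column
identical(by_backtick, by_string)

# Quoted names work inside expressions as well
total_1972 <- sum(X$`1972`)
total_1972
```

Without `check.names = FALSE`, the same columns would be named `X1970`, `X1971` and `X1972`, and no quoting would be needed.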