I'm having a problem importing with read.table (I've also tried read.fwf). The first column, which I'm using for row names, uses a "0" as a placeholder in rows "01" through "09". R drops this leading "0", which eliminates the placeholder and throws off all my subsets thereafter. So when I get to any row higher than "9" it becomes "1" again. Essentially, R is reading rows "02" and "20" as "2" because the leading "0" is gone... I'm sure this is a simple fix, I just can't seem to chase it down. Thanks, nm

nick m
  • If you have edits, please edit the question, [using this](http://stackoverflow.com/posts/36584890/edit), further your addition is the same as what you had, and they aren't different. I'm not sure what your question is here, check out [this](http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) – Badger Apr 12 '16 at 22:09
  • Where is your first column? Do you need row names to import or can you add them later? – Badger Apr 12 '16 at 22:12
  • I need them on import, yes. there ~43k rows here so adding them later is not an option. The row names (ex: "0101271895") are a code. "01" is Alabama, "0101" = Alabama 1st weather division. "010127" = Alabama 1st weather division Temp MAX. "0101271895" = Alabama 1st weather division Temp MAX 1895 – nick m Apr 12 '16 at 22:15
  • If they are sequential, adding them later is an option. 43k is tiny. `row.names(x) <- seq(0, nrow(x) - 1)`. You can also read them in as a character column rather than as row names, which would keep them exactly as entered. R is converting your row names to numeric because they look numeric. Based on your question they need to be stored as characters (according to my limited understanding of row names). – Badger Apr 12 '16 at 22:16
  • Those should be used as a variable then, when you `read.table()` use as.is=T and make sure the row names column has a column name. – Badger Apr 12 '16 at 22:22
  • Did you use as.is=T? – Badger Apr 12 '16 at 22:26
  • Maybe I'm not understanding your direction: This is what I have: read.fwf("~/Documents/climdiv-tmaxdv-v1.0.0-20160304.txt", widths = c(12,7,6,7,7,7,7,7,7,7,7,7,6), header = FALSE, sep = "", n = 43164, as.is = TRUE) -> Tmax – nick m Apr 12 '16 at 22:28
  • Not `read.fwf()`, use `read.table()` – Badger Apr 12 '16 at 22:33

2 Answers

help(read.table)

read.table(file, colClasses = c("character",...), ...)

colClasses tells R how to treat the incoming data. Defining the first element as "character" will keep the leading zeros, and you can define the other columns as well. More details in this question.
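A minimal sketch of the difference, using two made-up rows rather than the real climate file (the sample values and the three-column layout are assumptions for illustration only):

```r
# Two invented rows in the spirit of the question: a zero-padded code, then values.
txt <- "0101271895 55.1 60.3
0201271895 48.7 52.0"

# Default type detection: the code column is parsed as a number,
# so the leading zero is dropped.
default_read <- read.table(text = txt, header = FALSE)

# Force the first column to character; read the other two as numeric.
char_read <- read.table(text = txt, header = FALSE,
                        colClasses = c("character", "numeric", "numeric"))

class(default_read$V1)  # a numeric type, leading zero lost
char_read$V1            # codes kept as strings, leading zero intact
```

Passing a single value, `colClasses = "character"`, should recycle across all columns, which is the bluntest way to stop any conversion on import.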

oshun
  • Correct me if I'm wrong, shouldn't as.is=T detect the leading zero and define it as a character? – Badger Apr 12 '16 at 22:27
  • That worked! I would never have figured that out from help(). Thank you! – nick m Apr 12 '16 at 22:34
  • I just tried on a dummy Excel CSV file and `as.is=T` did not work. Your suggestions of processing within R constitute a better/more foolproof philosophy in general. For this case though, the first step is getting the data in. (Sidenote: that Excel file automatically drops the leading zeros when I re-open it. Scary. I'm guessing OP is importing from a .txt file). – oshun Apr 12 '16 at 22:40
  • @nickm Yeah, I don't find R help files very helpful :) – oshun Apr 12 '16 at 22:49
  • Arrgh. Try reading Stata help files some time. Talk about obscure. – IRTFM Apr 12 '16 at 23:59
  • Thanks for checking that, oshun. I was just heading out of the office, and I should've tested before I suggested! I've used `as.is=T` in the past to keep my columns from factorizing; I was hoping it might maintain character as well, considering the help file says `as.is` supersedes `colClasses`. – Badger Apr 13 '16 at 00:38
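The `as.is` point from the comments can be checked on a small made-up input (the three rows below are invented for illustration). As far as the `read.table` help describes it, `as.is` only controls whether character columns become factors; a column that looks numeric is still converted:

```r
# Invented rows: a zero-padded id and a text label.
txt <- "01 a
02 b
10 c"

as_is_read <- read.table(text = txt, header = FALSE, as.is = TRUE)

# V1 is still converted to a number (01 becomes 1); only V2 stays character.
str(as_is_read)
```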

I'm wondering if you need to use:

read.fwf("~/Documents/climdiv-tmaxdv-v1.0.0-20160304.txt",
          widths = c(12,7,6,7,7,7,7,7,7,7,7,7,6), header = FALSE, 
          n = 43164, colClasses="character")

I cannot really tell, because you say column 1 holds row names, yet the read.* functions would not assume that the first column of a headerless file contains row names. Are all the columns separated by spaces or tabs? If so, read.table may be the right tool; if not, you will probably need read.fwf.

From your comment it appears that this really is a FWF-file and that you should be using widths starting with c(2, 2,2,2,2 ...)
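A rough sketch of that idea on a tiny stand-in file. The two rows, the temporary file, and the widths `c(2, 2, 2, 4, 6, 6)` are all assumptions for illustration; the real file's widths would have to be taken from its documentation:

```r
# Write two invented fixed-width rows to a temporary file:
# 10-character code (state, division, element, year) plus two 6-character values.
tmp <- tempfile(fileext = ".txt")
writeLines(c("0101271895 55.10 60.30",
             "0201271895 48.70 52.00"), tmp)

# Split the code into its parts and keep each part as character,
# so "01" stays "01"; the two value fields are read as numeric.
fw <- read.fwf(tmp, widths = c(2, 2, 2, 4, 6, 6), header = FALSE,
               colClasses = c(rep("character", 4), rep("numeric", 2)))

fw$V1                                    # state codes with leading zeros
full_code <- paste0(fw$V1, fw$V2, fw$V3, fw$V4)  # rebuild the 10-digit code

unlink(tmp)  # clean up the temporary file
```

Splitting the code on import makes subsetting by state or division a plain comparison such as `fw$V1 == "01"`, and the full code can still be rebuilt with `paste0()` when it is needed as a key.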

IRTFM