I have a text file to read in R, but the file does not seem to be tab-delimited. The only structure in the file is that each column always ends at the same position (i.e. columns are right-aligned).

So, first, is there a name for this type of data structure? And second, how can I read it in R?

    2.37      2.03                          2.38
   5,397     5,082                         5,609
    13.0      21.6          15.2            15.2
   128.0     103.1         134.2           133.4

Just using read.table() doesn't work: the missing value is not put in the right place.

# download data:
tmp <- tempfile()
f <- download.file("http://usda.mannlib.cornell.edu/usda/waob/wasde//1990s/1995/wasde-01-12-1995.txt", tmp)
D <- file(tmp)
data_enc <- readLines(D, warn=FALSE)
close(D)
dat <- sapply(strsplit(data_enc[232:236], ":"), function(x) x[2])
writeLines(dat, tmp)

## try to read data:
read.table(tmp, fill = TRUE, sep ="", header=FALSE)

Gives:

      V1    V2    V3    V4
 1  2.37  2.03  2.38    NA
 2 5,397 5,082 5,609    NA
 3  13.0  21.6  15.2  15.2

1 Answer

Maybe try using read.fwf to read a table of fixed width formatted data:

widths <- gregexpr("\\.\\d", readLines(tmp)[5])[[1]]+1L # line 5 looks complete
widths <- c(widths[1], diff(widths)) # convert column end positions (the digit after each decimal point) into widths
read.fwf(tmp, widths = widths)
#         V1         V2    V3               V4
# 1     2.37       2.03    NA             2.38
# 2    5,397      5,082    NA            5,609
# 3     13.0       21.6  15.2             15.2
# 4    128.0      103.1 134.2            133.4
# 5    146.4      130.9 156.5            155.7
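
For readers who cannot download the file, here is a minimal self-contained sketch of the same idea in base R. It rebuilds four of the sample rows with assumed column widths of 8, 10, 14 and 16 characters (an illustration, not the exact layout of the USDA file), derives the widths from the end position of each field in a complete line, and then converts the comma-formatted strings to numbers. The names `fmt`, `demo_lines`, `demo_file` and `demo` are hypothetical.

```r
# Rebuild right-aligned sample rows; the widths 8/10/14/16 are assumed for illustration
fmt <- "%8s%10s%14s%16s"
demo_lines <- c(
  sprintf(fmt, "2.37",  "2.03",  "",      "2.38"),
  sprintf(fmt, "5,397", "5,082", "",      "5,609"),
  sprintf(fmt, "13.0",  "21.6",  "15.2",  "15.2"),
  sprintf(fmt, "128.0", "103.1", "134.2", "133.4")
)
demo_file <- tempfile()
writeLines(demo_lines, demo_file)

# End position of each field = last non-space character before a space or end of line.
# Line 4 is used because it has a value in every column.
ends <- as.integer(gregexpr("\\S(?=\\s|$)", demo_lines[4], perl = TRUE)[[1]])
widths <- c(ends[1], diff(ends))

# Read everything as text first, so that "5,397" is not mangled;
# blank fields become NA once white space has been stripped.
demo <- read.fwf(demo_file, widths = widths, colClasses = "character",
                 strip.white = TRUE, na.strings = "")

# Drop the thousands separators and convert to numeric
demo[] <- lapply(demo, function(col) as.numeric(gsub(",", "", col)))
print(demo)
```

Taking the widths from a line that has a value in every column matters: a line with a blank field would yield too few end positions. If no such line exists, the widths would have to be supplied by hand.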
  • Oh nice! So this still counts as fixed-width data? I was tricked into thinking that fixed width meant columns share not only the same end but also the same start. Then maybe Hadley's readr package will make it very easy: read_fwf(tmp, fwf_empty(tmp)) – Matifou Jul 13 '16 at 08:52
  • ... yep, even better. :-) – lukeA Jul 13 '16 at 10:38