I would like to import this data set into R: http://www.stat.ufl.edu/~winner/data/retail92.dat

I'd really appreciate it if someone could help me. I tried read.csv and read.table, but neither of them worked. I don't know how to specify the variables' lengths like in SAS.

TDo
  • This post is what you're looking for http://stackoverflow.com/questions/20806811/reading-a-space-delimited-text-file-where-first-column-also-has-spaces – Tung Nov 10 '16 at 22:04

1 Answer

You can use read.fwf, which splits each line into columns based on a fixed number of characters per column (fixed width). You will still need to trim the trailing whitespace from the first column, which you can do with trimws.

xy <- read.fwf("http://www.stat.ufl.edu/~winner/data/retail92.dat",
                  widths = c(38, 6, 8, 8, 8))

> head(xy)
                                      V1    V2    V3     V4     V5
1 Acadia, LA                             3.672 0.882 12.364  3.872
2 Ada, ID                                9.251 1.152 21.384  3.861
3 Adams, CO                              7.489 0.911 16.718  3.507
4 Adams, IN                              7.822 1.216 15.772  2.470
5 Aiken, SC                              6.451 1.032 18.474 19.201
6 Alachua, FL                            8.240 1.052 17.505  3.862
> str(xy)
'data.frame':   845 obs. of  5 variables:
 $ V1: Factor w/ 845 levels "Acadia, LA                            ",..: 1 2 3 4 5 6 7 8 9 10 ...
 $ V2: num  3.67 9.25 7.49 7.82 6.45 ...
 $ V3: num  0.882 1.152 0.911 1.216 1.032 ...
 $ V4: num  12.4 21.4 16.7 15.8 18.5 ...
 $ V5: num  3.87 3.86 3.51 2.47 19.2 ...
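
The whitespace trimming mentioned above can be sketched as follows. This is only an illustration, not part of the original answer: it rebuilds two of the rows shown in the head(xy) output in a temporary file instead of downloading the data, and it assumes the 36, 8, 8, 8, 8 layout given in the format description linked in the comments.

```r
# Two rows copied from the head(xy) output, re-padded with sprintf to an
# assumed layout of 36, 8, 8, 8, 8 characters (taken from the comments,
# not verified against the real file).
rows <- c(
  sprintf("%-36s%8.3f%8.3f%8.3f%8.3f", "Acadia, LA", 3.672, 0.882, 12.364, 3.872),
  sprintf("%-36s%8.3f%8.3f%8.3f%8.3f", "Ada, ID", 9.251, 1.152, 21.384, 3.861)
)
tmp <- tempfile(fileext = ".dat")
writeLines(rows, tmp)

# Read as fixed width; keep the first column as character rather than factor.
xy <- read.fwf(tmp, widths = c(36, 8, 8, 8, 8), stringsAsFactors = FALSE)

# The first column still carries its trailing padding; trimws removes it.
xy$V1 <- trimws(xy$V1)
str(xy)
```

With the real file, the same two steps should apply once tmp is replaced by the URL, provided the widths match the data.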
Roman Luštrik
  • I think it should be 37 for the first column, otherwise you take the first digit when the second variable >= 10. Also, `strip.white=TRUE` is an argument you can use at input stage - `read.fwf("http://www.stat.ufl.edu/~winner/data/retail92.dat", widths = c(37, 6, 8, 8, 8), strip.white=TRUE)` – thelatemail Nov 10 '16 at 22:22
  • Dataset format is specified [here](http://www.stat.ufl.edu/~winner/data/retail92.txt) which is 36, 8, 8, 8, 8 – Tung Nov 10 '16 at 22:49
  • If you use `read_fwf` together with `fwf_empty` from the readr package, `fwf_empty` will automatically figure out the starting and ending positions of each column, so you do not have to work them out yourself: `mydata <- data.frame(read_fwf(file = "http://www.stat.ufl.edu/~winner/data/retail92.dat", col_positions = fwf_empty(file = "http://www.stat.ufl.edu/~winner/data/retail92.dat")))`. Besides that, `read_fwf` is much faster. See `?read_fwf` for more. – panman Nov 10 '16 at 23:15
  • @thelatemail feel free to edit my answer. – Roman Luštrik Nov 11 '16 at 10:35