0

I am trying to import a csv file into R in which rows are formatted as follows:

012002001   14.852DL 0000000034630100304660  
012002001   84.031GG 0000001049180100304660     
012002001   84.033DO 0000001297780100304660     
012002001   84.042GG 0000000780000100304660  

Unfortunately, the spaces do not correspond the fields. The fields are described below.

Field Data Element                          Positions   Length

1   GU Code - State                         1-2         2
    GU Code - Type                          3           1
    GU Code - County                        4-6         3
    GU Code - Place                         7-9         3
    GU Code - Split                         10-12       3

2   Program ID Code                         13-18       6

3   Object Code                             19-20       2

4   Funding Sign                            21          1

5   Funding Amount                          22-33       12

6   FIPS Code - State                       34-35       2
    FIPS Code - County                      36-38       3
    FIPS Code - Place                       39-43       5
    Pass-Through Flag                       44          1

7   Agency Code                             45-48       4

How can I import this file into R in a way that the variables are properly represented?

Thanks!

B.Tot
  • 23
  • 5
  • 2
    Alternatively, open your file in Excel as fixed width, set the boundaries that correspond to the field boundaries, then save as CSV and you can read in all the fields. As long as your file is not too long this is a quick, dirty, and effective way to handle it. – mysteRious Jun 24 '18 at 03:32
  • You need `read.fwf` – IRTFM Jun 24 '18 at 05:46

2 Answers2

6

This is not a CSV file. CSV is a "comma separated values" file. What you're dealing with is a fixed-width file format. See the answer provided here: Read fixed width text file

library(readr)

x <- read_fwf(
  file="http://www.cpc.ncep.noaa.gov/data/indices/wksst8110.for",   
  skip=4,
  fwf_widths(c(12, 7, 4, 9, 4, 9, 4, 9, 4)))
bzier
  • 445
  • 3
  • 11
0

Or, here's another way to do it in base R. Define start positions for each variable:

test <- c("012002001   14.852DL 0000000034630100304660  ")
start.pos <- c(1,3,4,7,10,13,19,21,22,34,36,39,44,45)
end.pos <- lead(start.pos)-1
z <- 0

Take substrings for each start.pos:

for (i in 1:length(start.pos)) { 
    z[i] <- substr(test,start.pos[i],end.pos[i])
}

Now all your variables are in a list z:

> z
 [1] "01"           "2"            "002"          "001"          "   "          "14.852"      
 [7] "DL"           " "            "000000003463" "01"           "003"          "04660"       
[13] " "            NA 

> z[3]
[1] "002" 
mysteRious
  • 4,102
  • 2
  • 16
  • 36