
My raw data is in a text file with no particular delimiters between the values, like so:

101  10.08  S   A  05OCT93 GOLDEN GATE BRIDGE  4110   6548   6404   55930

Applying read.table in R creates a data frame with only one variable per row, whereas I would like a data frame with 10 variables per row (one for each of the 10 values). How can I achieve this if there is no delimiter in the text file?

Sven Hohenstein
user1533277
  • This might be helpful: http://stackoverflow.com/questions/17302986/converting-web-site-text-file-into-data-frame-in-r – vamosrafa Nov 30 '13 at 07:20
  • You need to post code if you want us to understand why you "only get one variable per row". – IRTFM Nov 30 '13 at 07:24
  • Do you want the 12 space-delimited values in your example row to be randomly allocated to 10 variables? If not, you need to be specific about whether there are any rules or patterns for delimiting each row. – jinlong Nov 30 '13 at 07:28

3 Answers


We assume that each field consists of non-space characters, except for field 6, which may contain embedded spaces.

Create test file

Lines <- "101 10.08 S A 05OCT93 GOLDEN GATE BRIDGE 4110 6548 6404 55930
101 10.08 S A 05OCT93 GOLDEN GATE BRIDGE 4110 6548 6404 55930
"
cat(Lines, file = "myfile.txt")

Run. Read the file with readLines, producing L. Then use gsubfn from the gsubfn package to insert the character defined by sep between the fields, producing g. Finally, read the text in g with read.table to create a data frame:

library(gsubfn)
L <- readLines("myfile.txt")

sep <- ";"  # choose any character not in the file

pat <- "(\\S+) (\\S+) (\\S+) (\\S+) (\\S+) (\\S.*\\S) (\\S+) (\\S+) (\\S+) (\\S+)"
pat <- gsub(" ", "\\\\s+", pat) # allow runs of whitespace; can omit if there is only 1 space between fields
g <- gsubfn(pat, ... ~ paste(..., sep = sep), L)

read.table(text = g, sep = sep)

Output. The result of the last line is:

   V1    V2 V3 V4      V5                 V6   V7   V8   V9   V10
1 101 10.08  S  A 05OCT93 GOLDEN GATE BRIDGE 4110 6548 6404 55930
2 101 10.08  S  A 05OCT93 GOLDEN GATE BRIDGE 4110 6548 6404 55930
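
If installing gsubfn is not an option, the same capture-group idea can probably be done in base R. The following is only a sketch under the same assumption as above (field 6 is the only field that may contain spaces); the variable names m, fields and DF are made up for illustration, and the input simply repeats the sample row from the question.

```r
# Base-R sketch: capture the ten fields with regexec() and regmatches().
# Assumption: field 6 is the only field that may contain embedded spaces.
L <- c("101 10.08 S A 05OCT93 GOLDEN GATE BRIDGE 4110 6548 6404 55930",
       "101 10.08 S A 05OCT93 GOLDEN GATE BRIDGE 4110 6548 6404 55930")

# Ten capture groups joined by "one or more whitespace characters".
pat <- paste("^\\s*(\\S+)", "(\\S+)", "(\\S+)", "(\\S+)", "(\\S+)",
             "(\\S.*\\S)", "(\\S+)", "(\\S+)", "(\\S+)", "(\\S+)\\s*$",
             sep = "\\s+")

m <- regmatches(L, regexec(pat, L))            # one character vector per line
fields <- do.call(rbind, lapply(m, `[`, -1))   # drop the full match, keep the groups
DF <- as.data.frame(fields, stringsAsFactors = FALSE)
DF[] <- lapply(DF, type.convert, as.is = TRUE) # convert number-like columns
```

Lines that do not match the pattern yield an empty match here, so it may be worth checking lengths(m) before binding the rows on real data.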
G. Grothendieck

Are you sure about there only being ten columns?

> read.table(text="101 10.08 S A 05OCT93 GOLDEN GATE BRIDGE 4110 6548 6404 55930")
   V1    V2 V3 V4      V5     V6   V7     V8   V9  V10  V11   V12
1 101 10.08  S  A 05OCT93 GOLDEN GATE BRIDGE 4110 6548 6404 55930
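
If twelve whitespace-separated columns is what read.table gives you, one way to get back to ten is to paste the name columns together afterwards. This is only a sketch, and it assumes the name always has exactly three words, which may not hold for the rest of the file; the names raw, name and DF are made up for illustration.

```r
# Sketch: read on whitespace, then glue the three name columns back together.
# Assumption: the name field always consists of exactly three words.
txt <- "101 10.08 S A 05OCT93 GOLDEN GATE BRIDGE 4110 6548 6404 55930"
raw <- read.table(text = txt, stringsAsFactors = FALSE)

name <- paste(raw$V6, raw$V7, raw$V8)    # rebuild the multi-word field
DF <- cbind(raw[1:5], name, raw[9:12], stringsAsFactors = FALSE)
names(DF) <- paste0("V", seq_along(DF))  # rename to V1 ... V10
```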
IRTFM

The other possibility is that this is a fixed-width format file. We could judge that better if you posted several lines:

# read.fwf is in the utils package, which is attached by default
txt2 <- "101  10.08  S   A  05OCT93 GOLDEN GATE BRIDGE  4110   6548   6404   55930"
read.fwf(file=textConnection(txt2), widths=c(4,6,3,4,9,20,6,8,8,8))
   V1    V2  V3   V4        V5                   V6   V7   V8   V9   V10
1 101 10.08   S    A   05OCT93  GOLDEN GATE BRIDGE  4110 6548 6404 55930
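
The character columns above keep their padding. As a sketch, passing strip.white = TRUE through to read.table should trim it; the widths are the same guesses as above, taken from a single sample line, and DF is just an illustrative name.

```r
# Sketch: fixed-width read with surrounding whitespace trimmed from each field.
# Assumption: the widths below fit every line of the real file, not just this sample.
txt2 <- "101  10.08  S   A  05OCT93 GOLDEN GATE BRIDGE  4110   6548   6404   55930"
DF <- read.fwf(textConnection(txt2),
               widths = c(4, 6, 3, 4, 9, 20, 6, 8, 8, 8),
               strip.white = TRUE, stringsAsFactors = FALSE)
```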
IRTFM