I am trying to import a data set from a .dat file on the internet using read.table. I usually have no issues when the file has one column per variable, for example:

A B
1 2
3 4

But this data set repeats the same pair of columns across each row:

A B A B
1 2 3 4
5 6 7 8

(You can find the data set I'm having issues with here: https://www2.isye.gatech.edu/~jeffwu/book/data/BrainandBodyWeight.dat)

My current line of code is:

Data2 = read.table("https://www2.isye.gatech.edu/~jeffwu/book/data/BrainandBodyWeight.dat", header = TRUE)

The error I'm getting is:

Error in scan(file = file, what = what, sep = sep, quote = quote, dec = dec, : line 1 did not have 12 elements

Arvind Kumar Avinash
  • If you are just trying to read the file as is, use the fill = TRUE parameter in read.table: `read.table("https://www2.isye.gatech.edu/~jeffwu/book/data/BrainandBodyWeight.dat", header = TRUE, fill = TRUE)`. Since the later columns don't contain any values, they will be populated with NAs. – PKumar Feb 04 '21 at 02:46
  • Does this answer your question? [Issue when importing dataset: `Error in scan(...): line 1 did not have 145 elements`](https://stackoverflow.com/questions/18161009/issue-when-importing-dataset-error-in-scan-line-1-did-not-have-145-eleme) – AlSub Feb 04 '21 at 02:47
  • @PKumar that worked to get rid of the error but I need just two columns. The data set is just two column names but is displayed as 6 different columns with each name showing up twice. I am trying to get all the data into the appropriate column – Michael Visconti Feb 04 '21 at 02:59

1 Answer

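The problem is that the column names in the header row contain spaces, so read.table splits each name into several fields and the header no longer matches the data rows. Skip the header with skip = 1 and name the columns yourself.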

From there, we can extract the odd and even columns using the repeating logical vectors c(TRUE, FALSE) and c(FALSE, TRUE), which R recycles across all columns.
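
As a small illustration of that recycling, here is a sketch on a made-up four-column data frame (`demo` and its column names are hypothetical, not part of the real data set):

```r
# Hypothetical toy data frame standing in for the wide table
demo <- data.frame(V1 = 1:2, V2 = 3:4, V3 = 5:6, V4 = 7:8)

# c(TRUE, FALSE) is recycled to TRUE, FALSE, TRUE, FALSE: keeps columns 1 and 3
odd_cols <- demo[, c(TRUE, FALSE)]

# c(FALSE, TRUE) is recycled to FALSE, TRUE, FALSE, TRUE: keeps columns 2 and 4
even_cols <- demo[, c(FALSE, TRUE)]

names(odd_cols)   # "V1" "V3"
names(even_cols)  # "V2" "V4"
```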

The final line of the data has some empty values, which fill = TRUE pads with NA, so remove those rows with complete.cases().

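# Skip the header row; fill = TRUE pads the short last line with NA
data <- read.table("https://www2.isye.gatech.edu/~jeffwu/book/data/BrainandBodyWeight.dat",
                   header = FALSE, fill = TRUE, skip = 1)

# Odd columns hold body weights, even columns hold brain weights
result <- data.frame(Body.Wt = unname(unlist(data[, c(TRUE, FALSE)])),
                     Brain.Wt = unname(unlist(data[, c(FALSE, TRUE)])))

# Drop the rows created by the NA padding
result <- result[complete.cases(result), ]
head(result)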
  Body.Wt Brain.Wt
1   3.385     44.5
2   0.480     15.5
3   1.350      8.1
4 465.000    423.0
5  36.330    119.5
6  27.660    115.0
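
If you want to try the same steps without downloading anything, here is a minimal offline sketch. The string `raw` is a made-up miniature that only imitates the layout described in the question (two-word column names, the pair repeated across the row, a short last line); its values are not from the real file.

```r
# Hypothetical stand-in for the file contents, not the real data
raw <- "Body Wt Brain Wt Body Wt Brain Wt
1 2 3 4
5 6 7 8
9 10"

# Same arguments as above, reading from the string instead of the URL
wide <- read.table(text = raw, header = FALSE, fill = TRUE, skip = 1)

# Stack the odd columns and the even columns into two long columns
tidy <- data.frame(Body.Wt = unname(unlist(wide[, c(TRUE, FALSE)])),
                   Brain.Wt = unname(unlist(wide[, c(FALSE, TRUE)])))

# Remove the row that came from the NA padding on the short last line
tidy <- tidy[complete.cases(tidy), ]
print(tidy)
```

Note that unlist() stacks whole columns one after another, so the rows come out grouped by column pair rather than in the left-to-right reading order of the file. Each Body.Wt value still sits next to its own Brain.Wt value.
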
Ian Campbell