import dat file into R

Question

Apologies in advance for the simplicity of this question. I am trying to import a .dat file from a website into R with the following code:

www = "http://www.nilu.no/projects/ccc/onlinedata/ozone/CZ03_2009.dat"
data <- read.delim(www, header = TRUE, sep="\t")

I want to access the Value column of the data.frame. However, I am unsure about its dimensions: ncol(data) returns 1, whereas I was expecting 3. How do I access the "third" column of this data.frame?

A5C1D2H2I1M1N2O1R2T1 · Accepted Answer · 2012-07-26T07:16:24.807

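The .dat file has a few lines of extra information before the actual data. Skip them with the skip argument. Also note that the columns are separated by spaces, not tabs, so use read.table (which splits on any whitespace by default) rather than read.delim with sep="\t":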

read.table("http://www.nilu.no/projects/ccc/onlinedata/ozone/CZ03_2009.dat", 
           header=TRUE, skip=3)
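
To try this without network access, here is a minimal sketch that passes the first few lines of the file, copied from the readLines output shown in this answer, to read.table through its `text` argument. The variable name `ozone_lines` is hypothetical; only the line contents come from the file.

```r
# Hypothetical inline copy of the first lines of CZ03_2009.dat
# (taken from the readLines output in this answer).
ozone_lines <- c(
  "Ozone data from CZ03 2009",
  "Local time: GMT + 0",
  "",
  "Date        Hour      Value",
  "01.01.2009 00:00       34.3",
  "01.01.2009 01:00       31.9",
  "01.01.2009 02:00       29.9"
)

# skip = 3 drops the two description lines and the blank line;
# header = TRUE then uses "Date Hour Value" as the column names.
# The default sep = "" splits each line on any run of whitespace.
dat <- read.table(text = ozone_lines, header = TRUE, skip = 3)

ncol(dat)   # 3
names(dat)  # "Date" "Hour" "Value"
```

The result has the three columns the question expected from ncol(data).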

An easy way to check this, if you are unfamiliar with the dataset, is to first use readLines to inspect a few lines:

readLines("http://www.nilu.no/projects/ccc/onlinedata/ozone/CZ03_2009.dat", 
          n=10)
# [1] "Ozone data from CZ03 2009"   "Local time: GMT + 0"        
# [3] ""                            "Date        Hour      Value"
# [5] "01.01.2009 00:00       34.3" "01.01.2009 01:00       31.9"
# [7] "01.01.2009 02:00       29.9" "01.01.2009 03:00       28.5"
# [9] "01.01.2009 04:00       32.9" "01.01.2009 05:00       20.5"

Here, we can see that the header row (Date, Hour, Value) is line [4] and the data start at [5], so we skip the first three lines and let header=TRUE pick up the column names.
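
If you have several files whose preambles differ in length, you could compute skip instead of hard-coding 3. This is a minimal sketch, assuming (not confirmed by the data source) that the header row is always the first line beginning with "Date"; `ozone_lines`, `header_row`, and `skip_n` are illustrative names.

```r
# Hypothetical inline copy of the file's first lines.
ozone_lines <- c(
  "Ozone data from CZ03 2009",
  "Local time: GMT + 0",
  "",
  "Date        Hour      Value",
  "01.01.2009 00:00       34.3",
  "01.01.2009 01:00       31.9"
)

# Assumption: the column-header row is the first line starting with "Date".
header_row <- grep("^Date", ozone_lines)[1]
skip_n <- header_row - 1  # number of preamble lines before the header

dat <- read.table(text = ozone_lines, header = TRUE, skip = skip_n)
```

With a real file you would call readLines(url, n = 20) or similar first and run the same grep on its result.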

Update

If you really only want the Value column, you can extract it directly:

as.vector(
    read.table("http://www.nilu.no/projects/ccc/onlinedata/ozone/CZ03_2009.dat",
               header=TRUE, skip=3)$Value)
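
Note that `$Value` already returns a plain numeric vector, so the as.vector() wrapper is harmless but not required. A sketch of two options on a hypothetical inline sample (`ozone_lines`): extract the column after reading, or drop Date and Hour at read time by giving them the class "NULL" in colClasses, which read.table documents as "skip this column".

```r
# Hypothetical inline copy of the file's first lines.
ozone_lines <- c(
  "Ozone data from CZ03 2009",
  "Local time: GMT + 0",
  "",
  "Date        Hour      Value",
  "01.01.2009 00:00       34.3",
  "01.01.2009 01:00       31.9",
  "01.01.2009 02:00       29.9"
)

# Option 1: read all three columns, then take Value as a numeric vector.
dat <- read.table(text = ozone_lines, header = TRUE, skip = 3)
value <- dat$Value

# Option 2: skip Date and Hour while reading; the result is a
# one-column data.frame named "Value".
value_only <- read.table(text = ozone_lines, header = TRUE, skip = 3,
                         colClasses = c("NULL", "NULL", "numeric"))
```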

Again, readLines is useful for helping us figure out the actual names of the columns we will be importing.

But I don't see much advantage to doing that over reading the whole dataset in and extracting later.
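
As the comments below point out, the question's ncol(data) == 1 comes from the separator: read.delim splits on tabs, and this file contains none, so each line becomes a single field. A sketch contrasting the two calls on a hypothetical inline sample (`ozone_lines`):

```r
# Hypothetical inline copy of the file's first lines.
ozone_lines <- c(
  "Ozone data from CZ03 2009",
  "Local time: GMT + 0",
  "",
  "Date        Hour      Value",
  "01.01.2009 00:00       34.3",
  "01.01.2009 01:00       31.9"
)

# The question's approach: sep = "\t" finds no tabs, so every line is one
# field, and the first line ("Ozone data ...") becomes the only column name.
bad <- read.delim(text = ozone_lines, header = TRUE, sep = "\t")
ncol(bad)   # 1

# read.table's default sep = "" splits on runs of spaces or tabs.
good <- read.table(text = ozone_lines, header = TRUE, skip = 3)
ncol(good)  # 3
```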

edited Jul 26 '12 at 07:16

answered Jul 26 '12 at 07:05


Thank you. So, from this, how would I define a variable called 'Value'? data$Value doesn't work, and ncol(data) is equal to 1. When I type as.vector(data$Value), R returns NULL. – KatyB Jul 26 '12 at 07:29

Please look at the example again. You probably *still* have `sep="\t"`, which puts everything into a single-column `data.frame`; the file you are trying to read is separated by *spaces*, not *tabs*. So, if you want the full dataset, use the solution in the upper part of my answer. If you want just the `Value` column as a separate vector, use the part after the update. I hope this makes sense. – A5C1D2H2I1M1N2O1R2T1 Jul 26 '12 at 07:36
Thank you I missed that part of the solution. Works great. – KatyB Jul 26 '12 at 07:38
Why do you set skip equal to 3? – Mona Jalal Sep 07 '16 at 16:40
@MonaJalal, see the first sentence in the answer. – A5C1D2H2I1M1N2O1R2T1 Sep 09 '16 at 17:54
Note: I tried loading a .dat file (not the same one as the OP's) with `read.table` and with `readLines` and ended up with a different number of observations (rows). `read.table` only read 20305 rows of the original data, while `readLines` read all 23308 rows (as expected). I still don't know the cause, but that is what happened. – Marina Dec 28 '18 at 14:02
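
One possible explanation for a mismatch like this (not verified against Marina's file): readLines returns every physical line, while read.table drops blank lines (`blank.lines.skip = TRUE`), treats lines containing only a comment as blank (`comment.char = "#"`), and interprets `'` and `"` as quote characters, so an unmatched quote can make it read several lines as one field. A sketch with hypothetical file contents written to a temporary file:

```r
# Hypothetical contents: a header, two data rows, a blank line,
# and a comment-only line.
raw_lines <- c(
  "Date Hour Value",
  "01.01.2009 00:00 34.3",
  "",
  "# instrument offline",
  "01.01.2009 01:00 31.9"
)
tmp <- tempfile(fileext = ".dat")
writeLines(raw_lines, tmp)

# readLines counts every physical line, including the header.
n_lines <- length(readLines(tmp))  # 5

# read.table skips the blank line and the comment-only line.
dat <- read.table(tmp, header = TRUE)
n_rows <- nrow(dat)  # 2
```

If that turns out to be the cause, `comment.char = ""` and `quote = ""` switch those behaviors off.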
