0

I have a dataset that looks like this..

MX000003035 LORETO                    26.0170  111.3330    7.0 1938 2014
1941  1 31    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0
1941  2 28    0    0    0   10    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0
1941  3 31    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0
1941  4 30    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0

where the data for each station begins with a description of the station - code, name, latitude etc.. The first column is the year, the second is the month of the year, the third the number of days and the following values are the daily precipitation values for that month.

There are 860 stations in this single dataset. How do I convert this into the following format in R?

Station Code    Name    Lat Long    Year    Month   Precip
MX000003035 LORETO  26.017  111.333 1941    1   0
MX000003035 LORETO  26.017  111.333 1941    1   0
MX000003035 LORETO  26.017  111.333 1941    1   0
MX000003035 LORETO  26.017  111.333 1941    1   0
MX000003035 LORETO  26.017  111.333 1941    1   0
MX000003035 LORETO  26.017  111.333 1941    1   0
MX000003035 LORETO  26.017  111.333 1941    1   0
MX000003035 LORETO  26.017  111.333 1941    1   0
MX000003035 LORETO  26.017  111.333 1941    1   0
MX000003035 LORETO  26.017  111.333 1941    1   0
MX000003035 LORETO  26.017  111.333 1941    1   0
MX000003035 LORETO  26.017  111.333 1941    1   0
MX000003035 LORETO  26.017  111.333 1941    1   0
MX000003035 LORETO  26.017  111.333 1941    1   0
MX000003035 LORETO  26.017  111.333 1941    1   0

.. and so on

EDIT: Here is the link to the dataset https://www.dropbox.com/s/o0yp1pe4rze8amd/gdcn_SWUS.txt

And here are some snippets...

1940 10 31    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0
1940 11 30    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0
1940 12 31    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0
1941  1 31    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0
1941  2 28    0    0    0   10    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0
1941  3 31    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0
1941  4 30    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0

...

2014  9 30-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999
2014 10 31-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999
2014 11 30-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999
2014 12 31-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999
MX000003068 CIUDAD CONSTITUCION       24.9500 -111.7000   48.0 1957 2014
1957  1 31-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999
1957  2 28-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999
1957  3 31    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0  190   80    0    0    0    0    0    0    0    0    0    0    0    0
1957  4 30    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0
1957  5 31    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0
1957  6 30    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0
1957  7 31    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0
1957  8 31    0    0    0    0    0    0    0    0    0    0    0    0    0    0   50    0    0    0    0   50    0   50    0    0    0    0    0    5    0    0    0

...

2014  9 30-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999
2014 10 31-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999
2014 11 30-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999
2014 12 31-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999
USC00040983 BORREGO DESERT PARK       33.2314 -116.4144  245.4 1942 2014
1942  1 31-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999
1942  2 28-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999
1942  3 31-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999
1942  4 30-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999
1942  5 31-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999
1942  6 30-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999
Srijita
  • 1
  • 1
  • are you looking for t(dataset)? – user2510479 Mar 17 '14 at 21:21
  • Is your dataset already in R and, if so, could you provide the result of `dput(yourData)`? – Frank Mar 17 '14 at 21:25
  • @Frank No I have not been able to bring it into R since each row has unequal number of columns. When I try to specify the separator as "\t" it chops off the dataset at the end of the first station. – Srijita Mar 17 '14 at 21:31
  • @Frank edited the post with a link to the dataset – Srijita Mar 17 '14 at 21:40
  • Please provide a **minimal** reproducible example, _not_ a 145.53 MB file. Have a look [**here**](http://stackoverflow.com/help/mcve), [**here**](http://www.sscce.org/), and [**here**](http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example/5963610#5963610). – Henrik Mar 17 '14 at 21:43
  • @Henrik added a few more snippets of the original dataset... hope this helps – Srijita Mar 17 '14 at 21:51
  • Okay, I guess you'll have to do some splitting of that data into chunks for each station, first. Those strung-together 9999's are another problem... you should probably think about cleaning the data first, at least to the point where you have a consistent separator for each cell, like a space. In any case, in asking questions here, you'll need to simplify the problem and ask one thing at a time... while this data introduces several problems/questions at once. – Frank Mar 17 '14 at 21:52

1 Answers1

0

Bring the data into R with readLines

dat <- raedLines( filename )

Then get the row numbers with station names and lat/long:

stations <- dat[ grep( "[[:alpha:]]{2}", dat) ] 

Identify the row numbers of data lines:

breaks <- grep( "[[:alpha:]]{2}", dat)
 breaks
#[1]  1  6 10

Make sequence of breaks:

breaks <- c(breaks, length(dat)+1 )

Then pull in data between breaks and let the R "auto repeat" function duplicate the station data:

newdf <- lapply( seq_along(breaks[-1]), 
             function(idx){ 
        data.frame( stations[idx],
                    read.table(text=dat[(breaks[idx]+1):(breaks[idx+1]-1)], fill=TRUE))})

Then row-bind the rows back together:

  newdf2 <- do.call(rbind, newdf)

Test data:

dat <- readLines( textConnection("MX000003035 LORETO                    26.0170  111.3330    7.0 1938 2014
1941  1 31    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0
1941  2 28    0    0    0   10    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0
1941  3 31    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0
1941  4 30    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0
MX000003036 Laredo                    27.0170  112.3330    7.0 1938 2014
1941  1 31    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0
1941  2 28    0    0    0   10    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0
1941  3 31    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0
1941  4 30    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0
MX000003037 Another                    28.0170  113.3330    7.0 1938 2014
1941  1 31    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0
1941  2 28    0    0    0   10    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0
1941  3 31    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0
1941  4 30    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0")

Output: (not quite done but if you pass stations to read.table first and then do the lapply(cbind(..)) operation it should work fine:

 ##   stations <- read.table(text=stations)
 # Remove column 7 and add desired row names

> newdf               ### the unfinished version
                                                               stations.idx.   V1
1   MX000003035 LORETO                    26.0170  111.3330    7.0 1938 2014 1941
2   MX000003035 LORETO                    26.0170  111.3330    7.0 1938 2014 1941
3   MX000003035 LORETO                    26.0170  111.3330    7.0 1938 2014 1941
4   MX000003035 LORETO                    26.0170  111.3330    7.0 1938 2014 1941
5   MX000003036 Laredo                    27.0170  112.3330    7.0 1938 2014 1941
6   MX000003036 Laredo                    27.0170  112.3330    7.0 1938 2014 1941
7   MX000003036 Laredo                    27.0170  112.3330    7.0 1938 2014 1941
8   MX000003036 Laredo                    27.0170  112.3330    7.0 1938 2014 1941
9  MX000003037 Another                    28.0170  113.3330    7.0 1938 2014 1941
10 MX000003037 Another                    28.0170  113.3330    7.0 1938 2014 1941
11 MX000003037 Another                    28.0170  113.3330    7.0 1938 2014 1941
12 MX000003037 Another                    28.0170  113.3330    7.0 1938 2014 1941
   V2 V3 V4 V5 V6 V7 V8 V9 V10 V11 V12 V13 V14 V15 V16 V17 V18 V19 V20 V21 V22 V23
1   1 31  0  0  0  0  0  0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
2   2 28  0  0  0 10  0  0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
3   3 31  0  0  0  0  0  0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
4   4 30  0  0  0  0  0  0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
5   1 31  0  0  0  0  0  0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
6   2 28  0  0  0 10  0  0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
7   3 31  0  0  0  0  0  0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
8   4 30  0  0  0  0  0  0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
9   1 31  0  0  0  0  0  0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
10  2 28  0  0  0 10  0  0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
11  3 31  0  0  0  0  0  0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
12  4 30  0  0  0  0  0  0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
   V24 V25 V26 V27 V28 V29 V30 V31 V32 V33 V34
1    0   0   0   0   0   0   0   0   0   0   0
2    0   0   0   0   0   0   0   0  NA  NA  NA
3    0   0   0   0   0   0   0   0   0   0   0
4    0   0   0   0   0   0   0   0   0   0  NA
5    0   0   0   0   0   0   0   0   0   0   0
6    0   0   0   0   0   0   0   0  NA  NA  NA
7    0   0   0   0   0   0   0   0   0   0   0
8    0   0   0   0   0   0   0   0   0   0  NA
9    0   0   0   0   0   0   0   0   0   0   0
10   0   0   0   0   0   0   0   0  NA  NA  NA
11   0   0   0   0   0   0   0   0   0   0   0
12   0   0   0   0   0   0   0   0   0   0  NA
IRTFM
  • 258,963
  • 21
  • 364
  • 487