
I have several data frames, each with a column containing the latitude of each record (latitude) and another column containing the date of the record (datecollected). I would like to count the records that fall into the same latitude interval (5 degrees) and year interval (two years), and export these counts to a new data frame.

Question by Marco, edited by Cœur

1 Answer

(Hint: you'll make it easier for us to answer by providing some sample data.)

dataset <- data.frame(
  datecollected=sample(as.Date("2000-01-01")+(0:3650),1000,replace=TRUE),
  latitude=90*runif(1000))

We round datecollected down to the nearest even year:

year.index <- (as.POSIXlt(dataset$datecollected)$year %/% 2)*2+1900
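As a quick check of this arithmetic (a sketch on a few made-up dates): as.POSIXlt() stores the year as an offset from 1900, and because 1900 is even, integer-dividing the offset by 2, multiplying by 2 and adding 1900 back gives the even calendar year at or below each date. This also holds before 1900, where the offset is negative and %/% still rounds down.

```r
# A few made-up dates, including some before 1900
d <- as.Date(c("1753-06-01", "1801-12-31", "1999-07-01", "2000-03-15"))

# Years since 1900 (negative for dates before 1900)
yrs <- as.POSIXlt(d)$year

# %/% floors, so odd years drop to the preceding even year
year.index <- (yrs %/% 2) * 2 + 1900
year.index
# [1] 1752 1800 1998 2000

# Equivalent formulation using the four-digit calendar year directly
alt.index <- (as.integer(format(d, "%Y")) %/% 2) * 2
```

The second formulation avoids the 1900 offset entirely and gives the same bins.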

Similarly, we round the latitude down to the nearest multiple of 5 degrees:

latitude.index <- (floor(dataset$latitude) %/% 5)*5
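The same floor-division trick, checked on a few made-up values. Southern-hemisphere latitudes (not present in the toy data, but possible in real data) are negative; because floor() and %/% both round towards minus infinity, a value such as -3.2 lands in the bin starting at -5. If readable interval labels are preferred, base R's cut() with right = FALSE produces the same left-closed 5-degree bins; the breaks seq(-90, 90, by = 5) assume latitudes lie within [-90, 90].

```r
lat <- c(0, 4.99, 5, 44.2, -0.5, -3.2)

# Lower edge of each 5-degree bin
latitude.index <- (floor(lat) %/% 5) * 5
latitude.index
# [1]  0  0  5 40 -5 -5

# The same bins as labelled intervals [lower, upper);
# include.lowest = TRUE closes the last bin so that 90 is not dropped
latitude.bin <- cut(lat, breaks = seq(-90, 90, by = 5), right = FALSE,
                    include.lowest = TRUE)
as.character(latitude.bin)
```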

Then we simply build a table on the rounded years and latitudes:

table(year.index,latitude.index)

          latitude.index
year.index  0  5 10 15 20 25 30 35 40 45 50 55 60 65 70 75 80 85
      2000 12  9 15  7 11 10 11 14  9 13 11 10  8 11 13 25 10 18
      2002 11  9 11 16 11 15 12  5 12 13  7 15  8  7 11  7 10 13
      2004  8 12  9 10 12 16 12 13  9  7 16 11  6 13  4 15 12 10
      2006 14  8 13 10 12  9 12  9  6 11 11  9 13  9 10  5  5 12
      2008  8 12 17 12 12  8 12  8 14 12 11 11 10 10 14 16 17 13
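Since the question asks for the counts in a new data frame, the table can be converted with as.data.frame(), which gives one row per (year bin, latitude bin) combination and the count in a Freq column. A minimal sketch, reusing the toy data with an added set.seed() for reproducibility; the output path is a placeholder.

```r
set.seed(42)  # added so the toy data are reproducible
dataset <- data.frame(
  datecollected = sample(as.Date("2000-01-01") + (0:3650), 1000, replace = TRUE),
  latitude = 90 * runif(1000)
)

year.index <- (as.POSIXlt(dataset$datecollected)$year %/% 2) * 2 + 1900
latitude.index <- (floor(dataset$latitude) %/% 5) * 5

# Long format: columns year.index, latitude.index, Freq
counts <- as.data.frame(table(year.index, latitude.index),
                        stringsAsFactors = FALSE)
counts$year.index <- as.integer(counts$year.index)
counts$latitude.index <- as.integer(counts$latitude.index)

# Optionally drop combinations with no records
counts <- counts[counts$Freq > 0, ]

# Placeholder path; replace with your own file name
out_file <- file.path(tempdir(), "record_counts.csv")
write.csv(counts, out_file, row.names = FALSE)
head(counts)
```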

EDIT: after some discussion in the comments, here is my current script. The problem appears to be in how the data are read into R. This is what I do and what I get:

rm(list=ls())
dataset <- read.csv("GADUS.csv",header=TRUE,sep=",")
# datecollected may be read in as a factor, so convert it to character first
year.index <- (as.POSIXlt(as.character(dataset$datecollected),
  format="%Y-%m-%d")$year %/% 2)*2+1900
latitude.index <- (floor(dataset$latitude) %/% 5)*5
table(year.index,latitude.index)
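An alternative way around the factor problem discussed in the comments (a sketch, not the script above) is to declare the column types when reading, via read.csv()'s colClasses argument, so that datecollected arrives as a Date. The inline CSV is made-up data standing in for GADUS.csv and assumes ISO YYYY-MM-DD dates, matching the format string used above; for the real file you would pass "GADUS.csv" instead of text = csv_text. (Since R 4.0.0, read.csv() no longer converts strings to factors by default, but declaring the types still avoids any guessing.)

```r
# Made-up stand-in for GADUS.csv with the same column names
csv_text <- "datecollected,latitude
1753-06-01,47.2
1801-12-31,-3.2
2008-05-20,62.9"

# colClasses makes read.csv return a Date column directly
dataset <- read.csv(text = csv_text,
                    colClasses = c(datecollected = "Date", latitude = "numeric"))
str(dataset)

year.index <- (as.POSIXlt(dataset$datecollected)$year %/% 2) * 2 + 1900
latitude.index <- (floor(dataset$latitude) %/% 5) * 5
table(year.index, latitude.index)
```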

          latitude.index
year.index     0     5    20    35    40    45    50    55    60    65    70    75
      1752     0     0     0     0     0    20     0     0     0     0     0     0
      1754     0     0     0     0     0    27     0     3     0     0     0     0
      1756     0     0     0     0     0    21     0     1     0     0     0     0
      1758     0     0     0     0     0    46     0     2     0     0     0     0
...

Does this give the same result for you? If not, please edit your question and post the result of str(dataset[,c("datecollected","latitude")]).
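The question mentions several data frames. One way to handle them is to wrap the steps in a function and apply it with lapply(). This is a sketch: count_by_bins and the two example data frames are hypothetical, and each data frame is assumed to have a Date column datecollected and a numeric column latitude.

```r
# Hypothetical helper: count records per year bin x latitude bin
count_by_bins <- function(df, year_width = 2, lat_width = 5) {
  yr <- as.POSIXlt(df$datecollected)$year + 1900
  year.index <- (yr %/% year_width) * year_width
  latitude.index <- (floor(df$latitude) %/% lat_width) * lat_width
  counts <- as.data.frame(table(year.index, latitude.index),
                          stringsAsFactors = FALSE)
  counts$year.index <- as.integer(counts$year.index)
  counts$latitude.index <- as.integer(counts$latitude.index)
  counts[counts$Freq > 0, ]  # keep only non-empty bins
}

# Two made-up data frames standing in for the real ones
df_a <- data.frame(datecollected = as.Date(c("1999-07-01", "2000-03-15", "2001-01-01")),
                   latitude = c(44.2, 41.0, 12.5))
df_b <- data.frame(datecollected = as.Date(c("1801-12-31", "1753-06-01")),
                   latitude = c(-3.2, 47.2))

results <- lapply(list(a = df_a, b = df_b), count_by_bins)

# Stack all results into one data frame, tagged by source
combined <- do.call(rbind, lapply(names(results), function(nm)
  cbind(dataset = nm, results[[nm]])))
combined
```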

Answer by Stephan Kolassa
  • Great, it seems to work, but I do not understand why the dates begin in 2000 and end in 2008, even though my records begin in 1800 and end in the 2000s. In other words, why are there only 5 year groups? – Marco Apr 02 '14 at 15:01
  • Well, *my* data is between 2000 and 2010, because I just created toy data that way. If I create toy data between 1800 and 2000, everything looks good. Do you have problems when you apply this approach to your own data? – Stephan Kolassa Apr 02 '14 at 15:05
  • Yes, in my case the range is also 2000-2008, even though the real interval is much bigger. Maybe I should change the as.Date starting point to 1800? If you want to check, you can find the data frame here: https://dl.dropboxusercontent.com/u/41172284/GADUS.csv Thank you very much. – Marco Apr 02 '14 at 15:30
  • Thx. If you read your data using `read.csv()` without specifying the `colClasses` argument, R believes that `datecollected` is a factor, which has funny consequences (http://stackoverflow.com/questions/22811641/changing-factors-to-integers-without-changing-the-order-of-the-data). Simply change the `year.index` line above to `year.index <- (as.POSIXlt(as.character(dataset$datecollected),format="%Y-%m-%d")$year %/% 2)*2+1900`. I'd rather not edit the answer since I'd need to edit my toy data, too, and then it would be harder to understand for the next person coming here via Google. – Stephan Kolassa Apr 02 '14 at 15:46
  • It works! You saved my day. Much obliged. I did not understand why it did not work before, but with this new script everything works. – Marco Apr 02 '14 at 16:22