I have several data frames, each with a column holding the latitude of each record (latitude) and another column holding the date of each record (datecollected). I would like to count the records that fall into each combination of latitude interval (5 degrees) and time interval (two years), and export those counts to a new data frame.
(Hint: you'll make it easier for us to answer by providing some sample data.)
dataset <- data.frame(datecollected=
sample(as.Date("2000-01-01")+(0:3650),1000,replace=TRUE),
latitude=90*runif(1000))
We round datecollected down to the nearest even year:
year.index <- (as.POSIXlt(dataset$datecollected)$year %/% 2)*2+1900
Similarly, we round latitude down to the nearest multiple of 5 degrees:
latitude.index <- (floor(dataset$latitude) %/% 5)*5
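To make the two rounding steps concrete, here is a small worked sketch on a handful of hand-picked values. The vectors `dates`, `lats`, `year.bins` and `lat.bins` are made up for illustration and are not part of the original answer; the negative latitude is included only to show how the formula behaves if such values occur.

```r
# Illustrative values only (not from the original data).
dates <- as.Date(c("1853-06-01", "1999-12-31", "2000-01-01", "2001-03-15"))
lats  <- c(3.4, 47.2, 62.9, -3.5)

# $year counts years since 1900, so it is negative before 1900;
# %/% is floor division, so odd years still round down.
year.bins <- (as.POSIXlt(dates)$year %/% 2) * 2 + 1900
year.bins
# 1852 1998 2000 2000

lat.bins <- (floor(lats) %/% 5) * 5
lat.bins
# 0 45 60 -5
```

Each bin is labelled by its lower bound, so 1852 stands for 1852-1853 and 45 stands for latitudes from 45 up to (but not including) 50.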
Then we simply build a table on the rounded years and latitudes:
table(year.index,latitude.index)
          latitude.index
year.index  0  5 10 15 20 25 30 35 40 45 50 55 60 65 70 75 80 85
      2000 12  9 15  7 11 10 11 14  9 13 11 10  8 11 13 25 10 18
      2002 11  9 11 16 11 15 12  5 12 13  7 15  8  7 11  7 10 13
      2004  8 12  9 10 12 16 12 13  9  7 16 11  6 13  4 15 12 10
      2006 14  8 13 10 12  9 12  9  6 11 11  9 13  9 10  5  5 12
      2008  8 12 17 12 12  8 12  8 14 12 11 11 10 10 14 16 17 13
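The question also asks for the counts in a new data frame. One possible way, sketched here under assumptions, is to pass the table through `as.data.frame()`, which yields one row per year/latitude combination. The `set.seed()` call, the column name `n.records` and the file name `counts.csv` are illustrative choices, not part of the original answer.

```r
set.seed(1)  # assumption: fixed seed so the toy data is reproducible

# Same toy data and binning as in the answer.
dataset <- data.frame(
  datecollected = sample(as.Date("2000-01-01") + (0:3650), 1000, replace = TRUE),
  latitude = 90 * runif(1000)
)
year.index <- (as.POSIXlt(dataset$datecollected)$year %/% 2) * 2 + 1900
latitude.index <- (floor(dataset$latitude) %/% 5) * 5

# Long format: columns year.index, latitude.index, n.records.
counts <- as.data.frame(table(year.index, latitude.index),
                        responseName = "n.records",
                        stringsAsFactors = FALSE)
head(counts)

# Hypothetical export step; uncomment to write the counts to disk.
# write.csv(counts, "counts.csv", row.names = FALSE)
```

With `stringsAsFactors = FALSE` the two bin columns come back as character vectors; wrap them in `as.numeric()` if numeric bins are needed downstream.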
EDIT: after a bit of discussion in the comments, I'll post my current script. It seems like there may be an issue when you read the data into R. This is what I do and what I get:
rm(list=ls())
dataset <- read.csv("GADUS.csv",header=TRUE,sep=",")
year.index <- (as.POSIXlt(as.character(dataset$datecollected),format="%Y-%m-%d")$year
%/% 2)*2+1900
latitude.index <- (floor(dataset$latitude) %/% 5)*5
table(year.index,latitude.index)
          latitude.index
year.index  0  5 20 35 40 45 50 55 60 65 70 75
      1752  0  0  0  0  0 20  0  0  0  0  0  0
      1754  0  0  0  0  0 27  0  3  0  0  0  0
      1756  0  0  0  0  0 21  0  1  0  0  0  0
      1758  0  0  0  0  0 46  0  2  0  0  0  0
      ...
Does this give the same result for you? If not, please edit your question and post the result of str(dataset[,c("datecollected","latitude")]).
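The `as.character()` call in the script above guards against `datecollected` arriving as a factor rather than as text. The sketch below imitates that situation on a tiny inline CSV; the three records are invented for illustration, and `stringsAsFactors = TRUE` is set explicitly because it was the `read.csv()` default only in R versions before 4.0.0.

```r
# Invented sample records standing in for the real CSV file.
csv.text <- "datecollected,latitude
1853-06-01,47.2
1999-12-31,62.9
2001-03-15,3.4"

# Force the factor behaviour that older R versions applied by default.
dataset <- read.csv(text = csv.text, header = TRUE, stringsAsFactors = TRUE)
is.factor(dataset$datecollected)  # TRUE

# Convert factor -> character -> Date before extracting the year.
dates <- as.Date(as.character(dataset$datecollected), format = "%Y-%m-%d")
year.index <- (as.POSIXlt(dates)$year %/% 2) * 2 + 1900
latitude.index <- (floor(dataset$latitude) %/% 5) * 5

table(year.index, latitude.index)
```

Running `str(dataset)` right after `read.csv()` is a quick way to check which type each column actually received.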

Stephan Kolassa
-
Great, it seems to work, but I do not understand why the dates begin in 2000 and end in 2008, even though my records begin in 1800 and end in the 2000s. In other words, why are there only 5 years? – Marco Apr 02 '14 at 15:01
-
Well, *my* data is between 2000 and 2010, because I just created toy data that way. If I create toy data between 1800 and 2000, everything looks good. Do you have problems when you apply this approach to your own data? – Stephan Kolassa Apr 02 '14 at 15:05
-
Yes, in my case too the range is 2000-2008, even though the real interval is much bigger. Maybe I should change the as.Date starting point to 1800? If you want to check, here you can find the dataframe. Thank you very much. https://dl.dropboxusercontent.com/u/41172284/GADUS.csv – Marco Apr 02 '14 at 15:30
-
Thx. If you read your data using `read.csv()` without specifying the `colClasses` argument, R believes that `datecollected` is a factor, which has funny consequences (http://stackoverflow.com/questions/22811641/changing-factors-to-integers-without-changing-the-order-of-the-data). Simply change the `year.index` line above to `year.index <- (as.POSIXlt(as.character(dataset$datecollected),format="%Y-%m-%d")$year %/% 2)*2+1900`. I'd rather not edit the answer since I'd need to edit my toy data, too, and then it would be harder to understand for the next person coming here via Google. – Stephan Kolassa Apr 02 '14 at 15:46
-
It works! You saved my day. Much obliged. I do not understand why it did not work before, but with this new script everything works. – Marco Apr 02 '14 at 16:22