0

I have a large matrix with a column of datetime information.

The structure is: '2022-01-01_0545'(no seconds).

I'm looking for an effective way to get the number of occurrences of each year, month and day in my matrix. What could be a good approach?

Could I use the table() function or is it better to split the string into its components?

Example:

dates <- c("2022-01-01_0545","2022-03-01_0810","2021-12-13_1003","2022-09-10_0400","2022-03-09_1802")
data <- matrix(0 , nrow = 5, ncol = 2)
data <- cbind(data, dates)
colnames(data) <- NULL

Another example using dput(data[1:5, 10:11):

structure(c("290.603961274028", "281.885433495045", "283.438215255737", 
"275.935544893146", "284.739524498582", "2018-01-01_0000", "2018-01-03_0445", 
"2018-03-07_0045", "2018-01-04_0700", "2018-02-09_0015"), dim = c(5L, 
2L))
s28
  • 173
  • 5
  • 2
    Are you sure you have a `matrix` and not a `data.frame`? Please share a small reproducible example - just a few rows. Using `dput()` is the nicest way to do that because it is copy/pasteable and includes the relevant data structure information and classes, for example `dput(your_data[1:5, 2:3])` for the first 5 rows of columns 2 and 3. – Gregor Thomas May 09 '23 at 15:51
  • 1
    You won't be able to use `table()` on components of a string without splitting it first. Treating it as a string and splitting it is one option, `substr()` will probably be fastest if the character positions are nicely consistent. Another option is converting it to a proper `Date` class object and using functions made for extracting parts of dates (`format.Date` or the `lubridate` package's helper functions `year()`, `month()`, and `day()`). The Date option might offer you nice flexibility for subsequent steps. – Gregor Thomas May 09 '23 at 15:54
  • 1
    It's easier to help you if you include a simple [reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) with sample input and desired output that can be used to test and verify possible solutions. – MrFlick May 09 '23 at 15:56
  • 2
    What would "large" be in numbers (thousands, tens of millions, ..) and have you already tried something that failed performance-wise (e.g. lubridate )? – margusl May 09 '23 at 15:59
  • I added an example! "Large" means ~ 100000 x 20 – s28 May 09 '23 at 16:05
  • And I added another example using dput with two columns. – s28 May 09 '23 at 16:17
  • 1
    I'd note that a `matrix` is a bad choice for your data. From your `dput()`, you can see that your numbers are **not** being treated as numerics - everything is strings. A data frame would be a better choice so you can have different classes in different columns. – Gregor Thomas May 09 '23 at 17:20

2 Answers2

2

Normally is good to convert dates into date formats, instead of keeping them as strings. If you do that, you can use lubridate's year, month, and day for a very clean solution.

To convert your dates, use something like:

data$dates = as.POSIXct(data$dates, format = "%Y-%m-%d_%H%M")

Then, you can use a mapping function to apply the lubridate funcions all at once. I used purrr's map, but you can use lapply, amongst others.

library(lubridate)
purrr::map(list("Years" = year, "Months" = month, "Days" = day), ~ table(.x(data$dates)))

Dummy data:

data = data.frame(dates = seq(as.POSIXct("2000-01-01 00:00"), by = 60*60*8, length.out = 12000),
                  numbers = rnorm(12000))

Results:

$Years

2000 2001 2002 2003 2004 2005 2006 2007 2008 2009 2010 
1098 1095 1095 1095 1098 1095 1095 1095 1098 1095 1041 

$Months

   1    2    3    4    5    6    7    8    9   10   11   12 
1023  944 1023  990 1023  990 1023 1023  990 1015  987  969 

$Days

  1   2   3   4   5   6   7   8   9  10  11  12  13  14  15  16  17  18  19  20  21  22  23  24  25  26  27  28  29  30  31 
396 395 395 396 395 396 396 395 396 396 396 396 396 393 394 394 393 393 392 394 393 393 393 394 393 394 393 393 369 360 228
1

Calling your matrix m, here's using substr to extract the components and table() to make tables.

years = substr(m[, 2], 1, 4)
months = substr(m[, 2], 6, 7)
days = substr(m[, 2], 9, 10)

table(years)
# years
# 2018 
#    5 
   
table(months)
# months
# 01 02 03 
#  3  1  1 
 
table(days)
# days
# 01 03 04 07 09 
#  1  1  1  1  1 
Gregor Thomas
  • 136,190
  • 20
  • 167
  • 294