
I'm working on an R project using a table of Covid-19 cases (columns are dates, rows are countries), and I want to find the index of the first column in which the number of cases is non-zero for a specific country.

The table looks like this: [image: table showing cases of Covid-19]

I know how to index a specific country, for example i_arg <- which(rownames(datos) == "Argentina"), and a specific date. What I don't know is how to pick a date that satisfies a condition based on its value in a given row and on the values before it: I want the index of the first column whose value in that row is non-zero.

I know this has to be a silly question, but I couldn't find information on how to do this anywhere.

Here is the data for the first few countries as MrFlick suggested:

"1/22/20" "1/23/20" "1/24/20" "1/25/20" "1/26/20" "1/27/20" "1/28/20" "1/29/20" "1/30/20" "1/31/20" "2/1/20" "2/2/20" "2/3/20" "2/4/20" "2/5/20" "2/6/20" "2/7/20" "2/8/20" "2/9/20" "2/10/20" "2/11/20" "2/12/20" "2/13/20" "2/14/20" "2/15/20" "2/16/20" "2/17/20" "2/18/20" "2/19/20" "2/20/20" "2/21/20" "2/22/20" "2/23/20" "2/24/20" "2/25/20" "2/26/20" "2/27/20" "2/28/20" "2/29/20" "3/1/20" "3/2/20" "3/3/20" "3/4/20" "3/5/20" "3/6/20" "3/7/20" "3/8/20" "3/9/20" "3/10/20" "3/11/20" "3/12/20" "3/13/20" "3/14/20" "3/15/20" "3/16/20" "3/17/20" "3/18/20" "3/19/20" "3/20/20" "3/21/20" "3/22/20" "3/23/20" "3/24/20" "3/25/20" "3/26/20" "3/27/20" "3/28/20" "3/29/20" "3/30/20" "3/31/20" "4/1/20" "4/2/20" "4/3/20" "4/4/20" "4/5/20" "4/6/20" "4/7/20" "4/8/20" "4/9/20" "4/10/20" "4/11/20" "4/12/20"

"Afghanistan" 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 4 4 5 7 7 7 11 16 21 22 22 22 24 24 40 40 74 84 94 110 110 120 170 174 237 273 281 299 349 367 423 444 484 521 555 607

"Albania" 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 2 10 12 23 33 38 42 51 55 59 64 70 76 89 104 123 146 174 186 197 212 223 243 259 277 304 333 361 377 383 400 409 416 433 446

"Algeria" 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 3 5 12 12 17 17 19 20 20 20 24 26 37 48 54 60 74 87 90 139 201 230 264 302 367 409 454 511 584 716 847 986 1171 1251 1320 1423 1468 1572 1666 1761 1825 1914

"Andorra" 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 39 39 53 75 88 113 133 164 188 224 267 308 334 370 376 390 428 439 466 501 525 545 564 583 601 601 638

"Angola" 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 2 2 3 3 3 4 4 5 7 7 7 8 8 8 10 14 16 17 19 19 19 19 19 "Antigua and Barbuda" 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 3 3 3 7 7 7 7 7 7 7 9 15 15 15 15 19 19 19 19 21 21

"Argentina" 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 2 8 12 12 17 19 19 31 34 45 56 68 79 97 128 158 266 301 387 387 502 589 690 745 820 1054 1054 1133 1265 1451 1451 1554 1628 1715 1795 1975 1975 2142

The file is called "data_covid19.txt" and is imported this way:

datos <- read.table("data_covid19.txt", header = TRUE, check.names = FALSE)
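Following MrFlick's suggestion, a reproducible version of this import can be sketched by passing a few lines of text to read.table() through its text argument instead of reading the file. The excerpt below is hypothetical (made-up values in the same layout, not the real data). It relies on documented read.table() behaviour: when header = TRUE and the header line has one field fewer than the data lines, the first column is used as the row names, which is why rownames(datos) holds the country names.

```r
# Hypothetical excerpt in the same layout as data_covid19.txt:
# the header has one field fewer than each data line, so read.table()
# turns the first (quoted) column into row names.
txt <- '"1/22/20" "1/23/20" "1/24/20" "1/25/20"
"Argentina" 0 0 1 1
"Angola" 0 0 0 2'

# check.names = FALSE keeps the date labels exactly as written.
datos <- read.table(text = txt, header = TRUE, check.names = FALSE)

rownames(datos)  # country names
colnames(datos)  # date labels
```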

I want the index so I can select the data that I need like this:

datos[i_arg, first:last]

Where:

i_arg <- which(rownames(datos) == "Argentina")
last <- which(colnames(datos) == "3/29/20") #specific date

and first is the index of the first column whose value in that row is non-zero.

Thanks!

Santiago
It's easier to help you if you include a simple [reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) with sample input and desired output that can be used to test and verify possible solutions. Pictures of data are not helpful because we can't copy/paste the data to test. Better to include just a simple example data.frame in the question itself. – MrFlick Sep 03 '20 at 20:57


All of your data has a common type and is indexed by two variables (country and date). This is a strong indicator that it should be stored as a numeric matrix, NOT as a data.frame or data.table. Matrices have better out-of-the-box support for row-wise operations than data.frame objects. Indeed, the primary advantage of data.frame objects and their cousins over matrices is their support for heterogeneous column types, which you do not need.
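Getting from the question's datos to a matrix can be sketched with as.matrix(). This assumes every column of datos is numeric (which read.table() produces for the file shown); the small data.frame below is a hypothetical stand-in, not the real data. Because the country names are real (non-automatic) row names, as.matrix() keeps them as the matrix row names.

```r
# Hypothetical data.frame shaped like the question's `datos`:
# numeric columns, countries as row names, dates as column names.
datos <- data.frame(
  "1/22/20" = c(0, 0),
  "1/23/20" = c(0, 3),
  "1/24/20" = c(2, 5),
  row.names = c("Argentina", "Angola"),
  check.names = FALSE
)

# All columns are numeric, so as.matrix() returns a numeric matrix
# and carries the row and column names over as dimnames.
cases <- as.matrix(datos)

str(cases)
```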

If you are OK with your data being a matrix, this is trivial:

my_matrix <- matrix(
  data = c(0,0,1,0,5,2,1,2,3),
  nrow = 3,
  ncol = 3,
  dimnames = list(
    c("Albania", "Argentina", "Armenia"),
    c("2020-01-01","2020-02-01", "2020-03-01")
  )
)

> my_matrix

          2020-01-01 2020-02-01 2020-03-01
Albania            0          0          1
Argentina          0          5          2
Armenia            1          2          3

> match(TRUE, my_matrix["Argentina",] > 0)

[1] 2
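The same idea extends to every country at once. A minimal sketch, reusing the example matrix above, that applies match() to each row of the logical matrix my_matrix > 0 with apply(); the name first_case is only illustrative. Note that match() returns NA when a row contains no positive value, which is worth checking before using the result as a column index.

```r
my_matrix <- matrix(
  data = c(0, 0, 1, 0, 5, 2, 1, 2, 3),
  nrow = 3,
  ncol = 3,
  dimnames = list(
    c("Albania", "Argentina", "Armenia"),
    c("2020-01-01", "2020-02-01", "2020-03-01")
  )
)

# For each row (MARGIN = 1), the position of the first TRUE,
# i.e. the first column with a positive case count.
first_case <- apply(my_matrix > 0, 1, function(row) match(TRUE, row))
first_case  # Albania 3, Argentina 2, Armenia 1

# A row with no cases at all gives NA rather than an index.
match(TRUE, c(0, 0, 0) > 0)  # NA
```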


If you insist on your data being a data.frame object or one of its cousins, more convoluted methods become necessary. Your data has a simple enough structure that you can just transpose it, but this is not really optimal.

my_data_frame <- as.data.frame(my_matrix)

> match(TRUE, t(my_data_frame)[,"Argentina"] > 0)

[1] 2
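One alternative that avoids transposing the whole data.frame (not part of the answer above, just a sketch) is to flatten only the selected row with unlist(), which turns a one-row data.frame into a named numeric vector. Tying it back to the question's datos[i_arg, first:last], with a hypothetical stand-in for datos and a made-up end date of 1/25/20:

```r
# Hypothetical stand-in for `datos`: countries as row names, dates as columns.
datos <- data.frame(
  "1/22/20" = c(0, 0),
  "1/23/20" = c(0, 3),
  "1/24/20" = c(2, 5),
  "1/25/20" = c(4, 9),
  row.names = c("Argentina", "Angola"),
  check.names = FALSE
)

i_arg <- which(rownames(datos) == "Argentina")
last  <- which(colnames(datos) == "1/25/20")  # hypothetical end date

# datos[i_arg, ] is a one-row data.frame; unlist() flattens it to a named
# numeric vector, so match() works just as it does on a matrix row.
first <- match(TRUE, unlist(datos[i_arg, ]) > 0)

# first is NA when the country has no positive count, so guard the slice.
if (!is.na(first)) {
  print(datos[i_arg, first:last])
}
```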

bcarlsen