Making a function ignore the first row in R

Question

So I have this table about a virus and its infections, deaths, etc. Here is my code and the original table:

library(dplyr)    # %>% and mutate()
library(janitor)  # clean_names()
data <- clean_names(read.csv("Book1.csv"))
TableN <- data %>% mutate(PeopleInfected = Susceptible * 1/8) %>%
  mutate(PeopleDeadFromInfection = PeopleInfected * 1/8) %>%
  mutate(ImmuneFromSmallpox = PeopleInfected - PeopleDeadFromInfection)

> head(TableN)
  age Susceptible PeopleInfected PeopleDeadFromInfection ImmuneFromSmallpox
1   0        1300        162.500                20.31250          142.18750
2   1        1000        125.000                15.62500          109.37500
3   2         855        106.875                13.35938           93.51562
4   3         798         99.750                12.46875           87.28125
5   4         760         95.000                11.87500           83.12500
6   5         732         91.500                11.43750           80.06250

There is a 1/8 chance of getting the virus, and an additional 1/8 chance of dying from it. Those who survive are immune to the virus for life. As a result, I have to subtract the 'ImmuneFromSmallpox' column from 'Susceptible' to account for those who survived and are now immune.

Here is what the table should look like:

> head(TableN)
  age Susceptible PeopleInfected PeopleDeadFromInfection ImmuneFromSmallpox
1   0        1300        162.500                20.31250          142.18750
2   1    857.8125        107.226                13.40335           93.82265
3   2   761.17735        95.1471                11.89339           83.25371

To be more clear, I want the first row to stay the same, but I want the other rows to change. What function do you recommend I use?

Why do values in `PeopleInfected`, `PeopleDeadFromInfection` and `ImmuneFromSmallpox` change in your desired output? — Ronak Shah, Jan 07 '21 at 02:39
I am accounting for the people who already had the virus; they are now immune for life. The calculation goes as follows. The infection rate is 1/8, and the death rate among the infected is 1/8. For people aged 0 (newborns), 1300 * 1/8 = 162.5 get infected and 20.3 die. The 142.2 newborns who survive can no longer get the virus, so we subtract 142.2 from the Susceptible column for age 1. That was originally 1000, so we get 857.8125. For children aged 1, 1/8 get infected and 1/8 of those die. The calculation repeats for each age. Hope that explains it well! — codermcgee, Jan 07 '21 at 02:59
@codermcgee There are quite a few answers now that show you many different ways of achieving the result. Please accept whichever answer you think is most helpful by clicking on the v-sign next to the beginning of each answer. Thanks! — coffeinjunky, Jan 10 '21 at 01:32
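
The carry-forward arithmetic described in the comment above can be sketched as a plain loop. This is only an illustration under stated assumptions: the counts are typed in from the question's table rather than read from Book1.csv, the names susceptible_raw, infection_rate, death_rate and result are made up for the example, and each age group subtracts only the immune survivors of the age group directly before it.

```r
# Original susceptible counts by age, copied from the question's table.
susceptible_raw <- c(1300, 1000, 855, 798, 760, 732)
infection_rate  <- 1 / 8  # chance of catching the virus (assumed constant)
death_rate      <- 1 / 8  # chance of dying once infected (assumed constant)

n <- length(susceptible_raw)
susceptible <- numeric(n)
infected    <- numeric(n)
dead        <- numeric(n)
immune      <- numeric(n)

for (i in seq_len(n)) {
  # The first row has no previous age group, so nothing is subtracted.
  carried_immune <- if (i == 1) 0 else immune[i - 1]
  susceptible[i] <- susceptible_raw[i] - carried_immune
  infected[i]    <- susceptible[i] * infection_rate
  dead[i]        <- infected[i] * death_rate
  immune[i]      <- infected[i] - dead[i]
}

result <- data.frame(
  age = seq_len(n) - 1,
  Susceptible = susceptible,
  PeopleInfected = infected,
  PeopleDeadFromInfection = dead,
  ImmuneFromSmallpox = immune
)
print(result)
```

Because every row depends on the recomputed immune count of the row before it, a vectorised one-liner that reads the original column would not give the same figures from the third row onward; a loop (or an accumulate-style helper) keeps the dependency explicit.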

score 1 · Accepted Answer · answered Jan 07 '21 at 01:21

You can use replace to zero out the first ImmuneFromSmallpox value before subtracting, e.g.,

transform(
  df,
  survivors = survivors - replace(ImmuneFromSmallpox, 1, 0)
)

or multiply by the logical vector seq(nrow(df)) > 1, which is FALSE (treated as 0) for the first row:

transform(
  df,
  survivors = survivors - ImmuneFromSmallpox * (seq(nrow(df)) > 1)
)

which gives

  age survivors PeopleInfected PeopleDeadFromInfection ImmuneFromSmallpox
1   0 1300.0000        162.500                20.31250          142.18750
2   1  890.6250        125.000                15.62500          109.37500
3   2  761.4844        106.875                13.35938           93.51562
4   3  710.7188         99.750                12.46875           87.28125
5   4  676.8750         95.000                11.87500           83.12500
6   5  651.9375         91.500                11.43750           80.06250

Data

> dput(df)
structure(list(age = 0:5, survivors = c(1300L, 1000L, 855L, 798L, 
760L, 732L), PeopleInfected = c(162.5, 125, 106.875, 99.75, 95,
91.5), PeopleDeadFromInfection = c(20.3125, 15.625, 13.35938,
12.46875, 11.875, 11.4375), ImmuneFromSmallpox = c(142.1875,
109.375, 93.51562, 87.28125, 83.125, 80.0625)), class = "data.frame", row.names = c("1",
"2", "3", "4", "5", "6"))

score 0 · Answer 2 · answered Jan 07 '21 at 01:12

One approach would be to use case_when, as in the answer to "dplyr mutate with conditional values".

But to be honest, I'd just be hacky about it: break off the top row, mutate the rest, then rbind them back together. All rows except the first can be selected with data[2:nrow(data),].
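
Both ideas from this answer might look roughly like the sketch below. The data frame df is typed in from the question's table, the names via_case_when, first_row, rest and via_rbind are hypothetical, and the adjustment applied to the non-first rows (subtracting the same row's ImmuneFromSmallpox) is just a stand-in for whatever change is actually wanted.

```r
library(dplyr)

# Small stand-in for the question's table (values copied from the post).
df <- data.frame(
  age = 0:5,
  Susceptible = c(1300, 1000, 855, 798, 760, 732),
  ImmuneFromSmallpox = c(142.1875, 109.375, 93.51562, 87.28125, 83.125, 80.0625)
)

# Option 1: case_when leaves row 1 alone and adjusts every other row.
via_case_when <- df %>%
  mutate(Susceptible = case_when(
    row_number() == 1 ~ Susceptible,
    TRUE              ~ Susceptible - ImmuneFromSmallpox
  ))

# Option 2: split off the first row, mutate the rest, then bind them back.
first_row <- df[1, ]
rest <- df[2:nrow(df), ] %>%
  mutate(Susceptible = Susceptible - ImmuneFromSmallpox)
via_rbind <- rbind(first_row, rest)

print(via_case_when)
print(via_rbind)
```

The two options should produce the same Susceptible column; case_when keeps everything in one pipeline, while the split-and-rbind version is easier to read when the per-row logic is long.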

Answer 3 · answered Jan 07 '21 at 01:53 · coffeinjunky

Try subsetting:

data <- clean_names(read.csv("Book1.csv"))

data[-1,] <- data[-1,] %>% mutate(...)

Here, subsetting the data.frame on both sides with data[-1,] means that you operate on all rows except the first. Afterwards, you can print data to see that every row but the first has changed.

More generally, you can create vectors of row numbers to pick just the rows you want to work on. Say, idx = c(1,3,5). Then data[idx,] would pick rows 1, 3 and 5 from data. The minus in data[-1,] is R shorthand for excluding the first row. Similarly, if you want to pick specific columns, you can do the same using data[, col_idx]. If your data.frame has explicit rownames, you can also index rows by name, just as you would columns.
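
As a rough base-R illustration of the indexing described above (the column names susceptible and immune, the row labels, and the subtraction itself are assumptions made for the example, not taken from Book1.csv):

```r
# Small stand-in data frame with values copied from the question.
data <- data.frame(
  age = 0:5,
  susceptible = c(1300, 1000, 855, 798, 760, 732),
  immune = c(142.1875, 109.375, 93.51562, 87.28125, 83.125, 80.0625)
)

# Negative index: update every row except the first, in place.
data[-1, "susceptible"] <- data[-1, "susceptible"] - data[-1, "immune"]

# Positive index vector: pick rows 1, 3 and 5.
idx <- c(1, 3, 5)
picked_rows <- data[idx, ]

# Column selection works the same way, by position or by name.
col_idx <- c("age", "susceptible")
picked_cols <- data[, col_idx]

# With explicit row names, rows can be addressed by name as well.
rownames(data) <- paste0("age_", data$age)
print(data["age_2", ])
```

Assigning into data[-1, ...] modifies the original object directly, so there is no need to split the first row off and bind it back afterwards.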

score 0 · Answer 4 · answered Jan 07 '21 at 03:13

Try the following:

library(dplyr)

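# lag() shifts ImmuneFromSmallpox down one row; default = 0 leaves row 1 unchanged.
# Note: lag() reads the ImmuneFromSmallpox values already stored in TableN.
TableN <- TableN %>%
  mutate(Susceptible = Susceptible - lag(ImmuneFromSmallpox, default = 0),
         PeopleInfected = Susceptible * 1/8,
         PeopleDeadFromInfection = PeopleInfected * 1/8,
         ImmuneFromSmallpox = PeopleInfected - PeopleDeadFromInfection)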

TableN

Ronak Shah
