I have a data frame with two columns, age and sex. I'm running a statistical analysis to determine whether the age distribution differs between the two sex groups. I know that if I don't pass data= the call gives an error (I believe it has something to do with the dplyr library). I was wondering what the single . in the data argument does. Does it refer to the data frame on the left-hand side of the %>%?

age_sex.htest <- d %>%
   t.test(formula=age~sex, data=.)
markus
  • 3
    The `lhs` is passed to `data` argument of `t.test`. You can read about the pipe operator here: https://magrittr.tidyverse.org/reference/pipe.html#arguments – markus Jan 09 '22 at 22:45
  • 2
    Relevant: [R combinations with dot ("."), "~", and pipe (%>%) operator](https://stackoverflow.com/questions/54815607/r-combinations-with-dot-and-pipe-operator) – markus Jan 09 '22 at 22:48

1 Answer

As @markus has pointed out, d is passed to the data argument of t.test. Here is the output for the sleep dataset (loaded with data(sleep)) using the . placeholder:

library(dplyr)
data(sleep)

sleep %>% t.test(formula=extra ~ group, data = .)

# Output
    Welch Two Sample t-test

data:  extra by group
t = -1.8608, df = 17.776, p-value = 0.07939
alternative hypothesis: true difference in means between group 1 and group 2 is not equal to 0
95 percent confidence interval:
 -3.3654832  0.2054832
sample estimates:
mean in group 1 mean in group 2 
           0.75            2.33 

If you pass sleep directly to the data argument of t.test, you get the same result, because t.test is run on exactly the same data.

t.test(formula=extra ~ group, data = sleep)

# Output

    Welch Two Sample t-test

data:  extra by group
t = -1.8608, df = 17.776, p-value = 0.07939
alternative hypothesis: true difference in means between group 1 and group 2 is not equal to 0
95 percent confidence interval:
 -3.3654832  0.2054832
sample estimates:
mean in group 1 mean in group 2 
           0.75            2.33 
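
The equivalence shown by the two outputs above can also be checked programmatically. This is a minimal sketch, assuming the magrittr package is installed (dplyr re-exports the same %>% operator); the names piped and direct are only illustrative.

```r
library(magrittr)  # provides %>%; library(dplyr) would work as well
data(sleep)

# With a dot among the arguments, the left-hand side is placed where the dot is
piped  <- sleep %>% t.test(formula = extra ~ group, data = .)

# The same call written without the pipe
direct <- t.test(formula = extra ~ group, data = sleep)

# Compare the pieces of the two htest objects; each should be TRUE
all.equal(piped$statistic, direct$statistic)
all.equal(piped$p.value,   direct$p.value)
all.equal(piped$conf.int,  direct$conf.int)
```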

In this case, the . offers little benefit, though some people prefer it stylistically (I generally do).
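
What the . changes is only where the left-hand side lands in the call. By default %>% inserts the left-hand side as the first argument; when a . appears as one of the arguments, the left-hand side goes there instead. A minimal sketch of that rule, using a hypothetical helper arg_positions (not part of any package) that simply reports which value arrived in which argument:

```r
library(magrittr)

# Hypothetical helper for illustration only: returns its two arguments by name
arg_positions <- function(first = NA, second = NA) {
  c(first = first, second = second)
}

no_dot    <- 10 %>% arg_positions(20)          # evaluated as arg_positions(10, 20)
dot_last  <- 10 %>% arg_positions(20, .)       # evaluated as arg_positions(20, 10)
dot_named <- 10 %>% arg_positions(second = .)  # evaluated as arg_positions(second = 10)

no_dot
dot_last
dot_named
```

This is why the data argument has to be named in the t.test call: without the dot, the data frame would be supplied as the first argument, where the formula interface does not expect it.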

However, it is extremely useful when you want to run the analysis on a slightly altered version of the data frame. With the sleep dataset, for example, if you wanted to drop the rows with ID == 10 from both groups, you could remove them with filter and then run t.test.

sleep %>%
  filter(ID != 10) %>%
  t.test(formula = extra ~ group, data = .)

Here we pass an altered version of the sleep dataset, without the rows where ID is 10, so the output changes:

    Welch Two Sample t-test

data:  extra by group
t = -1.7259, df = 15.754, p-value = 0.1039
alternative hypothesis: true difference in means between group 1 and group 2 is not equal to 0
95 percent confidence interval:
 -3.5677509  0.3677509
sample estimates:
mean in group 1 mean in group 2 
      0.6111111       2.2111111 
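
For comparison, the filtered pipeline can be written without the . by storing the intermediate data frame first. This is a sketch assuming dplyr is installed; the name sleep_no10 is made up for illustration.

```r
library(dplyr)
data(sleep)

# Piped version: the filtered data frame is handed to t.test via the dot
piped <- sleep %>%
  filter(ID != 10) %>%
  t.test(formula = extra ~ group, data = .)

# Unpiped version: keep the filtered data frame in a variable (illustrative name)
sleep_no10 <- filter(sleep, ID != 10)
stored <- t.test(formula = extra ~ group, data = sleep_no10)

nrow(sleep_no10)                           # rows left after dropping ID 10
all.equal(piped$p.value, stored$p.value)   # both routes should agree
```

The pipe with . mainly saves naming the intermediate object when it is only needed once.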
AndrewGB