I am learning R on my own. Here is a piece of code I found online that renames a variable (column):

df <- df %>% rename(employees = How.many.employees.does.your.company.or.organization.have.)
colnames(df)[2]

I don't understand this "%>%" thing. Can I avoid it? What exactly does it do? I take "df <- df ..." to mean that I am performing an operation on df (here, renaming a variable) and storing the result back in df.

Thanks!

Awake Joe
  • You can also assign column names from a vector, or just a single element, i.e. here `colnames(df)[2] <- "employees"` should do (if it is the second element, you can `match()` otherwise). – Dirk Eddelbuettel Mar 16 '22 at 18:53
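The base-R approach from the comment above can be sketched as follows. This is only an illustration: the toy data frame and the name `respondent` are made up, and no packages are needed.

```r
# Toy data frame; the values and the "id" column are invented for illustration.
df <- data.frame(
  id = 1:3,
  How.many.employees.does.your.company.or.organization.have. = c("1-10", "11-50", "51-200")
)

# Rename by position (only safe if you know it is the second column).
colnames(df)[2] <- "employees"

# Rename by name: match() returns the position of the old name.
old <- "id"
colnames(df)[match(old, colnames(df))] <- "respondent"

colnames(df)
```

Renaming by name is usually the safer choice, since it keeps working if the column order changes.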

3 Answers

2

%>% is the pipe operator. It comes from the magrittr package. Since version 4.1.0, R also provides a native pipe, |>, with similar functionality. As the link I've provided says, "pipes are a powerful tool for clearly expressing a sequence of multiple operations," but you can certainly work without them. The link provides a lot more detail about using pipes and their alternatives.
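As a minimal sketch of working with and without a pipe, here is the same computation written both ways. It assumes R 4.1.0 or later for the native `|>` and uses only base functions; the vector `x` is made up for illustration.

```r
# Requires R >= 4.1.0 for the native pipe |>.
x <- c(1, 4, 9, 16)

# Nested call: read from the inside out.
nested <- round(mean(sqrt(x)), 1)

# Same computation with the native pipe: read from left to right.
piped <- x |> sqrt() |> mean() |> round(1)

# Both forms give the same result.
identical(nested, piped)
```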

rdelrossi
  • Can I ask whether using pipe operators slows down code? – Sweepy Dodo Mar 16 '22 at 18:39
  • 3
    Pipes were considered slower many years ago, but I don't know if that's still a consideration with modern implementations. They're widely used, however, because they make code easier to understand and maintain. – rdelrossi Mar 16 '22 at 18:45
2

df <- df %>% rename(employees = How.many.employees.does.your.company.or.organization.have.)
colnames(df)[2]

is equivalent to

df <- rename(df, employees = How.many.employees.does.your.company.or.organization.have.)
colnames(df)[2]

The pipe takes the object on the left of the pipe and inserts it into the expression on the right, by default as the first argument. This helps avoid nested expressions which can be harder to read. So you could write df %>% do_this %>% then_that %>% and_finally_this instead of and_finally_this(then_that(do_this(df))).
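To check the "first argument" behavior for yourself, here is a small sketch. It assumes the magrittr package is installed, and `rename_col()` is a hypothetical base-R stand-in for dplyr's `rename()`, written only so the example does not depend on dplyr. The shortened column name is also made up.

```r
library(magrittr)  # provides %>%; dplyr re-exports the same operator

# Hypothetical helper (not part of any package): renames one column.
rename_col <- function(data, new, old) {
  names(data)[names(data) == old] <- new
  data
}

df <- data.frame(id = 1:3, How.many.employees = c(5, 20, 100))

# The pipe passes df as the first argument of rename_col().
piped  <- df %>% rename_col("employees", "How.many.employees")
direct <- rename_col(df, "employees", "How.many.employees")

identical(piped, direct)
colnames(piped)[2]
```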

The performance cost of the %>% step will be negligible in any scenario I can imagine.

Jon Spring
1

is equivalent to

df <- rename(df, employees = How.many.employees.does.your.company.or.organization.have.)
colnames(df)[2]

Thank you for your answer. <- feels more "intuitive" to me at this point, but I will try to learn both. I'm at the very beginning... I can see the usefulness of %>% when I have a long list of "things to do": it is better than writing many separate lines or using a lot of (((()))).

patL
Awake Joe
  • 1
    You may also write `df %<>% rename(...)`, which is equivalent to `df <- df %>% rename(...)` (the `%<>%` operator requires loading magrittr itself). Note that `%>%` is not a substitute for the assignment `<-`; it is a substitute for function-call syntax. That is, `x %>% f()` is equivalent to `f(x)`, and `x %>% f(y)` is equivalent to `f(x, y)`. Note that `x` is used as the first argument of `f`, followed by whatever additional arguments are specified after `%>%`. To pass `x` as the second argument, you would write `x %>% f(y, .)`, equivalent to `f(y, x)`. –  Mar 16 '22 at 19:04
  • 2
    Once you understand this notation better, you will find that it's **much** more pleasant than writing function calls with loads of nested parentheses and operations specified in the wrong order. For example, when you want to apply operation `f` to a dataframe `df`, then operation `g`, then `h`, it's far more readable to write `df %>% f() %>% g() %>% h()` than `h(g(f(df)))`. The call syntax gets even worse when there are (possibly many) arguments to the operations. –  Mar 16 '22 at 19:11
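The rules in the comments above can be tried out with a short sketch. It assumes the magrittr package is installed; the function `f` and the values are hypothetical, chosen so that argument order matters.

```r
library(magrittr)  # %>% and %<>% both come from magrittr

# Hypothetical function where argument order matters.
f <- function(a, b) a - b

x <- 10
y <- 2

first  <- x %>% f(y)     # same as f(x, y): x is the first argument
second <- x %>% f(y, .)  # same as f(y, x): the dot marks where x goes

# A chain reads left to right; the nested call reads inside out.
chained <- c(1, 4, 9) %>% sqrt() %>% sum()
nested  <- sum(sqrt(c(1, 4, 9)))

# %<>% pipes v into sort() and assigns the result back to v.
v <- c(3, 1, 2)
v %<>% sort()
```

Here `first` and `second` differ only in where `x` lands in the call, which is the whole effect of the `.` placeholder.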