Beginner using pipes

Question

I am a beginner and I'm trying to find the most efficient way to change the name of the first column for many CSV files that I will be creating. Once I have created the CSV files, I am loading them into R as follows:

data <- read.csv('filename.csv')

I have used the names() function to do the name change of a single file:

names(data)[1] <- 'Y'

However, I would like to find the most efficient way of combining/piping this name change to read.csv so the same name change is applied to every file when they are opened. I tried to write a 'simple' function to do this:

addName <- function(data) {
  names(data)[1] <- 'Y'
  data
}

However, I do not yet fully understand the syntax for writing a function and I can't get this to work.

Try to use `colnames` function, see https://stackoverflow.com/questions/7531868/how-to-rename-a-single-column-in-a-data-frame — Peace Wang, Jun 09 '21 at 14:30
Or `rename` function, see https://stackoverflow.com/questions/35023375/r-renaming-passed-columns-in-functions — Peace Wang, Jun 09 '21 at 14:39
Did you design your `addName` function expecting R to **pass by reference**? That is, are you expecting your function to _mutate_ an existing object `x`, supplied in `addName(x)` as the argument for the `data` parameter. If so, this will not work: **R passes by value** rather than by reference. _However_, the line `data <- addName(data)` should work, as should `data <- data %>% addName()` with the `magrittr` package. You can `sapply` this `addName` function to a list of `data.frame`s like your `data` object, and then store the list that `sapply` will return. — Greg, Jun 09 '21 at 15:33

Greg · Accepted Answer · 2021-06-11T14:22:24.517

Note

If you were expecting your original addName function to "mutate" an existing object like so

x <- data.frame(Column_1 = c(1, 2, 3), Column_2 = c("a", "b", "c"))

# Try (unsuccessfully) to change title of "Column_1" to "Y" in x.
addName(x)

# Print x.
x

please be aware that R passes by value rather than by reference, so x itself would remain unchanged:

  Column_1 Column_2
1        1        a
2        2        b
3        3        c

Any "mutation" would be achieved by overwriting x with the return value of the function

x <- addName(x)

# Print x.
x

in which case x itself would obviously be changed:

  Y Column_2
1 1        a
2 2        b
3 3        c

Answer

Anyway, here's a solution that compactly incorporates pipes (%>% from the magrittr package) and a custom function. Please note that without the linebreaks and comments, which I have added for clarity, this could be condensed to only a few lines of code.

# The dplyr package helps with easy renaming, and it includes the magrittr pipe.
library(dplyr)

# ...

filenames <- c("filename1.csv", "filename2.csv", "filename3.csv")

# A function to take a CSV filename and give back a renamed dataset taken from that file.
addName <- function(filename) {
  # Read in the named file as a data.frame.
  read.csv(file = filename) %>%
    # Take the resulting data.frame, and rename its first column as "Y";
    # quotes are optional, unless the name contains spaces: "My Column"
    # or `My Column` are needed then.
    dplyr::rename(Y = 1)
}

# Get a list of all the renamed datasets, as taken by addName() from each of the filenames.
all_files <- sapply(filenames, FUN = addName,
                    # Keep the list structure, in which each element is a
                    # data.frame.
                    simplify = FALSE,
                    # Name each list element by its filename, to help keep track.
                    USE.NAMES = TRUE)

In fact, you could easily rename any columns you desire, all in one fell swoop:

read.csv(file = filename) %>%
  dplyr::rename(Y = 1, 'X' = 2, "Z" = 3, "Column 4" = 4, `Column 5` = 5)
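
To try the whole workflow without real data, here is a minimal sketch that writes two small demo CSVs to a temporary directory, reads them back with one `sapply` call, and then pulls individual datasets out of the resulting list. The file names, the column contents, and the helper name `read_and_rename` are all made up for illustration, and base R renaming is swapped in for `dplyr::rename()` so that the sketch runs without extra packages.

```r
# Hypothetical demo files, written to a temporary directory for illustration only.
tmp_dir <- tempfile("csv_demo_")
dir.create(tmp_dir)
demo_paths <- file.path(tmp_dir, c("demo1.csv", "demo2.csv"))
write.csv(data.frame(a = 1:3, b = c("x", "y", "z")), demo_paths[1], row.names = FALSE)
write.csv(data.frame(k = 4:5, m = c("u", "v")), demo_paths[2], row.names = FALSE)

# Same idea as addName() above: take one filename, give back the renamed dataset.
# (Hypothetical helper; uses names()<- instead of dplyr::rename().)
read_and_rename <- function(filename) {
  data <- read.csv(file = filename)
  names(data)[1] <- "Y"
  data
}

# One call reads and renames every file; the result is a list of data.frames
# named by file path.
demo_files <- sapply(demo_paths, FUN = read_and_rename,
                     simplify = FALSE, USE.NAMES = TRUE)

# Retrieve a dataset by position or by its file path.
first_names <- names(demo_files[[1]])
second_names <- names(demo_files[[demo_paths[2]]])
```

The reading and renaming happens once, in the single `sapply` call; afterwards each dataset is simply looked up in the list rather than being reopened.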

Greg, thanks so much for your help! What you first describe about the function not changing X is exactly what happened. Thank you also for your detailed & clear commenting, for a beginner like me, it's sooooo very helpful. Would you mind clarifying if I would have to run the addName function & "sapply()" every time I open one of the 17 files I need to do the exact thing to, & what and how exactly "sapply()" is doing? In, general, I am having problems understanding how to link & organise different steps needed to produce a desired result. — Dodo, Jun 10 '21 at 16:11
Hi @Dodo! What `sapply` does is this. Suppose you have a function that accepts a *single* value (`my_fun <- function(x) {return(2*x)}`) and returns (say) the double of that value. Suppose you also have a vector (`my_vals <- c(1, 2, 3)`) or list (`my_vals <- list(1, 2, 3)`) of *multiple* values. You can easily do `my_fun(1)` to get `2`, `my_fun(2)` to get `4`, and `my_fun(3)` to get `6`. But say you want *all* those results, in one fell swoop! Then `sapply(my_vals, FUN = my_fun)` applies `my_fun` to each of those values in `my_vals`, to get you a vector (or list) of the results: `2 4 6`. — Greg, Jun 10 '21 at 16:24
So in the context of my answer, all you (@Dodo) need is a vector (or list) of all the `filenames` you're dealing with. You have an `addName` function, designed to accept any *one* filename and to give back the renamed dataset (a `data.frame`) taken from that file. Now all you need to do is supply `filenames` and `addName` to a *single* `sapply` statement, which cycles through *all* the `filenames`, calls `addName` on each, and puts the results (the renamed datasets) into a common list (in the same order as the filenames). We save this list (a list of `data.frame`s) in the variable `all_files`. — Greg, Jun 10 '21 at 16:42
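
The doubling example from the comments above can be run as a short sketch. The names `my_fun` and `my_vals` come from the comment; `doubled` and `doubled_list` are hypothetical names added here to hold the results.

```r
# A function that accepts a single value and returns its double.
my_fun <- function(x) {
  2 * x
}

my_vals <- c(1, 2, 3)

# By default, sapply() simplifies the results into a vector where it can.
doubled <- sapply(my_vals, FUN = my_fun)

# With simplify = FALSE, the results stay in a list, one element per input.
# This is the form used in the answer, where each result is a data.frame.
doubled_list <- sapply(my_vals, FUN = my_fun, simplify = FALSE)
```

Here `doubled` is the numeric vector `2 4 6`, while `doubled_list` holds the same three values as separate list elements.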

score 0 · Answer 2 · answered Jun 09 '21 at 15:18

This reads each file named in a vector of filenames, renames the first column of each to "Y", and stores all of the resulting data frames in a list.

filenames <- c("filename1.csv","filename2.csv")
addName <- function(filename) {
  data <- read.csv(filename)
  names(data)[1] <- 'Y'
  data
}
files <- list()
for (i in seq_along(filenames)) {
   files[[i]] <- addName(filenames[i])
}

Couldn't you replace everything after the `addName` definition, with simply `files <- sapply(X = filenames, FUN = addName)`? – Greg Jun 09 '21 at 15:39
Sure you could, it's probably faster too. But I wrote it this way since OP mentioned they were a beginner, and as a beginner I found code like this easier to understand. – Jun 09 '21 at 15:41
Thank you for your answer Baroque but I'm finding it quite difficult to follow and understand the code with my limited beginner skills. I understand the function part but the for loop, which I have no previous experience with, goes way over my head. I know the loop is written to cycle through the files but I do not understand the syntax at all. Is there a good beginner learning source that you could recommend for loops? Also, is the loop part of the function or separate? – Dodo Jun 10 '21 at 15:50
Hi @Dodo! The function definition ends with the closing brace `addName <- function(filename) { ... }`; so `addName()` will accept a `filename` argument, and return the renamed dataset (a `data.frame`) loaded from that file. Everything afterward is *outside* the function. Next `files <- list()` creates a list, which is initially empty but will be filled by the loop. The `for` loop cycles through each filename, from the `1`st through the `2`nd: at every step, it adds the current (`i`th) dataset (obtained by `addName()` from the current `i`th filename) as a new (`i`th) element to the list. – Greg Jun 10 '21 at 16:08
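
For a feel of how such a loop fills a list step by step, here is a minimal sketch that avoids files entirely. The vector `labels` and the use of `toupper()` are stand-ins chosen for illustration (they play the roles of `filenames` and `addName()`), and `seq_along(labels)` is assumed as the way to produce the positions 1, 2, and so on up to the length of the vector.

```r
# Stand-in for the vector of filenames.
labels <- c("first", "second")

# Start with an empty list; the loop fills it one element at a time.
results <- list()
for (i in seq_along(labels)) {
  # At step i, process the i-th input and store it as the i-th list element.
  results[[i]] <- toupper(labels[i])
}

# The same work expressed as a single call, for comparison.
results_apply <- lapply(labels, toupper)
```

Both `results` and `results_apply` end up as a two-element list, which is why the loop and the apply-style call are interchangeable here.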
