I have a tibble called 'Volume' in which I store some data (30 rows, 10 columns; the first 2 columns are character). Now I want to calculate the volume in every column relative to column 3 of my tibble. My current solution looks like this:

rel.Volume_unmod = tibble(
            "Volume_OD" = Volume[[3]] / Volume[[3]],
            "Volume_Imp" = Volume[[4]] / Volume[[3]],
            "Volume_OD_1" = Volume[[5]] / Volume[[3]],
            "Volume_WS_1" = Volume[[6]] / Volume[[3]],
            "Volume_OD_2"  = Volume[[7]] / Volume[[3]],
            "Volume_WS_2" = Volume[[8]] / Volume[[3]], 
            "Volume_OD_3" = Volume[[9]] / Volume[[3]],
            "Volume_WS_3" = Volume[[10]] / Volume[[3]])
rel.Volume_unmod 

I would like to keep the tibble structure and the labels. I am sure there is a better solution for this, but I am relatively new to R, so it's not obvious to me. I tried something like this, but I can't actually run it:

rel.Volume = NULL
for(i in Volume[,3:10]){

rel.Volume[i] = tibble(Volume = Volume[[i]] / Volume[[3]])
}
Carlo
  • Welcome to stack overflow. Please make your question reproducible: paste a copy of your data into the question using `dput(Volume)` see [MRE] – Peter May 03 '21 at 12:37

2 Answers

Mockup Data

Since you did not provide any data, I've followed your description to create some mockup data. Here:

set.seed(1)
Volume <- data.frame(ID = sample(letters, 30, TRUE),
                     GR = sample(LETTERS, 30, TRUE))
Volume[3:10] <- rnorm(30*8)

Solution with Dplyr

library(dplyr)

# rename columns [brute force]
cols <- c("Volume_OD","Volume_Imp","Volume_OD_1","Volume_WS_1","Volume_OD_2","Volume_WS_2","Volume_OD_3","Volume_WS_3")
colnames(Volume)[3:10] <- cols

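# divide by Volume_OD
# (Volume_OD itself is divided last, so the other columns still see its original values)
rel.Volume_unmod <- Volume %>% 
  mutate(across(all_of(cols[-1]), ~ . / Volume_OD),
         Volume_OD = Volume_OD / Volume_OD)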

# result
rel.Volume_unmod

Explanation

  • I don't know the names of your columns. They probably correspond to the names of the columns you intended to create in rel.Volume_unmod. Anyhow, to avoid any problem I renamed the columns (rather brutally). You can do it with dplyr::rename if you want to.
  • There are many ways to select the columns you want to mutate. mutate is a verb from dplyr that lets you create new columns or apply operations or functions to existing columns.
  • across is an adverb from dplyr. Let's simplify by saying that it's a function that lets you apply a function over multiple columns. In this case I want to perform a division by Volume_OD.
  • ~ is a tidyverse way to create anonymous functions. ~ . / Volume_OD is equivalent to function(x) x / Volume_OD
  • all_of is necessary because in this specific case I'm providing across with a character vector. Without it, the code will work anyway, but you will receive a warning because the selection is ambiguous and may work incorrectly in some cases.

More info

Check out this book to learn more about data manipulation with tidyverse (which dplyr is part of).


Solution with Base-R

rel.Volume_unmod <- Volume

# rename columns
cols <- c("Volume_OD","Volume_Imp","Volume_OD_1","Volume_WS_1","Volume_OD_2","Volume_WS_2","Volume_OD_3","Volume_WS_3")
colnames(rel.Volume_unmod)[3:10] <- cols

# divide by column 3
# ([[3]] extracts the column as a plain vector; [3] would return a one-column data frame)
rel.Volume_unmod[3:10] <- lapply(rel.Volume_unmod[3:10], `/`, rel.Volume_unmod[[3]])
rel.Volume_unmod

Explanation

  • lapply is a base R function that allows you to apply a function to every item of a list or a "listable" object.
  • in this case rel.Volume_unmod is a listable object: a dataframe is just a list of vectors with the same length. Therefore, lapply takes one column [= one item] at a time and applies a function to it.
  • the function is /. You usually see / used like this: A / B, but actually / is a Primitive function. You could write the same thing in this way:
 `/`(A, B) # same as A / B
  • lapply accepts additional arguments, which are passed directly to the function being applied over the list (in this case /). Therefore, we pass column 3 of rel.Volume_unmod as the additional argument, and it becomes the divisor.
  • lapply always returns a list. But since we assign the result of lapply to a subset of the columns of a dataframe, we only replace those columns and end up with a dataframe rather than a list. More technically: when you write rel.Volume_unmod[3:10] <- lapply(...), you are not simply assigning a list to rel.Volume_unmod[3:10]; you are calling the assignment function [<-, which edits the items of a list/vector/dataframe without modifying the attributes of the object. As said before, a dataframe is just a list with specific attributes. So [<- modifies the columns but leaves the attributes (here, the class data.frame) untouched. That's why the magic works.
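The points in this explanation can be checked on a tiny data frame. This is only a sketch: the object `df` and its columns `id`, `a` and `b` are made up for illustration and are not part of the question's data.

```r
# Toy data frame; `df`, `id`, `a` and `b` are illustrative names only.
df <- data.frame(id = c("x", "y", "z"), a = c(2, 4, 8), b = c(1, 2, 4))

# `/` is an ordinary function, so both spellings give the same result.
same <- identical(`/`(df$a, df$b), df$a / df$b)
same

# lapply() hands the extra argument (the divisor) to `/` for every column.
ratios <- lapply(df[2:3], `/`, df[[2]])
class(ratios)  # a plain list, not a data frame

# Assigning through `[<-` replaces the columns but keeps the data.frame class.
df[2:3] <- ratios
class(df)
df
```

Note that the divisor is extracted with `[[ ]]`, so each column is divided by a plain vector.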
Edo

Without a minimal working example it's hard to guess what the variable Volume actually refers to. Apart from that, there seems to be a problem with your for loop:

for(i in Volume[,3:10]){

Assuming Volume refers to a data.frame or tibble, this causes the actual column-vectors with indices between 3 and 10 to be assigned to i successively. You can verify this by putting print(i) inside the loop. But inside the loop it seems like you actually want to use i as a variable containing just the index of the current column as a number (not the column itself):

rel.Volume[i] = tibble(Volume = Volume[[i]] / Volume[[3]])

Also, two brackets are usually used with lists, not data.frames or tibbles. (You can, however, do so, because data.frames are special cases of lists.)
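The difference between looping over columns and looping over column indices can be seen in a small sketch. The data frame `df` and the helper variables `seen_cols` and `seen_idx` are hypothetical and exist only for this illustration.

```r
# Hypothetical toy data; `df` is not from the question.
df <- data.frame(id = c("x", "y"), a = c(1, 2), b = c(3, 4))

# Looping over the data frame itself yields the column vectors.
seen_cols <- list()
for (col in df[, 2:3]) {
  seen_cols[[length(seen_cols) + 1]] <- col
}
seen_cols

# Looping over positions yields plain integers that can be used as indices.
seen_idx <- integer(0)
for (i in 2:3) {
  seen_idx <- c(seen_idx, i)
}
seen_idx

# An integer index works with [[ ]] to pull out one column as a vector.
df[[seen_idx[1]]]
```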

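Last but not least, initialising rel.Volume with NULL does not tell R what rel.Volume should be. Assigning into it by index will not raise an error by itself, but it gives you a plain list rather than a data.frame or tibble, so it is better to start from an object that already has the right shape.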

Try this, if you like (thanks @Edo for example data):

set.seed(1)

Volume <- data.frame(ID = sample(letters, 30, TRUE),
                     GR = sample(LETTERS, 30, TRUE),
                     Vol1 = rnorm(30),
                     Vol2 = rnorm(30),
                     Vol3 = rnorm(30))

rel.Volume <- Volume[1:2] # Assuming you want to keep the IDs.
# Your data.frame will need to have the correct number of rows here already.

for (i in 3:ncol(Volume)){ # ncol gives the total number of columns in data.frame
  rel.Volume[i]  = Volume[i]/Volume[3]
}

A more R-like approach would be to avoid using a for-loop altogether, since R's strength is implicit vectorization. These expressions will produce the same result without a loop:

# OK, this one messes up variable names...
rel.V.2 <- data.frame(sapply(X = Volume[3:5], FUN = function(x) x/Volume[3]))

rel.V.3 <- data.frame(Map(`/`, Volume[3:5], Volume[3]))
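As one more base-R sketch along the same lines (an addition for illustration, not part of the code above): a data frame can be divided by a vector directly. The small `Volume` below uses made-up values, and the name `rel.V.4` is hypothetical.

```r
# Same column layout as the mock-up above, but tiny; the values are made up.
Volume <- data.frame(ID = c("a", "b", "c"),
                     GR = c("A", "B", "C"),
                     Vol1 = c(2, 4, 8),
                     Vol2 = c(1, 2, 4),
                     Vol3 = c(4, 4, 4))

rel.V.4 <- Volume
# Dividing a data frame by a vector of length nrow() divides every column
# element-wise by that vector and keeps the column names.
rel.V.4[3:5] <- Volume[3:5] / Volume[[3]]
rel.V.4
```

Because the result is assigned back into a copy of `Volume`, the ID columns and all column names are preserved.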

Since you said you were new to R, frankly I would recommend avoiding the Tidyverse packages while you are still learning the basics. From my experience, in the long run you're better off learning base R first and adding the "sugar" when you're more familiar with the core language. You can still learn to use Tidyverse functions later (but then, why would anybody? ;-) ).

Codebird
  • Asking "why would anybody learn Tidyverse functions" is quite narrow-sighted. The `tidyverse` packages are some of the most downloaded packages and that's because people see value in them. Then, if you don't like them, fair enough: you don't have to use them. You can code everything by yourself with base-R. However, the `tidyverse` functions are tested, they have a great documentation, they provide understandable errors and warnings. It depends on your scope. In most of R courses, `tidyverse` is a must. [continue] – Edo May 03 '21 at 14:35
  • If you don't like `tidyverse`, I would suggest you to look into `data.table`. It's based on a different concept, but it's (for many) what nowadays makes R extremely competitive. `data.table` is built around the concepts of consistency, speed, concision. And it's more friendly to other packages than `tidyverse` packages (since some tidyverse functions modify some base-R functions). – Edo May 03 '21 at 14:39
  • check [this great question](https://stackoverflow.com/questions/21435339/data-table-vs-dplyr-can-one-do-something-well-the-other-cant-or-does-poorly) out for more info on the topic – Edo May 03 '21 at 14:43
  • Sorry, last phrase was meant rather jokingly as a reference to the "flame war" that's going on regarding tidyverse. I thought it was obvious. I edited my post to include a blinking smiley to make it more clear. In any case, I think you didn't really understand what I tried to convey: I only recommended to avoid tidyverse while you're still learning the basics. From my experience in teaching R, getting to understand the core language is hard enough to do for most beginners. In the long run you're usually better off learning the actual basics first and adding syntactic sugar as you progress. – Codebird May 04 '21 at 16:57
  • oops lol, sorry about that! :D Didn't mean to sound bad. I just wanted to help @Carlo actually. I think that usually beginners struggle with keeping their code clean. Tidyverse can help you with that. On the other hand, with baseR and data.table things can get messy pretty soon. – Edo May 04 '21 at 17:18
  • Yes, I agree with you. I would just add, that a big portion of what clean code looks like to you actually depends on what you're used to read. Still, clean code might be easier to read, but harder to understand for novice user. Of course there's no point in trying to do everything in baseR without using any package. :) – Codebird May 04 '21 at 19:56