
I have read how to divide the values in one column by those in another column in R, but I want to know how to divide the values in multiple columns by the values in a single column. My first column is also non-numeric. How do I write a script in R that does all this and skips the non-numeric column?

I want to divide HDL and HDW by SVL in the .csv file below:

species SVL      HDL    HDW
PM     26.68    9.27    9.83
PM     23.46    8.41    8.59
PM     24.15    8.36    8.1
PM     23.09    8.91    8.79
Tioman 31.8    11.65    11.18
Tioman 29      10.88    10.66
user2957945
L. Grismer
  • In "species SVL HDL HDW PM 26.68 9.27 9.83 PM 23.46 8.41 8.59 PM 24.15 8.36 8.1 PM 23.09 8.91 8.79 Tioman 31.8 11.65 11.18 Tioman 29 10.88 10.66", there are no even line breaks – user31264 Sep 11 '16 at 21:26
  • `df1[c("HDW", "HDL")] / df1$SVL` – user2957945 Sep 11 '16 at 21:33
  • @user2957945 though yours is fine, is it more common to reference them by column-index instead of list-index? `dat[,c("HDL","HDW")] / dat[,"SVL"]` – r2evans Sep 11 '16 at 21:36
  • @r2evans: I'm not sure it makes a difference; it's less typing without the comma. – user2957945 Sep 11 '16 at 21:41
  • I think it's mostly stylistic, but the column-based (`df1[,"SVL"]`) is required to get a `numeric` vector on the denom, required to divide all numer columns by a single vector (as you demonstrate in your suggestion ... beating me by seconds :-). – r2evans Sep 11 '16 at 21:46
  • Both of these worked very well. But when I write the results to a .csv file, how do I get column 1 (the non-numeric one) to print alongside the ratios? That way I can keep track of the data. – L. Grismer Sep 13 '16 at 12:37
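
The last comment asks how to keep the species column next to the ratios when writing them out. A minimal base R sketch, assuming the sample data from the question is typed in directly (in practice it would come from `read.csv()`); the `_SVL` suffix and the output file name are placeholders, not anything from the thread:

```r
# Sample data from the question, entered by hand for illustration.
dat <- data.frame(
  species = c("PM", "PM", "PM", "PM", "Tioman", "Tioman"),
  SVL = c(26.68, 23.46, 24.15, 23.09, 31.80, 29.00),
  HDL = c(9.27, 8.41, 8.36, 8.91, 11.65, 10.88),
  HDW = c(9.83, 8.59, 8.10, 8.79, 11.18, 10.66)
)

# Divide both numeric columns by SVL in one step.
ratios <- dat[c("HDL", "HDW")] / dat$SVL
names(ratios) <- paste0(names(ratios), "_SVL")  # hypothetical naming scheme

# Bind the species labels back on so each row stays identifiable.
out <- cbind(dat["species"], ratios)

# Placeholder output path; swap in a real file name as needed.
out_file <- file.path(tempdir(), "ratios.csv")
write.csv(out, out_file, row.names = FALSE)
```

Because the division only touches the columns selected by name, the non-numeric column never enters the arithmetic and can simply be reattached with `cbind()`.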

2 Answers


I like the dplyr package for this kind of task. Once you have read your data in from the csv, it is easy to define new columns as functions of existing ones with `mutate()`, e.g.

library(dplyr)
mydata <- as_tibble(mydata)  # convert to a tibble (tbl_df() is deprecated)
# Define the new ratio columns in a single mutate() call
mydata <- mydata %>%
  mutate(HDLSVL = HDL / SVL,
         HDWSVL = HDW / SVL)
amg
  • `?mutate_all` might make this a bit simpler. – thelatemail Sep 12 '16 at 01:03
  • Thanks - I hadn't come across `mutate_all` yet. Looks like I need to update my version of dplyr :) – amg Sep 12 '16 at 02:15
  • Also, no need for two `mutate()` calls. Can just be `mydata%>%mutate(HDLSVL=HDL/SVL, HDWSVL=HDW/SVL)` – Simon Jackson Sep 12 '16 at 05:30
  • Ah - I didn't realise that. I'm still a little green when it comes to dplyr. – amg Sep 12 '16 at 18:50
  • @pablo_sci. You got it the wrong way: `mutate_each()` is deprecated. Use `mutate_all()`, `mutate_at()` or `mutate_if()` instead. – Gabra Sep 14 '17 at 05:37
  • Yeap Gabra, that answer was not updated. The correct one is at: https://stackoverflow.com/a/44034791/4249750. It explains how to use mutate_at in the different versions. – Pablo Casas Sep 14 '17 at 14:31
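
The comments above point to `mutate_all()` and `mutate_at()`. In dplyr 1.0.0 and later, `across()` supersedes those scoped variants. A hedged sketch, assuming dplyr >= 1.0.0 is installed and using the question's sample data typed in by hand; the `_SVL` suffix is a hypothetical naming choice:

```r
library(dplyr)

# Sample data from the question, entered by hand for illustration.
mydata <- data.frame(
  species = c("PM", "PM", "PM", "PM", "Tioman", "Tioman"),
  SVL = c(26.68, 23.46, 24.15, 23.09, 31.80, 29.00),
  HDL = c(9.27, 8.41, 8.36, 8.91, 11.65, 10.88),
  HDW = c(9.83, 8.59, 8.10, 8.79, 11.18, 10.66)
)

# Divide each selected column by SVL and store the result in a new
# column; .names controls the new column names (suffix is an assumption).
mydata <- mydata %>%
  mutate(across(c(HDL, HDW), ~ .x / SVL, .names = "{.col}_SVL"))
```

Since `mutate()` only adds columns, the species column and the original measurements stay in place next to the ratios.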

Here is one option with data.table. We convert the 'data.frame' to a 'data.table' (`setDT(df1)`), specify the columns of interest in `.SDcols`, loop through them (`lapply(..)`), and divide (`/`) each one by 'SVL'.

library(data.table)
setDT(df1)[, lapply(.SD, `/`, df1$SVL), .SDcols = HDL:HDW]

If we need to create new columns based on the divided output, then

setDT(df1)[, paste0(names(df1)[3:4],"_SVL") := lapply(.SD, `/`, df1$SVL), .SDcols = HDL:HDW]
akrun
  • @akrun how do you recommend passing more than one column to the `[, lapply(.SD, `/`, df1$SVL)` argument? In my df with 48 columns, I tried passing column names like so: `setDT(df1)[, lapply(.SD, `/`, df1$col25:df1$col48), .SDcols = col1:col24]`. In this instance, `col25` was used as the numerator for each successive operation applied to `col1`:`col24` variables instead of `col1`/`col25`, `col2`/`col26`...`col24`/`col48`. The errors produced were: `numerical expression has 3532 elements: only the first used` & `longer object length is not a multiple of shorter object length`. – On_an_island Dec 07 '18 at 17:44
  • @akrun how do we deal with this assuming the column names are unknown? – NelsonGon Jan 09 '19 at 15:33
  • @NelsonGon In that case, if you have the column index, use that – akrun Jan 09 '19 at 16:26
  • @akrun I asked my question that someone unfortunately chose to vote for closure. Anyways, I cannot know the column index because it's a function that takes on any data. I'll try to see if I can accomplish it. Thanks! – NelsonGon Jan 09 '19 at 16:28
  • @NelsonGon If there is no column index or column names, it is not clear what you want to accomplish. Is it stored in an object? – akrun Jan 09 '19 at 16:30
  • I've resolved to do it outside the function. Thanks for your help. – NelsonGon Jan 09 '19 at 16:48
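
One of the comments above asks about dividing columns pairwise (`col1`/`col25`, `col2`/`col26`, and so on). Applying `:` to two whole columns builds a numeric sequence from their first elements only, which is what the "only the first used" warning is reporting. A possible alternative, sketched here in base R with toy data and hypothetical column names (`a1`, `a2`, `b1`, `b2`, and the `_ratio` suffix are all assumptions), is to walk two column lists in parallel with `Map()`:

```r
# Toy data: a1, a2 are numerators and b1, b2 the matching denominators.
df <- data.frame(
  id = c("x", "y", "z"),
  a1 = c(10, 20, 30),
  a2 = c(4, 9, 16),
  b1 = c(2, 4, 5),
  b2 = c(2, 3, 4)
)

num_cols <- c("a1", "a2")
den_cols <- c("b1", "b2")

# Map() pairs the columns up by position: a1 / b1, then a2 / b2.
ratios <- Map(`/`, df[num_cols], df[den_cols])

# Store the results under new column names.
df[paste0(num_cols, "_ratio")] <- ratios
```

The two name vectors must have the same length and be in matching order, since `Map()` pairs them strictly by position.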