I'm very new to R. I know that loops aren't always considered a good option in R, but only one part of my code is giving me trouble. Essentially, I'm analysing longitudinal data: each subject has two tables of measures (each with approx. 200 different study variables), one from each of two time points. I've read them all in, stored as separate variables in R, and am trying to subtract the first table from the second for each participant.

It works fine if I run this as an individual line of code:

data_difference_n <- data_2_n - data_1_n

where n is the participant's ID number, but that would mean running this line by hand for about 1,000 participants whose IDs aren't consecutive numbers. So I've tried to put it inside a loop:

participants <- c(100, 105, 106, 119 ...)
for (n in participants) {
  ...
  data_difference_n <- paste("data_difference", subject, sep="_")
  data_1_n <- paste("data_1", subject, sep="_")
  data_2_n <- paste("data_2", subject, sep="_")
  data_difference_n <- data_2_n - data_1_n
  }

which gives me an error of "non-numeric argument to binary operator".

Each data table is a CSV with the same properties, mostly numbers and some cells with N/A. The first bit of code gives me the result I want: a new table where all the numerical values are the values in the first table subtracted from the values in the second, for that participant. I'm confused about why the second bit of code doesn't work, since it should refer to the same variables as the first.

I've read a lot of other posts about this error here and on other sites but can't seem to resolve it. One of them says that using apply converts the data frame to a character matrix; is it the same principle with looping? I feel like I'm missing something really basic and simple here - apologies if so, and I would appreciate any help!

  • Most probably due to binary operations on strings and missing values (NA) in one or both of your data frames. Remove or correct those before using operations like `data_2_n - data_1_n`. Don't use binary operators with strings; the operands need to be numeric. – Mankind_008 Jul 02 '18 at 03:48
  • @Mankind_008 Thanks! Why does it work if I run, e.g., "data_difference_100 <- data_2_100 - data_1_100" (using the same data frames) as an individual line of code? –  Jul 02 '18 at 03:56
  • You will need to provide a reproducible example and data for your problem so that others can help. [Refer to this post](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) – Mankind_008 Jul 02 '18 at 04:08
  • Your variables `data_1_n` and `data_2_n` contain strings, and you want to use these strings as variable names, e.g. to look up the `data_1_100` table with the string `"data_1_100"`. I think you can do this with `data_difference_n <- get(data_2_n) - get(data_1_n)` as `get` will find the variable using the string. Don't really recommend this approach though. – Marius Jul 02 '18 at 04:11
  • @Marius that worked! Thanks! If it's not too much trouble, could you please elaborate on why that approach isn't recommended/what a better way to go about it might be? –  Jul 02 '18 at 04:17
  • It's partly related to https://stackoverflow.com/questions/17559390/why-is-using-assign-bad. You could just have two tables with all the participant data in them, `data_1` and `data_2`, and subset them by participant ID when you needed to get at a particular participant. R has much better tools for doing subsetting and grouping than it does for mucking around with variable names. – Marius Jul 02 '18 at 04:23
  • `paste` just assigns a string to your variable. It doesn't give you the value stored in the variable of that name. So, instead of doing `data_2_100 - data_1_100`, you're doing `"data_2_100" - "data_1_100"`, which is not the same thing. – Rohit Jul 02 '18 at 06:39
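
Pulling the comments together, here is a minimal sketch of what seems to be going on. The toy data frames, their column names (`height`, `weight`) and the list names `tables_1`, `tables_2` and `differences` are all made up for illustration and are not from the question. It first shows that `paste` only builds a character string, then the `get`/`assign` route suggested above, and finally one possible way to keep the tables in lists keyed by participant ID instead of in separate variables.

```r
# Hypothetical toy tables standing in for the CSVs read in the question
# (column names and values are invented for illustration).
data_1_100 <- data.frame(height = c(1.0, 2.0), weight = c(10, NA))
data_2_100 <- data.frame(height = c(1.5, 2.5), weight = c(12, 15))
data_1_105 <- data.frame(height = c(3.0, 4.0), weight = c(20, 21))
data_2_105 <- data.frame(height = c(3.5, 5.0), weight = c(NA, 25))

participants <- c(100, 105)

# paste() returns a character string, not the object that has that name.
name_1 <- paste("data_1", 100, sep = "_")
name_2 <- paste("data_2", 100, sep = "_")
# Subtracting two strings is an error; capture the message instead of stopping.
err <- tryCatch(name_2 - name_1, error = function(e) conditionMessage(e))

# Option 1 (from the comments): look each table up by name with get(),
# and store the result under a constructed name with assign().
for (n in participants) {
  d1 <- get(paste("data_1", n, sep = "_"))
  d2 <- get(paste("data_2", n, sep = "_"))
  assign(paste("data_difference", n, sep = "_"), d2 - d1)
}

# Option 2: keep the tables in named lists keyed by participant ID,
# then subtract pairwise. No variable names need to be constructed later.
ids <- as.character(participants)
tables_1 <- setNames(lapply(ids, function(id) get(paste("data_1", id, sep = "_"))), ids)
tables_2 <- setNames(lapply(ids, function(id) get(paste("data_2", id, sep = "_"))), ids)
differences <- Map(`-`, tables_2, tables_1)

differences[["105"]]  # difference table for participant 105
```

In real use the lists in option 2 would probably be filled directly when reading the files (for example with `lapply` over the file names and `read.csv`), so the per-participant variables would never need to exist. Missing values are not the cause of the error here: subtracting data frames that contain `NA` just gives `NA` in those cells.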

0 Answers