I have a data frame that is 68 x 252. I am trying to find the log change of every vector relative to the first vector, i.e. if v1 = 68 and v2, v3, v4 = 70, 71, 72, ..., I want the new data frame output to be 0, 0.013, 0.019, 0.025.

data<-data[3:nrow(data)]/data[2:2]

returns

"Error in Ops.data.frame(data[3:nrow(data)], data[2:2]) : ‘/’ only defined for equally-sized data frames"

I have also tried making each their own matrix

div<-as.vector(data[2:2])
num<-as.matrix(data[3:nrow(data)])
test<-num/div

returns:

Error in num/div : non-conformable arrays

or

log(num/div)
Mikev
RWNY25
  • try `data[3:nrow(data),] <- data[3:nrow(data),]/data[2,2]` – akrun Feb 25 '19 at 16:57
  • Thank you! These appear to be simple returns, is there a way to do log instead? – RWNY25 Feb 25 '19 at 17:02
  • Also I received this "Warning message: In Ops.factor(left, right) : ‘/’ not meaningful for factors" – RWNY25 Feb 25 '19 at 17:03
  • That is because you have `factor` class instead of `numeric` – akrun Feb 25 '19 at 17:04
  • Would I have to convert the entire data.frame into a matrix to have all of the vectors be numeric? – RWNY25 Feb 25 '19 at 17:20
  • Convert with `as.numeric(as.character(x))`: `data[] <- lapply(data, function(x) as.numeric(as.character(x)))` – akrun Feb 25 '19 at 17:21
  • I still think I am missing the mark on something here, and I apologize because I am new. If I use the following code `data<-data[3:nrow(data),]/data[2:2]` I receive "Error in Ops.data.frame(data[3:nrow(data), ], data[2:2]) : ‘/’ only defined for equally-sized data frames". The goal is to receive continuous returns struck to the first vector. When I run `str(data)` it shows each vector is numeric. – RWNY25 Feb 25 '19 at 18:33
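
The factor-to-numeric conversion suggested in the comments above can be sketched as follows. The data frame here is a hypothetical stand-in (the asker's actual data is not shown), built so that its columns are factors holding numeric-looking labels:

```r
# Hypothetical example: numeric values that were read in as factors
data <- data.frame(v1 = factor(c("68", "120")),
                   v2 = factor(c("70", "121")))

# as.numeric() applied directly to a factor returns the internal level codes,
# not the printed values, so go through as.character() first
data[] <- lapply(data, function(x) as.numeric(as.character(x)))

# Every column should now be reported as num
str(data)
```

Assigning into `data[]` rather than `data` keeps the result a data frame instead of a plain list.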

1 Answer

I think data.frames are easier to work with than matrices. I like to keep each step of the problem as a different data.frame, so it is easy to retrace a step if I need to and I can see clearly what each transformation has done.

If we start with some sample data (although typically you would be reading this in from a file using read.csv or something similar):

#get data as a data frame
df <- structure(list(v1 = c(68, 120), v2 = c(70, 121), v3 = c(71, 122), v4 = c(72, 123)), class = "data.frame", row.names = c("row1", "row2"))

df
      v1  v2  v3  v4
row1  68  70  71  72
row2 120 121 122 123

Convert to log(base10):

#take log (base 10) of data frame
df_log <- log10(df)

df_log
           v1       v2       v3       v4
row1 1.832509 1.845098 1.851258 1.857332
row2 2.079181 2.082785 2.086360 2.089905

Last step...

df_ans <- df_log[,] - df_log[,1] 

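which means: from every cell of our log data frame df_log[,], subtract the value in the first column of the same row, df_log[,1], and store the results in df_ans. This works because df_log[,1] is a plain vector with one element per row, and R recycles it down each column.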

Result:

df_ans
     v1          v2          v3         v4
row1  0 0.012589127 0.018749436 0.02482358
row2  0 0.003604124 0.007178585 0.01072387
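
As a cross-check on the result above, here is a minimal sketch showing that subtracting logs gives the same numbers as taking the log of the ratio, since log(a) - log(b) = log(a/b). The name df_ratio is only illustrative and is not part of the original answer:

```r
# Same sample data as above
df <- data.frame(v1 = c(68, 120), v2 = c(70, 121),
                 v3 = c(71, 122), v4 = c(72, 123),
                 row.names = c("row1", "row2"))

# Two-step version: take logs, then subtract the first column
df_log <- log10(df)
df_ans <- df_log - df_log[, 1]

# One-step version: divide by the first column, then take the log
df_ratio <- log10(df / df[, 1])

# Both routes agree up to floating-point error
all.equal(df_ans, df_ratio)
```

If natural-log (continuously compounded) changes are wanted instead of base 10, swapping log10() for log() should be the only change needed, although the expected output in the question matches base 10.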

Note: I keep df_ans and df_log as two separate data frames so that each step can be retraced. Doing it in a single assignment, df_log <- df_log[,] - df_log[,1], gives the same result, because R evaluates the whole right-hand side before assigning. What does not work is updating df_log in place one column at a time starting with column 1: df_log[,1] becomes 0 first, and every later column then has 0 subtracted from it.

indubitably
  • No problem. :). Happy to help. Try clicking the tick mark to mark the question as answered. – indubitably Feb 25 '19 at 21:36
  • What if we changed up the logic a bit? If we decided we wanted to subtract each row by row 1. I adjusted the formatting of the data frame, where what was previously in V1 is now in Row 1. `log_data<-log10(data)` `log_ret<-log_data-log_data[1,]` "Error in Ops.data.frame(log_data, log_data[1, ]) : ‘-’ only defined for equally-sized data frames" Thank you – RWNY25 Feb 26 '19 at 15:06
  • It would be nice if the syntax was symmetric for rows and columns, but alas. [This question](https://stackoverflow.com/questions/35892111/subtract-one-row-from-another-row-in-df) addresses that, but your options are: 1) to create a data frame with repetitions of row 1 that is the same size as your original data frame and subtract one data frame from the other, or 2) to use a for-loop to loop through the rows. For row operations I usually use option 2. – indubitably Feb 26 '19 at 22:01
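
The two row-wise options from the last comment could look roughly like this. It is only a sketch: the sample values and the names base and log_ret_loop are made up for illustration, with the base values sitting in row 1 as the asker describes:

```r
# Hypothetical data: one series per column, row 1 holds the base values
data <- data.frame(a = c(68, 70, 71, 72),
                   b = c(120, 121, 122, 123))
log_data <- log10(data)

# Option 1: repeat row 1 so both data frames have the same dimensions,
# then subtract one data frame from the other
base <- log_data[rep(1, nrow(log_data)), ]
log_ret <- log_data - base

# Option 2: loop over the rows, always reading from the untouched log_data
# so that the base row is never overwritten before it is used
log_ret_loop <- log_data
for (i in seq_len(nrow(log_data))) {
  log_ret_loop[i, ] <- log_data[i, ] - log_data[1, ]
}

# Both options should give the same values
all.equal(unlist(log_ret), unlist(log_ret_loop))
```

Option 1 avoids the "only defined for equally-sized data frames" error because both operands have identical dimensions. Option 2 never hits it because each subtraction involves two single-row data frames.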