Following up on this topic, here is my suggestion:
As I understand it, we have a data frame of dates and prices for thousands of companies. Here is an example data frame called prices:
> prices
newdates nsp1 nsp2 nsp3 nsp4
1 2000-01-03 NA NA NA NA
2 2000-01-04 79.5 325.0 NA 961
3 2000-01-05 79.5 322.5 NA 945
4 2000-01-06 79.5 327.5 NA 952
5 2000-01-07 NA 327.5 NA 941
6 2000-01-10 79.5 327.5 NA 946
7 2000-01-11 79.5 327.5 NA 888
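To create a new data frame of log-returns I used the code below (newdates here is the vector of dates, i.e. the first column of prices, available as a standalone object):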
> logs=data.frame(
+ cbind.data.frame(
+ newdates[-1],
+ diff(as.matrix(log(prices[,-1])))
+ )
+ )
> logs
newdates..1. nsp1 nsp2 nsp3 nsp4
1 2000-01-04 NA NA NA NA
2 2000-01-05 0 -0.007722046 NA -0.016789481
3 2000-01-06 0 0.015384919 NA 0.007380107
4 2000-01-07 NA 0.000000000 NA -0.011621895
5 2000-01-10 NA 0.000000000 NA 0.005299429
6 2000-01-11 0 0.000000000 NA -0.063270826
To clarify what is going on in this code, let's analyze it from the inside out:
Step 1: Calculating log-returns
- You know that log(a/b) = log(a) - log(b), so we can calculate log-returns as differences of logarithms. The function diff(x, lag=1) calculates differences with the given lag. Here lag=1, so it gives first differences.
- Our x is the prices in the data frame. To pick every column of a data.frame except the first (which holds the dates), we use prices[,-1].
- We need logarithms, so log(prices[,-1]).
- The function diff() works with a vector or a matrix, so we need to treat the calculated logarithms as a matrix, thus as.matrix(log(prices[,-1])).
- Now we can use diff() with lag=1 (the default, so it does not have to be written out), which gives diff(as.matrix(log(prices[,-1]))).
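The steps above can be tried on the example data. This is only a sketch: it rebuilds the prices table by hand, assumes the dates are stored as a Date vector, and uses intermediate names (price_cols, log_prices, log_returns) that are mine, not part of the original code.

```r
# Rebuild the example data frame from the table above (dates assumed to be Date).
newdates <- as.Date(c("2000-01-03", "2000-01-04", "2000-01-05", "2000-01-06",
                      "2000-01-07", "2000-01-10", "2000-01-11"))
prices <- data.frame(
  newdates = newdates,
  nsp1 = c(NA, 79.5, 79.5, 79.5, NA, 79.5, 79.5),
  nsp2 = c(NA, 325.0, 322.5, 327.5, 327.5, 327.5, 327.5),
  nsp3 = rep(NA_real_, 7),
  nsp4 = c(NA, 961, 945, 952, 941, 946, 888)
)

price_cols  <- prices[, -1]               # every column except the dates
log_prices  <- as.matrix(log(price_cols)) # logarithms, stored as a matrix
log_returns <- diff(log_prices)           # first differences (lag = 1 by default)

dim(log_returns)  # one row fewer than prices
```

Any return that touches an NA price is NA as well, which is why nsp1 has two NA returns around 2000-01-07.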
Step 2: Creating dataframe of log-returns and dates
We can't just use cbind() on the dates and the returns, for two reasons. Firstly, the lengths are different (the returns are shorter by 1 record), so we need to remove the first date: newdates[-1].
Secondly, with cbind() the dates would be converted to plain numeric values instead of staying dates.
So here we have to use cbind.data.frame(x,y) instead, as seen above.
Now the data is ready: we wrap it in data.frame() and name the result logs, so logs=data.frame(...), as above.
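To see the difference between the two functions, here is a small sketch on a cut-down version of the data. It assumes the dates are a Date vector; the names returns, as_matrix and the column name date are only for illustration.

```r
# Three days and two companies are enough to show the effect.
newdates <- as.Date(c("2000-01-04", "2000-01-05", "2000-01-06"))
prices <- data.frame(newdates = newdates,
                     nsp2 = c(325.0, 322.5, 327.5),
                     nsp4 = c(961, 945, 952))

returns <- diff(as.matrix(log(prices[, -1])))  # 2 rows of log-returns

# Plain cbind() builds a numeric matrix, so the dates become bare numbers.
as_matrix <- cbind(newdates[-1], returns)
is.numeric(as_matrix)

# cbind.data.frame() keeps each column's own class, so the dates stay dates.
logs <- cbind.data.frame(date = newdates[-1], returns)
class(logs$date)
```

As far as I can tell, cbind.data.frame() already returns a data frame, so the outer data.frame() in the code above mostly affects how the column names are cleaned up. Naming the date column explicitly, as in this sketch, avoids the newdates..1. name.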
If your dataset looks like the prices data frame, this should run. The most important thing is to use diff(log(x))
to easily calculate log-returns.
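As a quick sanity check of the diff(log(x)) idea on a single price series, here is a sketch with made-up prices (the vector x and both result names are hypothetical). It compares the result with the log of the price ratios computed directly:

```r
x <- c(100, 110, 121)                    # made-up prices, each 10% above the last

via_diff  <- diff(log(x))                # log(x[t]) - log(x[t-1])
via_ratio <- log(x[-1] / x[-length(x)])  # log(x[t] / x[t-1])

all.equal(via_diff, via_ratio)
```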
If you have any questions or problems, just ask.