Add results of box.test within a loop in a new column of dataframe

Question

I have a data set with many time series. I would like to check each series for stationarity using the Box test (Box.test). My loop runs the test fine, but how can I export the results (x² and p-value) to an existing dataframe, which has one row per time series, as new columns?

Here is an example of my dataframe:

a <- c(0.2569, 0.0145896, 0.0369, 0.025986, 0.12569, 0.3695)
b <- c(0.125, 0.04582, 0.2569, 0.256369, 0.25698, 0.1456)
c <- c(0.2584, 0.05698, 0.1258, 0.2569, 0.098563, 0.1569)

df <- data.frame(a,b,c)

Here is the loop. It works fine and gives me the x² and p-value for every time series:

for(i in 1:ncol(df)) {       
  box <- Box.test(df[ , i] <- df[ , i], type = "Ljung-Box")
  print(box)
}

Now the results should be transferred to the empty columns in a dataframe like this:

d <- c("series1", "series2", "series3")
e <- c("green", "black", "red")
f <- c(18, 24, 12)
p_value <- NA  #to create an empty column
x <- NA   

df2 <- data.frame(d,e,f,p_value,x)

My first idea was this:

df2$p_value <- box$p.value

But this gives me the same p-value in every row.

I think I have to do it with a new loop, but I don't know how to implement the "df2[i , ] <- df2[i ,]" part:

for(i in 1:nrow(df2)) {       
    df2$p_value <- box$p.value(df2[i , ] <- df2[i ,])
}

This doesn't work. Can somebody help me? Maybe with another function?

jay.sf · Accepted Answer · 2023-03-07T14:16:00.500

You can use sapply and extract the p-value from each test result.

bres <- sapply(df, \(x) Box.test(x, type="Ljung-Box")[['p.value']])

I strongly recommend explicitly defining a dictionary a that maps the column names of df to the series names in df2, to avoid mismatches.

a <- setNames(c("series1", "series2", "series3"), c('a', 'b', 'c'))

Then cbind.

cbind(df2, p_value=bres[match(df2$d, a)])
#         d     e  f   p_value
# a series1 green 18 0.8206314
# b series2 black 24 0.6379121
# c series3   red 12 0.1574567
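The question also asks for the test statistic (x²). A minimal sketch, assuming the same df, df2, and dictionary a as in this answer: return both values from each Box.test call and bind them as two columns. The names bres2, x_squared, and res are made up for illustration.

```r
# Example data, copied from the question
df <- data.frame(
  a = c(0.2569, 0.0145896, 0.0369, 0.025986, 0.12569, 0.3695),
  b = c(0.125, 0.04582, 0.2569, 0.256369, 0.25698, 0.1456),
  c = c(0.2584, 0.05698, 0.1258, 0.2569, 0.098563, 0.1569)
)
df2 <- data.frame(d = c("series1", "series2", "series3"),
                  e = c("green", "black", "red"),
                  f = c(18, 24, 12))

# Dictionary: names are the columns of df, values are the labels in df2$d
a <- setNames(c("series1", "series2", "series3"), c("a", "b", "c"))

# One row per series, holding the test statistic and the p-value
bres2 <- t(vapply(df, function(x) {
  bt <- Box.test(x, type = "Ljung-Box")
  c(x_squared = unname(bt$statistic), p_value = bt$p.value)
}, FUN.VALUE = c(x_squared = 0, p_value = 0)))

# Translate each label in df2$d to its column name, then pick those rows
res <- cbind(df2, bres2[names(a)[match(df2$d, a)], , drop = FALSE])
res
```

Looking rows up by name rather than by position keeps the result correct even if the dictionary is not written in the same order as the columns of df.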

Data:

df <- structure(list(a = c(0.2569, 0.0145896, 0.0369, 0.025986, 0.12569, 
0.3695), b = c(0.125, 0.04582, 0.2569, 0.256369, 0.25698, 0.1456
), c = c(0.2584, 0.05698, 0.1258, 0.2569, 0.098563, 0.1569)), row.names = c(NA, 
-6L), class = "data.frame")

df2 <- data.frame(d=c("series1", "series2", "series3"),
                  e=c("green", "black", "red"),
                  f=c(18, 24, 12))
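
If you prefer to keep an explicit loop, as in the question, here is a sketch under the same assumptions (the dictionary a maps column names of df to the labels in df2$d). The original attempt assigned box$p.value to the whole column, and box only held the result of the last iteration, which is why every row got the same value. Filling one element per iteration with df2$p_value[i] avoids that. The column name x_squared is an illustrative choice.

```r
# Example data, copied from the question
df <- data.frame(
  a = c(0.2569, 0.0145896, 0.0369, 0.025986, 0.12569, 0.3695),
  b = c(0.125, 0.04582, 0.2569, 0.256369, 0.25698, 0.1456),
  c = c(0.2584, 0.05698, 0.1258, 0.2569, 0.098563, 0.1569)
)
df2 <- data.frame(d = c("series1", "series2", "series3"),
                  e = c("green", "black", "red"),
                  f = c(18, 24, 12))
a <- setNames(c("series1", "series2", "series3"), c("a", "b", "c"))

# Pre-allocate the result columns as numeric NA
df2$x_squared <- NA_real_
df2$p_value <- NA_real_

for (i in seq_len(nrow(df2))) {
  # Column of df that belongs to the label in row i of df2
  col <- names(a)[match(df2$d[i], a)]
  bt <- Box.test(df[[col]], type = "Ljung-Box")
  # Assign to element i only, not to the whole column
  df2$x_squared[i] <- unname(bt$statistic)
  df2$p_value[i] <- bt$p.value
}
df2
```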

edited Mar 07 '23 at 14:16

answered Mar 07 '23 at 14:05


@Bellis Yeah, and you can consider it the "R way", where we use the [`*apply`-family](https://stackoverflow.com/q/3505701/6574038) instead of most `for` loops. It's easier to write, and the looping is executed internally in much faster C code. – jay.sf Mar 07 '23 at 14:39
Oh, I have one more question. When creating the dictionary a, is there another way than writing down each row name by hand? Maybe with a special syntax? My real data set consists of 90 time series, and that would be a very long enumeration. – Bellis Mar 07 '23 at 14:53
@Bellis You could do something like `setNames(paste0('series', seq_along(names(a))), names(a))` – jay.sf Mar 07 '23 at 14:59
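
A short sketch of that suggestion, assuming the names are meant to come from the columns of the data (names(df) here, which is my reading and not stated in the comment), and that the labels follow the pattern series1, series2, and so on in column order:

```r
# Example data, copied from the question
df <- data.frame(
  a = c(0.2569, 0.0145896, 0.0369, 0.025986, 0.12569, 0.3695),
  b = c(0.125, 0.04582, 0.2569, 0.256369, 0.25698, 0.1456),
  c = c(0.2584, 0.05698, 0.1258, 0.2569, 0.098563, 0.1569)
)

# Build the dictionary from the column names instead of typing it by hand:
# values are "series1", "series2", ... and names are the columns of df
a <- setNames(paste0("series", seq_along(df)), names(df))
a
```

This only works if the labels really follow the column order. If they do not, the mapping still has to be supplied from somewhere, for example a lookup table read from a file.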
