Can anyone give me a hint to speed up the following program? Situation: I have a huge amount of measurement data, and I need to extract data for "10 minutes of stable operating conditions" of 5 parameters, i.e. column values.

Here is my (working, but really slow) solution:

- Take the first 10 rows from the dataframe.
- Compare the min and max of each column to the first value of the column.
- If at least one column's min or max is not within tolerance, delete the first row and repeat.
- If they are within tolerance, calculate the mean of the results, store them, delete the 10 rows, and repeat.
- Break when the dataframe has fewer than 10 rows.

Since I am using a repeat loop, this takes 30 minutes to extract 610 operation points from 86,220 minutes of data.

Any help is appreciated. Thanks!

edit: I created some code to explain. Please note that I deleted the checking routines for NA values and standby operation (values around 0):

n_cons<-5 # Number of consistent minutes?

### Function to check whether a window stays within tolerance
### (returns TRUE if min or max deviates from the reference value by more than tol,
###  i.e. TRUE means "not consistent")
f_cons<-function(min,max,value,tol){
    z<-max > (value + tol) | min < (value - tol)
    return(z)
}
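
Note that f_cons returns TRUE when the window is not consistent, i.e. when the minimum or maximum strays more than tol away from the reference value. A quick check with made-up numbers:

f_cons(min = 1499, max = 1510, value = 1500, tol = 5) # TRUE: 1510 is more than 5 above 1500
f_cons(min = 1497, max = 1503, value = 1500, tol = 5) # FALSE: the window stays within 1500 +/- 5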

# Define the +/- tolerances
Vu_1_tol<-5 # F_HT
Vu_2_tol<-5 # F_LT

# Create empty result map
map<-c(rep(NA,3))
dim(map)<- c(1,3)
colnames(map)<-list("F_HT","F_LT","Result")


system.time(
    repeat{
        # Criteria to break
        if(nrow(t6)<n_cons){break}

        # Subset of the data to check
        t_check<-NULL
        t_check<-cbind(t6$F_HT[1:n_cons],
                       t6$F_LT[1:n_cons]
        )

        # Check for consistency: if a column is out of tolerance, drop the first row and retry
        if(f_cons(min(t_check[,1]),max(t_check[,1]),t_check[1,1],Vu_1_tol)){
            t6<-t6[-1,]
            next
        }
        if(f_cons(min(t_check[,2]),max(t_check[,2]),t_check[1,2],Vu_2_tol)){
            t6<-t6[-1,]
            next
        }

        # If the repeat loop passes the consistency check, store the means
        block<-t6[1:n_cons,]
        # create a new row with means of the steady block
        new_row<-c(mean(block$F_HT),mean(block$F_LT),mean(block$Result))
        new_row[-1]<-round(as.numeric(new_row[-1]),2)
        map<-rbind(map,new_row) # attach new steady point to the map
        t6<-t6[-(1:n_cons),] # delete the evaluated lines from the data
    }
)
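
Most of the time here goes into `t6<-t6[-1,]` and `rbind`, which copy the data on every pass. Just as an illustrative sketch (the names `i`, `win` and `map2` are made up, and the checks are the same as above), the same logic can be written with a moving start index so nothing is copied or deleted:

i <- 1
map2 <- list()
while (i + n_cons - 1 <= nrow(t6)) {
    win <- t6[i:(i + n_cons - 1), ]   # current window of n_cons rows
    not_ok <- f_cons(min(win$F_HT), max(win$F_HT), win$F_HT[1], Vu_1_tol) |
              f_cons(min(win$F_LT), max(win$F_LT), win$F_LT[1], Vu_2_tol)
    if (not_ok) { i <- i + 1; next }  # slide the window by one row
    map2[[length(map2) + 1]] <- c(mean(win$F_HT),
                                  round(mean(win$F_LT), 2),
                                  round(mean(win$Result), 2))
    i <- i + n_cons                   # jump past the steady block
}
map2 <- do.call(rbind, map2)

Since the checks are unchanged, this should yield the same steady points as the repeat loop, just without rebuilding t6 on every iteration.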

The data I am using looks like this:

t6<-structure(list(F_HT = c(1499.71, 1500.68, 1500.44, 1500.19, 1500.31, 
1501.76, 1501, 1551.22, 1500.01, 1500.52, 1499.53, 1500.78, 1500.65, 
1500.96, 1500.25, 1500.76, 1499.49, 1500.24, 1500.47, 1500.25, 
1735.32, 2170.53, 2236.08, 2247.48, 2250.71, 2249.59, 2246.68, 
2246.69, 2248.27, 2247.79), F_LT = c(2498.96, 2499.93, 2499.73, 
2494.57, 2496.94, 2507.71, 2495.67, 2497.88, 2499.63, 2506.18, 
2495.57, 2504.28, 2497.38, 2498.66, 2502.17, 2497.78, 2498.38, 
2501.06, 2497.75, 2501.32, 2500.79, 2498.17, 2494.82, 2499.96, 
2498.5, 2503.47, 2500.57, 2501.27, 2501.17, 2502.33), Result = c(9125.5, 
8891.5, 8624, 8987, 9057.5, 8840.5, 9182, 8755.5, 9222.5, 9079, 
9175.5, 9458.5, 9058, 9043, 9045, 9309, 9085.5, 9230, 9346, 9234, 
9636.5, 9217.5, 9732.5, 9452, 9358, 9071.5, 9063.5, 9016.5, 8591, 
8447.5)), .Names = c("F_HT", "F_LT", "Result"), row.names = 85777:85806, class = "data.frame")

With this code and data, I get 3 steady operation points, which is what I want, but it is very slow.

Hopefully, this helps to better explain my problem.

Jochen Döll
  • Not without some actual data or code... – Thomas Aug 07 '13 at 09:56
  • It depends on how you're deleting rows. A reproducible example would go a long way towards helping you with specific code. http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example – Roman Luštrik Aug 07 '13 at 09:57
  • Note that you don't need to recalculate the means from scratch; just calculate the change due to replacing the oldest value with the newest one, basically `mean -> mean + (v_{n+1} - v_1)/n` (see the sketch after these comments). – Itamar Aug 07 '13 at 11:02
  • How about `library(zoo); rollmean; rollmax` and that family of functions? Another possibility (again, posting your code would help) is to convert from a `dataframe` to a `matrix`. BTW, it's not how many "minutes" your data cover, it's how many rows are in the array. – Carl Witthoft Aug 07 '13 at 11:30
  • Yes, Carl, you are right. Minutes and rows are the same in my example... – Jochen Döll Aug 07 '13 at 13:14
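
Regarding Itamar's suggestion: the window mean can be updated incrementally instead of being recomputed for every window. This is only a sketch, assuming a plain numeric vector without NA values; `roll_mean_incremental` is a made-up name:

roll_mean_incremental <- function(x, n) {
    m <- numeric(length(x) - n + 1)
    m[1] <- mean(x[1:n])
    for (i in seq_len(length(m) - 1)) {
        # x[i] leaves the window, x[i + n] enters it
        m[i + 1] <- m[i] + (x[i + n] - x[i]) / n
    }
    m
}
# should match zoo::rollapply(t6$F_HT, 5, mean, align = "left") up to floating-point error
roll_mean_incremental(t6$F_HT, 5)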

1 Answer


Eureka! Thanks to Carl Witthoft's comment, I was able to speed up the process by a factor of 15! I used rollapply a lot, because rollmean and rollmax had some problems with NA values which did not occur when using rollapply. Thanks for your help!

Here is what I did. I used the same data as before:

library(zoo) # rollapply comes from the zoo package

# Use only the values needed to check for stability
t7<-as.data.frame(cbind(t6$F_HT,t6$F_LT))

n_cons<-5 # Number of consistent minutes?

# Calculate the mean values for each column over 5 rows
t7_rm<-rollapply(t7,n_cons,mean,align = "left")
colnames(t7_rm)<-c("mean_F_HT","mean_F_LT")

# idem with maximum
t7_max<-rollapply(t7,width=n_cons,FUN=max, na.rm = F,align = "left")
colnames(t7_max)<-c("max_F_HT","max_F_LT")

# idem with minimum
t7_min<-rollapply(t7,width=n_cons,FUN=min, na.rm = F,align = "left")
colnames(t7_min)<-c("min_F_HT","min_F_LT")

# create a table with the maximum absolute deviation from the mean values
t7_dif<-pmax((t7_max-t7_rm[1:nrow(t7_max),]),(t7_rm[1:nrow(t7_min),]-t7_min))
colnames(t7_dif)<-c("diff_F_HT","diff_F_LT")



# Enter tolerance limits
V1_tol<-50 # F_HT
V2_tol<-50 # F_LT

# Create a tolerance table
t7_tol<-cbind(rep(V1_tol,nrow(t7_dif)),rep(V2_tol,nrow(t7_dif)))

# Create a logical table with TRUE or FALSE depending on whether the max deviation is within tolerance
t7_check<-(t7_dif<t7_tol)

# Replace all FALSE with NA (so that complete.cases can be used below)
t7_check_NA<-apply(t7_check,c(1,2),function(x) {ifelse(x==FALSE,NA,x)})

# Create rolling mean over complete data
t6_rm<-rollapply(t6,n_cons,mean,na.rm=TRUE,align = "left")

# Create a map of stable operation points with means of parameters and result
t6_map<-t6_rm[complete.cases(t7_check_NA),]

The result differs from my original one because no lines are omitted, but this works for me.
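
As a side note, the FALSE-to-NA detour could probably be avoided by selecting the stable rows directly; a small sketch of that variant, reusing t7_check and t6_rm from above (`stable` and `t6_map2` are made-up names):

stable <- rowSums(t7_check) == ncol(t7_check) # TRUE only where every column is within tolerance
t6_map2 <- t6_rm[stable, ]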

Jochen Döll