I have a set of data frames that come from Excel files exported by some proprietary software. The data looks like this:
> head(Ball)
Col1 Col2 Col3 Col4 Col5 Col6 Col7 Col8 Col9 Col10 Col11 Col12
1 2014-07-25 00:00:00 <NA> 2014-07-25 00:00:00 <NA> 2014-07-25 00:00:00 <NA> 2014-07-23 00:00:00 <NA> 2014-07-23 00:00:00 <NA> 2014-07-23 00:00:00 <NA>
2 1899-12-31 07:49:00 <NA> 1899-12-31 06:49:00 <NA> 1899-12-31 06:48:00 <NA> 1899-12-31 08:27:00 <NA> 1899-12-31 08:26:00 <NA> 1899-12-31 07:20:00 <NA>
3 X Y X Y X Y X Y X Y X Y
4 0 0 0 0 0 0 0 0 0 0 0 0
5 0.0502222222 2.1945 0.0502222222 1.9437 0.05 1.254 0.0501123596 1.6302 0.0501086957 0 0.05 0
6 0.1004444444 5.7684 0.1004444444 4.7652 0.1 4.2636 0.1002247191 4.2636 0.1002173913 0.3135 0.1 2.1318
Col13 Col14
1 2014-07-23 00:00:00 <NA>
2 1899-12-31 07:19:00 <NA>
3 X Y
4 0 0
5 0.0501123596 1.7556
6 0.1002247191 4.389
This data contains a variable number of usable rows, so many columns have NAs in their last rows. I'm trying to get all these data frames (Ball and about 10 others) into a single, tidier format that holds the data from row 4 to the last non-NA row for each pair of columns. The end result should look like this:
> head(df)
id name routine trial volume flow
1 1 Ball tech post1 0.00000000 0.0000
2 1 Ball tech post1 0.05022222 2.1945
3 1 Ball tech post1 0.10044444 5.7684
4 1 Ball tech post1 0.15066667 6.8343
5 1 Ball tech post1 0.20088889 7.2732
6 1 Ball tech post1 0.25111111 7.5867
Here id is a random identifier linked to the name; name is the name of the imported data frame; routine is assigned based on the date in the first row; trial is assigned based on the hour in the second row; volume holds the values from row 4 onward under each X, and flow holds the values from row 4 onward under each Y.
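Rows 1 and 2 of each X column carry the recording date and time that routine and trial are meant to be derived from. A minimal base-R sketch, assuming the columns come in as character (or factor) vectors as the printout suggests, that pulls those two values out for every column pair. The helper name `pair_info()` is hypothetical, the toy data is copied from the `head(Ball)` output above, and the `1899-12-31` part of row 2 is assumed to be the placeholder date a time-only Excel cell receives on import, so only the clock time is kept.

```r
# Toy data: rows 1-3 of the first four columns of head(Ball) as printed above
Ball <- data.frame(
  Col1 = c("2014-07-25 00:00:00", "1899-12-31 07:49:00", "X"),
  Col2 = c(NA, NA, "Y"),
  Col3 = c("2014-07-25 00:00:00", "1899-12-31 06:49:00", "X"),
  Col4 = c(NA, NA, "Y"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: one row per X/Y pair with the date from row 1 and the
# clock time from row 2 (the assumed 1899-12-31 placeholder date is dropped).
pair_info <- function(x) {
  x.cols <- seq(1, ncol(x), by = 2)   # X columns: 1, 3, 5, ...
  cell <- function(row) {
    # as.character() also works if the columns were read in as factors
    vapply(x.cols, function(j) as.character(x[[j]][row]), character(1))
  }
  row1 <- cell(1)                     # e.g. "2014-07-25 00:00:00"
  row2 <- cell(2)                     # e.g. "1899-12-31 07:49:00"
  data.frame(
    col  = x.cols,
    date = as.Date(substr(row1, 1, 10)),
    time = substr(row2, 12, 16),      # "HH:MM"
    hour = as.integer(substr(row2, 12, 13)),
    stringsAsFactors = FALSE
  )
}

info <- pair_info(Ball)
info
```

The `date` and `hour` columns can then be matched against whatever rule decides routine and trial; that rule isn't stated here, so the sketch stops short of it.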
Here is the function I came up with; x is the original data frame (in this case Ball) and y is the data frame to which the new rows should be added:
```r
tidier <- function(x, y) {
  for(col in ncol(x) / 2) {
    end.current <- length(x[,col][!is.na(x[,col])])
    length.current <- end.current - 3
    id = rep(1, length.current)
    name = rep("Ball", length.current)
    routine <- rep("tech", length.current)
    trial <- rep("pre2", length.current)
    volume <- as.numeric(Ball[4:end.current, col])
    flow <- as.numeric(Ball[4:end.current, col + 1])
    temp.df <- data.frame(id, name, routine, trial, volume, flow)
    df <- rbind(y, temp.df)
    col <- col + 2
    return(df)
  }
}
```
I haven't yet got to setting id, name, routine and trial conditionally from the values found in the original data frame. Running the function returns only the original df data frame, without any rows added. I get no errors and can't figure out how to make this work. Hopefully this is clear enough; I'm new to writing functions, and any help in making this work would be greatly appreciated.
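Several things in the function above would produce this behavior. `for(col in ncol(x) / 2)` loops over a single value (7 for a 14-column frame), not over 1, 3, 5, ..., and `col <- col + 2` inside a `for` loop does not change the next value `col` takes. `return(df)` sits inside the loop, so even a proper loop would stop after the first pair. The body reads `Ball` instead of the argument `x`. Probably the main culprit: an R function works on a copy of its arguments, so `df` in the workspace only changes if the result is assigned, e.g. `df <- tidier(Ball, df)`. Also, if the columns were imported as factors, `as.numeric()` returns level codes rather than the values, so converting with `as.character()` first is safer. A minimal base-R sketch addressing these points; the extra `name` and `id` arguments and the toy data (with Col3/Col4 hypothetically cut one row short) are assumptions, and routine/trial remain the fixed placeholders from the question.

```r
# Toy data: the first four columns of head(Ball) as printed above, except
# that Col3/Col4 are hypothetically one row shorter to mimic variable lengths.
Ball <- data.frame(
  Col1 = c("2014-07-25 00:00:00", "1899-12-31 07:49:00", "X", "0",
           "0.0502222222", "0.1004444444"),
  Col2 = c(NA, NA, "Y", "0", "2.1945", "5.7684"),
  Col3 = c("2014-07-25 00:00:00", "1899-12-31 06:49:00", "X", "0",
           "0.0502222222", NA),
  Col4 = c(NA, NA, "Y", "0", "1.9437", NA),
  stringsAsFactors = FALSE
)

# x: one imported data frame; y: the accumulated result.
# name and id are extra (hypothetical) arguments so nothing is hard-coded.
tidier <- function(x, y, name, id) {
  pieces <- list()
  for (col in seq(1, ncol(x), by = 2)) {     # every X column: 1, 3, 5, ...
    vol <- as.character(x[[col]])            # as.character() guards against factors
    flo <- as.character(x[[col + 1]])
    end.current <- max(which(!is.na(vol)))   # last non-NA row of this X column
    if (end.current < 4) next                # no data rows in this pair
    rows <- 4:end.current
    pieces[[length(pieces) + 1]] <- data.frame(
      id = id, name = name,
      routine = "tech",                      # placeholder, to be derived from row 1
      trial = "pre2",                        # placeholder, to be derived from row 2
      volume = as.numeric(vol[rows]),
      flow = as.numeric(flo[rows]),
      stringsAsFactors = FALSE
    )
  }
  rbind(y, do.call(rbind, pieces))           # return once, after the loop
}

# The caller must assign the result; y itself is never modified in place.
df <- data.frame()
df <- tidier(Ball, df, name = "Ball", id = 1)

# Combining several imported frames (only Ball in this toy example)
frames <- list(Ball = Ball)
all.df <- data.frame()
for (i in seq_along(frames)) {
  all.df <- tidier(frames[[i]], all.df, name = names(frames)[i], id = i)
}
```

Collecting the pieces in a list and calling `rbind` once avoids growing a data frame inside the loop; starting from an empty `data.frame()` works because `rbind` drops zero-column arguments.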