0

I have a data frame dfSub with a number of parameters inside. This is hourly based data for energy use. I need to sort data by each hour, e.g. for each hour get all values of energy from data frame. As a result I expect to have data frame with 24 columns for each hour, rows are filled with energy values.

The hour is specified as 1:24 and in data frame is linked as dfSub$hr. The heat is dfSub$heat

I constructed a for-loop and tried to save with cbind, but it does not work, error message is about different size of rows and columns.

I print results and see them on screen, but cant save as d(dataframe)

here is the code:

  d = NULL
    for (i in 1:24) {
      subh= subset(dfSub$heat, dfSub$hr == i)

      print(subh)

    d = cbind(d, as.data.frame(subh))
    }

append function is not applicable, since I dont know the expected length of heat value for each hour.

Any help is appreciated.

Part of dfSub

    hr wk month dyid wend     t heat
    1  2     1    1    0  -9.00   81
    2  2     1    1    0  -8.30   61
    3  2     1    1    0  -7.80   53
    4  2     1    1    0  -7.00   51
    5  2     1    1    0  -7.00   30
    6  2     1    1    0  -6.90   31
    7  2     1    1    0  -7.10   51
    8  2     1    1    0  -6.50   90
    9  2     1    1    0  -8.90  114
   10  2     1    1    0  -9.90  110
   11  2     1    1    0 -11.70  126
   12  2     1    1    0  -9.70  113
   13  2     1    1    0 -11.60  104
   14  2     1    1    0 -10.00  107
   15  2     1    1    0 -10.20  117
   16  2     1    1    0  -9.00   90
   17  2     1    1    0  -8.00  114
   18  2     1    1    0  -7.80   83
   19  2     1    1    0  -8.10   82
   20  2     1    1    0  -8.20   61
   21  2     1    1    0  -8.80   34
   22  2     1    1    0  -9.10   52
   23  2     1    1    0 -10.10   41
   24  2     1    1    0  -8.80   52
    1  2     1    2    0  -8.70   44
    2  2     1    2    0  -8.40   50
    3  2     1    2    0  -8.10   33
    4  2     1    2    0  -7.70   41
    5  2     1    2    0  -7.80   33
    6  2     1    2    0  -7.50   43
    7  2     1    2    0  -7.30   40
    8  2     1    2    0  -7.10   8

The output expected as:

hr1 hr2 hr3 hr4..... hr24
81  61  53  51 ..... 52
44  50  33  41
Tymo
  • 259
  • 6
  • 14
  • 1
    `cbind` is for columns, and `rbind` is for rows. have you tried `rbind(d, as.data.frame(subh))`? Also, you can initialize `d` with the same variables by using `d = dfSub[0,]` – Shique May 30 '18 at 11:54
  • rbind gives me one column with 6000 rows all together, but I need 24 columns separated – Tymo May 30 '18 at 12:00
  • 1
    Can you show us a small sample of `dfSub`? And maybe an ideal output as well. – Shique May 30 '18 at 12:02
  • @Tymo You can share output of `dput(head(dfSub,30))` as part of question. It will be easier for others to respond with that. Certainly, you dont need `for-loop` for such work. – MKR May 30 '18 at 12:09
  • I can do it via subset, but this code aimed to be used for further studies that includes not hourly but daily time resolutions. And for example if I want to analyze 150 days loop would be of better use I believe. – Tymo May 30 '18 at 12:21
  • @Tymo What do you want with `heat` value for 2 different days in same hour? – MKR May 30 '18 at 12:58
  • process them for finding hourly energy use with errorbars – Tymo May 30 '18 at 13:14

3 Answers3

2

One can avoid use of for-loop in this case. An option is to use tidyr::spread to convert your hourly data in wide format.

library(tidyverse)
df %>% select(-t, -wend) %>%
  mutate(hr = sprintf("hr%02d",hr)) %>%
  spread(hr, heat)

Result:

#   wk month dyid hr01 hr02 hr03 hr04 hr05 hr06 hr07 hr08 hr09 hr10 hr11 hr12 hr13 hr14 hr15 hr16 hr17 hr18 hr19 hr20 hr21 hr22 hr23 hr24
# 1  2     1    1   81   61   53   51   30   31   51   90  114  110  126  113  104  107  117   90  114   83   82   61   34   52   41   52
# 2  2     1    2   44   50   33   41   33   43   40    8   NA   NA   NA   NA   NA   NA   NA   NA   NA   NA   NA   NA   NA   NA   NA   NA

Data:

df <- read.table(text = 
"hr wk month dyid wend     t heat
1  2     1    1    0  -9.00   81
2  2     1    1    0  -8.30   61
3  2     1    1    0  -7.80   53
4  2     1    1    0  -7.00   51
5  2     1    1    0  -7.00   30
6  2     1    1    0  -6.90   31
7  2     1    1    0  -7.10   51
8  2     1    1    0  -6.50   90
9  2     1    1    0  -8.90  114
10  2     1    1    0  -9.90  110
11  2     1    1    0 -11.70  126
12  2     1    1    0  -9.70  113
13  2     1    1    0 -11.60  104
14  2     1    1    0 -10.00  107
15  2     1    1    0 -10.20  117
16  2     1    1    0  -9.00   90
17  2     1    1    0  -8.00  114
18  2     1    1    0  -7.80   83
19  2     1    1    0  -8.10   82
20  2     1    1    0  -8.20   61
21  2     1    1    0  -8.80   34
22  2     1    1    0  -9.10   52
23  2     1    1    0 -10.10   41
24  2     1    1    0  -8.80   52
1  2     1    2    0  -8.70   44
2  2     1    2    0  -8.40   50
3  2     1    2    0  -8.10   33
4  2     1    2    0  -7.70   41
5  2     1    2    0  -7.80   33
6  2     1    2    0  -7.50   43
7  2     1    2    0  -7.30   40
8  2     1    2    0  -7.10   8",
header = TRUE, stringsAsFactors = FALSE)
MKR
  • 19,739
  • 4
  • 23
  • 33
  • Thanks for suggestion. It seems to be ok, however for me it was issue that R did not see "select" and after three re-installations of dplyr package if finally worked. But for spread I got error: Error: Duplicate identifiers for rows (1, 121, 241, 361, 481, 601, 721, 8. If someone will work with it, solution is here https://stackoverflow.com/questions/45898614/how-to-spread-columns-with-duplicate-identifiers?rq=1 For my task I found another option that is below. – Tymo May 31 '18 at 06:25
0

With tidyr:

> df<-read.fwf(textConnection(
+ "hr,wk,month,dyid,wend,t,heat
+     1  2     1    1    0  -9.00   81
+     2  2     1    1    0  -8.30   61
+     3  2     1    1    0  -7.80   53
+     4  2     1    1    0  -7.00   51
+     5  2     1    1    0  -7.00   30
+     6  2     1    1    0  -6.90   31
+     7  2     1    1    0  -7.10   51
+     8  2     1    1    0  -6.50   90
+     9  2     1    1    0  -8.90  114
+    10  2     1    1    0  -9.90  110  
+    11  2     1    1    0 -11.70  126
+    12  2     1    1    0  -9.70  113
+    13  2     1    1    0 -11.60  104
+    14  2     1    1    0 -10.00  107
+    15  2     1    1    0 -10.20  117
+    16  2     1    1    0  -9.00   90
+    17  2     1    1    0  -8.00  114
+    18  2     1    1    0  -7.80   83
+    19  2     1    1    0  -8.10   82
+    20  2     1    1    0  -8.20   61
+    21  2     1    1    0  -8.80   34
+    22  2     1    1    0  -9.10   52
+    23  2     1    1    0 -10.10   41
+    24  2     1    1    0  -8.80   52
+     1  2     1    2    0  -8.70   44
+     2  2     1    2    0  -8.40   50
+     3  2     1    2    0  -8.10   33
+     4  2     1    2    0  -7.70   41
+     5  2     1    2    0  -7.80   33
+     6  2     1    2    0  -7.50   43
+     7  2     1    2    0  -7.30   40
+     8  2     1    2    0  -7.10   8"
+ ),header=TRUE,sep=",",widths=c(5,3,6,5,5,7,5))
> 
> library(tidyr)
> df1 <- select(df,dyid,hr,heat)
> df2 <- spread(df1,hr,heat)
> colnames(df2)[2:ncol(df2)] <- paste0("hr",colnames(df2)[2:ncol(df2)])
> df2
  dyid hr1 hr2 hr3 hr4 hr5 hr6 hr7 hr8 hr9 hr10 hr11 hr12 hr13 hr14 hr15 hr16 hr17 hr18 hr19 hr20 hr21 hr22 hr23 hr24
1    1  81  61  53  51  30  31  51  90 114  110  126  113  104  107  117   90  114   83   82   61   34   52   41   52
2    2  44  50  33  41  33  43  40   8  NA   NA   NA   NA   NA   NA   NA   NA   NA   NA   NA   NA   NA   NA   NA   NA

>

Nicolas2
  • 2,170
  • 1
  • 6
  • 15
0

I found solution that helped me to solve my task here: Append data frames together in a for loop

by using empty list and combining later on in data frame

datalist = list()
for (i in 1:24) {
  subh= subset(dfSub$heat, dfSub$hr == i)
  datalist[[i]] = subh
}
big_data = do.call(rbind, datalist)

both cbind and rbind work. Thanks everyone for help :)

Tymo
  • 259
  • 6
  • 14