R calculation of new column value from other columns, using relative positions. Why does the class of the resulting column = data.frame?

Question

Each day I run summary report to get a list of the files that were imported and used in a certain project output. Each day's log is stamped with current day's date (Log.Date) and they are read into a running file log.

I want to create a check for this log that will let me know that the same number of historical files are being read in from one day to the next. (i.e. today's imports should equal yesterday's imports, plus a few new files from today.)

Example log:

fileLog <- data.frame('Log.Date'=c('2020-08-01','2020-08-01','2020-08-02','2020-08-02','2020-08-02','2020-08-03','2020-08-03','2020-08-03','2020-08-03'),
                     'System' = c('A','B','A','B','C','A','B','C','D'),
                     'File'=c('file1','file2','file1','file2','file3', 'file1', 'file2','file3','file4')) 

#     Log.Date     System   File
# 1   2020-08-01      A     file1
# 2   2020-08-01      B     file2
# 3   2020-08-02      A     file1
# 4   2020-08-02      B     file2
# 5   2020-08-02      C     file3
# 6   2020-08-03      A     file1
# 7   2020-08-03      B     file2
# 8   2020-08-03      C     file3
# 9   2020-08-03      D     file4

I group the file log by the Log.Date and System to get a file count for each day, then pivot the data so that I can see and calculate any change from the previous day. I want to create the calculation by position: Change = 'last column' minus 'next to last column', since the column names will always be changing.

fileLog <- fileLog %>% 
  arrange(Log.Date) %>% 
  group_by(Log.Date, System) %>% 
  summarise(File.Count = length(unique(File))) %>% 
  ungroup() %>% 
  pivot_wider(names_from = Log.Date, values_from = File.Count) %>% 
  replace(is.na(.), 0)
fileLog <- as.data.frame(fileLog) %>% 
  mutate(Change = rev(fileLog)[1] - rev(fileLog)[2])

This does let me view the change I'm looking for, however the resulting 'Change' column is of class 'data.frame'. This causes me problems for exporting it as part of a report.

#        System  2020-08-01  2020-08-02  2020-08-03  Change.2020-08-03
# 1        A          1          1          1          0
# 2        B          1          1          1          0
# 3        C          0          1          1          0
# 4        D          0          0          1          1

> class(fileLog$Change)
[1] "data.frame"

Using the absolute column names works fine:

mutate(Change = fileLog$'2020-08-03' - fileLog$'2020-08-02')

I've tried other iterations of calling the position, which nets the same problem

mutate(Change = fileLog[,ncol(fileLog)] - fileLog[,ncol(fileLog)-1])

I've also tried coercing the column to numeric, gives the error: 'list' object cannot be coerced to type 'double'

  mutate(Change = as.numeric(check_start[,-1]) - as.numeric(check_start[,-2]))
  mutate(Change = as.numeric(rev(check_start)[1]) - as.numeric(rev(check_start)[2]))

Question: Can someone suggest the correct way to make this positional calculation to output a 'Change' column that is not of class=data.frame?

There might also be value in considering whether this can be done in the long format. Using `lag` or `diff` or similar functions can easily deal with day-to-day changes, and you can do this within groups using `group_by` as well. — thelatemail, Aug 27 '20 at 00:43
Also, this ancient question from the decade before last may be helpful for understanding all of the indexing differences with `[`, `[[`, `$` etc - https://stackoverflow.com/questions/1169456/the-difference-between-bracket-and-double-bracket-for-accessing-the-el — thelatemail, Aug 27 '20 at 00:49
Thank you! I'll look into the lag and diff, as I can see where this could be useful. Otherwise, the post about [, [[, $ was SUPER helpful. — gkoo, Aug 27 '20 at 13:25

Ronak Shah · Accepted Answer · 2020-08-27T00:52:50.530

Few changes to your previous step :

Instead of length(unique(File)), used n_distinct
Instead of adding another replace step after pivot_wider use values_fill.

fileLog <- fileLog %>% 
  arrange(Log.Date) %>% 
  group_by(Log.Date, System) %>% 
  summarise(File.Count = n_distinct(File)) %>% 
  ungroup() %>% 
  pivot_wider(names_from = Log.Date, values_from = File.Count, values_fill = 0)

Now to subtract value from last and second last column you can use ncol which will give you column number of last column.

fileLog <- fileLog %>% mutate(Change = .[[ncol(.)]] - .[[ncol(.) - 1]]) 
fileLog
# A tibble: 4 x 5
  System `2020-08-01` `2020-08-02` `2020-08-03` Change
  <chr>         <int>        <int>        <int>  <int>
1 A                 1            1            1      0
2 B                 1            1            1      0
3 C                 0            1            1      0
4 D                 0            0            1      1

Shouldn't it be `[[ncol(.)]]` etc with double brackets? – thelatemail Aug 27 '20 at 00:38 — thelatemail, Aug 27 '20 at 00:38
Yes, you are right. `[` still keeps it as a dataframe. – Ronak Shah Aug 27 '20 at 00:41 — Ronak Shah, Aug 27 '20 at 00:41
This of course cleared it right up. Thank you so much!! – gkoo Aug 27 '20 at 13:28 — gkoo, Aug 27 '20 at 13:28

R calculation of new column value from other columns, using relative positions. Why does the class of the resulting column = data.frame?

1 Answers1