
I am trying to show two more columns, DF2$time_expected and DF$time/DF2$time_expected, in the output of DF[DF$Experiment=="A", ]. Here is the code, with the output of some commands shown as comments:

library(data.table)
library('dplyr')
ow <- options("warn")

DF <- read.csv(text= 
"Field,time,T,Experiment
Acute,0,0,A
An,9,120,A
En,15.6,2,A
Fo,9.2,2,A
Acute,8.3,1,B
An,7.7,26,B
En,12.9,1,B
Fo,0,0,B
Acute,7.5,1,C
An,7.9,43,C
En,0,0,C
Fo,5.4,1,C
Acute,8.6,2,D
An,7.8,77,D
En,0,0,D
Fo,0,0,D
Acute,0,0,E
An,7.9,60,E
En,14.3,1,E
Fo,0,0,E
Acute,8.3,4,F
An,8.2,326,F
En,14.6,4,F
Fo,7.9,3,F", 
header=TRUE, sep=",")

# http://stackoverflow.com/a/43695774/54964
DF[DF$Experiment=="A", ]
#  Field time   T Experiment
#1 Acute  0.0   0          A
#2    An  9.0 120          A
#3    En 15.6   2          A
#4    Fo  9.2   2          A
# TODO integrate here relative values of DF$time/DF2$time_expected as another column

DF2 <- read.csv(text=
"Field,time_expected
Acute,6
An,6
En,6
Fo,5", 
header=TRUE, sep=",")
#DF2[DF2$Field=="An", ]

## Now compare DF.time to DF2.time_expected by DF.time/DF2.time_expected
DF$time/DF2$time_expected
# [1] 0.000000 1.500000 2.600000 1.840000 1.383333 1.283333 2.150000 0.000000
# [9] 1.250000 1.316667 0.000000 1.080000 1.433333 1.300000 0.000000 0.000000
#[17] 0.000000 1.316667 2.383333 0.000000 1.383333 1.366667 2.433333 1.580000
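The division above only gives the right numbers because every experiment in `DF` lists its fields in the same Acute, An, En, Fo order as `DF2`, so R's recycling of the 4-element `time_expected` vector happens to line up with each row. A minimal base-R sketch that looks `time_expected` up by `Field` with `match()` instead, so row order no longer matters (`DF_small` is a hypothetical cut-down copy of the data with its rows shuffled):

```r
# Hypothetical cut-down data; rows deliberately NOT in Acute, An, En, Fo order
DF_small <- data.frame(
  Field = c("Fo", "Acute", "En", "An"),
  time  = c(9.2, 0, 15.6, 9),
  stringsAsFactors = FALSE
)
DF2 <- data.frame(
  Field = c("Acute", "An", "En", "Fo"),
  time_expected = c(6, 6, 6, 5),
  stringsAsFactors = FALSE
)

# Naive recycling: row i is divided by DF2$time_expected[i], ignoring Field
naive <- DF_small$time / DF2$time_expected

# Lookup by key: for each row, match() returns the DF2 row with the same Field
idx <- match(DF_small$Field, DF2$Field)
DF_small$time_expected <- DF2$time_expected[idx]
DF_small$`time/time_expected` <- DF_small$time / DF_small$time_expected

print(DF_small)
```

Here `naive` divides the Fo row by 6 and the An row by 5, while the `match()` version uses the correct value for each field.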

Expected output, with two new columns (time_expected and time/time_expected):

  Field time   T Experiment  time_expected time/time_expected
1 Acute  0.0   0          A  6              0.0
2    An  9.0 120          A  6              1.5
3    En 15.6   2          A  6              2.6
4    Fo  9.2   2          A  5              1.84
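A minimal base-R sketch that produces this expected output with `merge()`, assuming no extra packages. `merge()` sorts the result by the key column by default, which here happens to reproduce the Acute, An, En, Fo order. The data is cut down to the first rows of the question's `DF` for brevity.

```r
# First rows of the question's data (Experiment A plus two B rows)
DF <- read.csv(text = "Field,time,T,Experiment
Acute,0,0,A
An,9,120,A
En,15.6,2,A
Fo,9.2,2,A
Acute,8.3,1,B
An,7.7,26,B", header = TRUE, stringsAsFactors = FALSE)

DF2 <- read.csv(text = "Field,time_expected
Acute,6
An,6
En,6
Fo,5", header = TRUE, stringsAsFactors = FALSE)

# Keep Experiment A, then join time_expected on Field
res <- merge(DF[DF$Experiment == "A", ], DF2, by = "Field")

# Relative time as a new column with a non-syntactic name
res$`time/time_expected` <- res$time / res$time_expected

print(res)
```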
Roman Luštrik
Léo Léopold Hertz 준영

2 Answers


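It's usually not a good idea to attach both data.table and dplyr in the same session: dplyr masks some data.table functions (between(), first(), last()), which can lead to confusing results.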

With dplyr we can do:

library(dplyr) # No need to quote

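final_df <- DF %>%
  filter(time < 8) %>%                                  # keep rows with time below 8
  left_join(DF2, by = "Field") %>%                      # add time_expected for each Field
  mutate(`time/time_expected` = time / time_expected)   # relative time

# append %>% select(Field, `time/time_expected`) to keep only those two columns
final_df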
#>    Field time  T Experiment time_expected time/time_expected
#> 1  Acute  0.0  0          A             6           0.000000
#> 2     An  7.7 26          B             6           1.283333
#> 3     Fo  0.0  0          B             5           0.000000
#> 4  Acute  7.5  1          C             6           1.250000
#> 5     An  7.9 43          C             6           1.316667
#> 6     En  0.0  0          C             6           0.000000
#> 7     Fo  5.4  1          C             5           1.080000
#> 8     An  7.8 77          D             6           1.300000
#> 9     En  0.0  0          D             6           0.000000
#> 10    Fo  0.0  0          D             5           0.000000
#> 11 Acute  0.0  0          E             6           0.000000
#> 12    An  7.9 60          E             6           1.316667
#> 13    Fo  0.0  0          E             5           0.000000
#> 14    Fo  7.9  3          F             5           1.580000
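The pipeline above filters on `time < 8`, while the question asks for the Experiment A rows. A minimal sketch of the same join restricted to `Experiment == "A"`, assuming dplyr is installed; the cut-down `DF` below holds only the first rows of the question's data.

```r
library(dplyr)

# First rows of the question's data
DF <- read.csv(text = "Field,time,T,Experiment
Acute,0,0,A
An,9,120,A
En,15.6,2,A
Fo,9.2,2,A
Acute,8.3,1,B", header = TRUE, stringsAsFactors = FALSE)

DF2 <- data.frame(Field = c("Acute", "An", "En", "Fo"),
                  time_expected = c(6, 6, 6, 5),
                  stringsAsFactors = FALSE)

res <- DF %>%
  filter(Experiment == "A") %>%                          # rows of Experiment A only
  left_join(DF2, by = "Field") %>%                       # keep all DF rows, add time_expected
  mutate(`time/time_expected` = time / time_expected)    # relative time

print(res)
```

`left_join()` keeps every row of `DF`, whereas `full_join()` would also add any `DF2` field that has no match in `DF` as a row of `NA`s.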
GGamba

We can do a join with data.table and create the new columns by assigning with `:=`: 'time_expected' comes from the join, and the output of time/time_expected goes into 'timeN'.

setDT(DF)[DF2, time_expected := i.time_expected, on = .(Field)]  # i. refers to DF2's column
DF[, timeN := time/time_expected]
head(DF, 4)
#    Field time   T Experiment time_expected timeN
#1: Acute  0.0   0          A             6  0.00
#2:    An  9.0 120          A             6  1.50
#3:    En 15.6   2          A             6  2.60
#4:    Fo  9.2   2          A             5  1.84
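A minimal sketch of the same update join chained to the question's `Experiment == "A"` filter, assuming data.table is installed. `DT` and `DT2` are hypothetical cut-down copies of the data; the `i.` prefix explicitly selects the column coming from the joined table.

```r
library(data.table)

DT <- data.table(
  Field      = c("Acute", "An", "En", "Fo", "Acute", "An"),
  time       = c(0, 9, 15.6, 9.2, 8.3, 7.7),
  Experiment = c("A", "A", "A", "A", "B", "B")
)
DT2 <- data.table(Field = c("Acute", "An", "En", "Fo"),
                  time_expected = c(6, 6, 6, 5))

# Update join: for each DT row whose Field matches a DT2 row,
# copy DT2's time_expected (i.time_expected) into DT by reference
DT[DT2, time_expected := i.time_expected, on = "Field"]

# Relative time, using a character LHS for the non-syntactic name
DT[, "time/time_expected" := time / time_expected]

# Row filter for the rows the question asks about
resA <- DT[Experiment == "A"]
print(resA)
```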

Or both columns can be created in one step, within the same join:

setDT(DF)[DF2, c('time_expected', 'timeN') := .(i.time_expected, time/i.time_expected), on = .(Field)]

DF
#   Field time   T Experiment time_expected    timeN
# 1: Acute  0.0   0          A             6 0.000000
# 2:    An  9.0 120          A             6 1.500000
# 3:    En 15.6   2          A             6 2.600000
# 4:    Fo  9.2   2          A             5 1.840000
# 5: Acute  8.3   1          B             6 1.383333
# 6:    An  7.7  26          B             6 1.283333
# 7:    En 12.9   1          B             6 2.150000
# 8:    Fo  0.0   0          B             5 0.000000
# 9: Acute  7.5   1          C             6 1.250000
#10:    An  7.9  43          C             6 1.316667
#11:    En  0.0   0          C             6 0.000000
#12:    Fo  5.4   1          C             5 1.080000
#13: Acute  8.6   2          D             6 1.433333
#14:    An  7.8  77          D             6 1.300000
#15:    En  0.0   0          D             6 0.000000
#16:    Fo  0.0   0          D             5 0.000000
#17: Acute  0.0   0          E             6 0.000000
#18:    An  7.9  60          E             6 1.316667
#19:    En 14.3   1          E             6 2.383333
#20:    Fo  0.0   0          E             5 0.000000
#21: Acute  8.3   4          F             6 1.383333
#22:    An  8.2 326          F             6 1.366667
#23:    En 14.6   4          F             6 2.433333
#24:    Fo  7.9   3          F             5 1.580000

Subsetting the rows where 'time' is less than 8:

DF[time < 8]
#    Field time  T Experiment time_expected    timeN
# 1: Acute  0.0  0          A             6 0.000000
# 2:    An  7.7 26          B             6 1.283333
# 3:    Fo  0.0  0          B             5 0.000000
# 4: Acute  7.5  1          C             6 1.250000
# 5:    An  7.9 43          C             6 1.316667
# 6:    En  0.0  0          C             6 0.000000
# 7:    Fo  5.4  1          C             5 1.080000
# 8:    An  7.8 77          D             6 1.300000
# 9:    En  0.0  0          D             6 0.000000
#10:    Fo  0.0  0          D             5 0.000000
#11: Acute  0.0  0          E             6 0.000000
#12:    An  7.9 60          E             6 1.316667
#13:    Fo  0.0  0          E             5 0.000000
#14:    Fo  7.9  3          F             5 1.580000
akrun
  • @LéoLéopoldHertz준영 Because the original dataset `DF` doesn't have that column; only the output of the join has it. The `:=` creates the column in 'DF'. – akrun Apr 30 '17 at 12:34
  • @LéoLéopoldHertz준영 What data.table version do you have? I am using `data.table_1.10.4` – akrun Apr 30 '17 at 12:42
  • @LéoLéopoldHertz준영 It is pretty old. Could you update your version and try again? – akrun Apr 30 '17 at 12:43
  • @LéoLéopoldHertz준영 I don't have any problem with that command. `DF[order(time)][time < 8]` gives `Field time T Experiment time_expected timeN 1: Acute 0.0 0 A 6 0.000000 2: Fo 0.0 0 B 5 0.000000...` Do you have a data.table object, i.e. did you run `setDT(DF)`? – akrun Apr 30 '17 at 12:53
  • @LéoLéopoldHertz준영 I am getting the expected output based on the example you showed. Have you tried in a new session? – akrun Apr 30 '17 at 13:28
  • @LéoLéopoldHertz준영 You can use `DF[order(time)][time < 8, c("Field", "timeN")]` or `DF[order(time)][time < 8, .(Field, timeN)]` – akrun Apr 30 '17 at 13:40
  • @LéoLéopoldHertz준영 By default, column names are printed for data.frame/data.table/tbl_df etc. – akrun May 01 '17 at 04:54
  • @LéoLéopoldHertz준영 Did you mean `setnames(DF[time < 8], rep("", ncol(DF)))[]`? – akrun May 01 '17 at 04:55
  • @LéoLéopoldHertz준영 Maybe because you subsetted the columns. In that case, do it in two steps: get the number of columns first, then specify it in `rep("", n)`. – akrun May 01 '17 at 04:59
  • @LéoLéopoldHertz준영 BTW, is it for writing to a file? In that case, there are options to not write the column names. – akrun May 01 '17 at 05:00
  • @LéoLéopoldHertz준영 If you are not okay with the NULL route, then you may have to create a custom `print` function by modifying the existing `print`. – akrun May 01 '17 at 05:05
  • @LéoLéopoldHertz준영 Can you post it as a new question? I couldn't find a link for it. – akrun May 01 '17 at 05:07
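If the goal is a file without a header line, one option is base R's `write.table()` with `col.names = FALSE` (data.table's `fwrite()` also accepts `col.names = FALSE`). A minimal sketch, where the data frame `res` and the temporary file are hypothetical stand-ins for the subset being written:

```r
# Hypothetical result to write; stands in for e.g. DF[time < 8, .(Field, timeN)]
res <- data.frame(Field = c("Acute", "Fo"), timeN = c(0, 1.5),
                  stringsAsFactors = FALSE)

out <- tempfile(fileext = ".csv")

# No header line, no row names, no quoting of the character column
write.table(res, out, sep = ",", row.names = FALSE, col.names = FALSE,
            quote = FALSE)

lines <- readLines(out)
print(lines)
```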