0

I have a df that are in different unit. I would like to convert them into the same unit based on the conversion factors in cov. When df$Test==cov$Type, df$Unit==cov$Raw, then we do the calculation. if df$unit can not be found in cov, then keep as is and flag it with new variable "Check"=="Y"

What will be the best way to complete this conversion process. I saw someone using the method: building an empty df then read in each record one by one with calculation. Is it a good way? or it is the way counts as more careful way? I would like to know what will you do if you are the person to handle such task. As many as possible. Many thanks.

df<-structure(list(Test = c("Length", "Weight", "Weight", "Weight", 
"Weight", "Length", "Length", "Length", "Length", "Length", "Length", 
"Length", "Length", "Length"), Result = c(4.5, 36, 147, 55, 175, 
2, 125, 222, 1.6, 3, 56, 512, 28, 78), Unit = c("m", "lb", "g", 
"kg", "oz", "cm", "in", "mm", "ft", "m", "in", "cm", "cm", NA
)), row.names = c(NA, -14L), class = c("tbl_df", "tbl", "data.frame"))

cov<- structure(list(Type = c("Length", "Length", "Length", "Length", 
"Length", "Weight", "Weight", "Weight", "Weight"), Raw = c("m", 
"cm", "mm", "in", "ft", "lb", "g", "kg", "oz"), Standard = c("cm", 
"cm", "cm", "cm", "cm", "g", "g", "g", "g"), Factor = c(100, 
1, 0.1, 2.54, 30.48, 453, 1, 1000, 28)), row.names = c(NA, -9L
), class = c("tbl_df", "tbl", "data.frame"))
DS_UNI
  • 2,600
  • 2
  • 11
  • 22
Stataq
  • 2,237
  • 6
  • 14

2 Answers2

3

A sensible first step is to merge the two dataframes so that each line in df is added the suitable cov$Factor as in:

merge(df, cov, by.x = "Unit", by.y = "Raw", all.x = TRUE, all.y = FALSE)

which gives:

   Unit   Test Result   Type Standard  Factor
1    cm Length  512.0 Length       cm    1.00
2    cm Length   28.0 Length       cm    1.00
3    cm Length    2.0 Length       cm    1.00
4    ft Length    1.6 Length       cm   30.48
5     g Weight  147.0 Weight        g    1.00
6    in Length  125.0 Length       cm    2.54
7    in Length   56.0 Length       cm    2.54
8    kg Weight   55.0 Weight        g 1000.00
9    lb Weight   36.0 Weight        g  453.00
10    m Length    4.5 Length       cm  100.00
11    m Length    3.0 Length       cm  100.00
12   mm Length  222.0 Length       cm    0.10
13   oz Weight  175.0 Weight        g   28.00
14 <NA> Length   78.0   <NA>     <NA>      NA

It is then easy to multiply Result by Factor to get results in unified units and run an ifelse to add a Check variable, should that be necessary even after each line has been checked.

Bernhard
  • 4,272
  • 1
  • 13
  • 23
  • Thanks. This could work. Any other way? I would like to know whether there is a way to do directly calculation without merge if that is possible. – Stataq Aug 13 '21 at 13:25
  • Any reason you try to avoid the `merge` and the `left_join` in an alternative answer? Reasons might help to decide what is ok for you and what not. You could always delete columns later, if they are in the way. – Bernhard Aug 13 '21 at 13:28
  • Haha. "merge" and "left-join" are both great. I just wondering why `building an empty df then read in each record one by one with calculation.` Maybe they do it in that way to avoid merge. If so, then whether there is a better way. I often see people building an empty df, then fill in the record one by one. I had hard time to follow that route. I wonder whether that is common way to process data? any better way to do that? – Stataq Aug 13 '21 at 13:37
1

Using dplyr you can do -

library(dplyr)

left_join(df, cov, by = c('Test' = 'Type', 'Unit' = 'Raw')) %>%
  mutate(final = Result * Factor, 
         Check = ifelse(is.na(final), 'Y', 'F'))

#   Test   Result Unit  Standard  Factor   final Check
#   <chr>   <dbl> <chr> <chr>      <dbl>   <dbl> <chr>
# 1 Length    4.5 m     cm        100      450   F    
# 2 Weight   36   lb    g         453    16308   F    
# 3 Weight  147   g     g           1      147   F    
# 4 Weight   55   kg    g        1000    55000   F    
# 5 Weight  175   oz    g          28     4900   F    
# 6 Length    2   cm    cm          1        2   F    
# 7 Length  125   in    cm          2.54   318.  F    
# 8 Length  222   mm    cm          0.1     22.2 F    
# 9 Length    1.6 ft    cm         30.5     48.8 F    
#10 Length    3   m     cm        100      300   F    
#11 Length   56   in    cm          2.54   142.  F    
#12 Length  512   cm    cm          1      512   F    
#13 Length   28   cm    cm          1       28   F    
#14 Length   78   NA    NA         NA       NA   Y    
Ronak Shah
  • 377,200
  • 20
  • 156
  • 213
  • what if I split df based on df$Test? Like you suggested in my another question, now I have `df1<- split(df, df$Test)` how should I handle it when it is a list? – Stataq Aug 13 '21 at 14:24
  • 1
    I would still suggest to combine them into one dataframe using `bind_rows(df1)` and proceed like in the answer. – Ronak Shah Aug 13 '21 at 14:33