0

In my work, I need to assign a score to a new column. The numeric value of this score is species specific.

Currently I have the following method to achieve this which works but is not very concise for repeated use with multiple data sets:

bird$VIS <- 0 # creates the new column and populates it with 0 

bird$VIS[bird$species == "Tyto alba" ] <- 0.0502 # assigns this score to the VIS column for rows   where the species is "Tyto alba" 
bird$VIS[bird$species == "Branta leucopsis" ] <- 0.044 
bird$VIS[bird$species == "Ciconia nigra" ] <- 0.002
bird$VIS[bird$species == "Grus grus" ] <- 0.001
bird$VIS[bird$species == "Bubo bubo" ] <- 0.004513 
bird$VIS[bird$species == "Neophron percnopterus" ] <- 0.0015333
bird$VIS[bird$species == "Platalea leucorodia" ] <- 0.001

And so forth, there are 26 species in total but this subsample should be sufficient to demonstrate what I am trying to do.

My question is essentially how I turn this into a function that will work regardless of whether all the species are present in the data frame or not?

Essentially, instead of using the above sequential row assignments I would like to be able to write something like:

assign_VIS_function(bird)

resulting in an output something like:

SPECIES           VIS
Branta leucopsis  0.044
Tyto alba         0.0502
Tyto alba         0.0502
Tyto alba         0.0502
Tyto alba         0.0502
Gyps fulvus       0.22838
Gyps fulvus       0.22838
Gyps fulvus       0.22838

and so forth......

Many thanks.

marc_s
  • 732,580
  • 175
  • 1,330
  • 1,459
JGauld
  • 13
  • 2
  • Have you tried ifelse() logic? – Schilker Oct 08 '19 at 13:56
  • Create a vector with the mapping. Something like `vec<-c(Tyto alba=0.502, Branta leucopsis=0.044, ...)`. Then just try `vec[bird$species]`. – nicola Oct 08 '19 at 13:57
  • 2
    Create a lookup table - a data frame with a column for `species` and a column for `VIS`. You can save it as a CSV file, view it to verify the values are what you want, edit it as necessary, etc. When you want to add the column to a data frame, you `merge` the the two data frames, `merge(your_data, your_lookup_table)` – Gregor Thomas Oct 08 '19 at 13:58
  • Use `case_when` from `dplyr` – slava-kohut Oct 08 '19 at 13:58

3 Answers3

1

As @Gregor mentions with SQL-speak, save indicator data in a lookup table and then merge to original table in a one-to-many relationship which scales to 26 or 260 items:

species_vis_df <- data.frame(species = c("Tyto alba", "Branta leucopsis", "Ciconia nigra", 
                                         "Grus grus", "Bubo bubo", "Neophron percnopterus", 
                                         "Platalea leucorodia"),
                             value = c(0.0502 , 0.044, 0.002, 0.001, 
                                       0.004513, 0.0015333, 0.001))

Alternatively. in tabular format:

txt = 'species                 value
"Tyto alba"                   0.0502
"Branta leucopsis"             0.044
"Ciconia nigra"                0.002
"Grus grus"                    0.001
"Bubo bubo"                 0.004513
"Neophron percnopterus"    0.0015333
"Platalea leucorodia"          0.001'

species_vis_df <- read.table(text = txt, header=TRUE)
species_vis_df
#                 species     value
# 1             Tyto alba 0.0502000
# 2      Branta leucopsis 0.0440000
# 3         Ciconia nigra 0.0020000
# 4             Grus grus 0.0010000
# 5             Bubo bubo 0.0045130
# 6 Neophron percnopterus 0.0015333
# 7   Platalea leucorodia 0.0010000

Then run merge, specifically to borrow SQL again a left join merge with all.x=TRUE to keep all original rows regardless of matches with second table. Afterwards, make needed assignment (NA values for non-matches) and remove lookup value:

bird <- within(merge(bird, species_vis_df, by="species", all.x=TRUE), {
               VIS <- value
               rm(value)
        })
Parfait
  • 104,375
  • 17
  • 94
  • 125
0

You could have actually proposed a very simple reproducible example which I give here :

DT <- data.frame(V1 = LETTERS[1:10])

You want give score for each specific variable of V1 on a new variable (VIS).

a dplyr solution with case_when

library(dplyr)
DT = DT %>% 
  mutate(VIS=case_when(
    V1=="A"~0.1,
    V1=="B"~0.2 #and so on
  ))

DT
Félix Cuneo
  • 159
  • 6
-1

Another example using ifelse() logic. You see, if you don't code for the "Platalea leucorodia" or what other species they will be attributed a 0 (which is at the very end of the code).

data<-data_frame(birds=c("Grus grus","Bubo bubo","Grus grus","Bubo bubo","Platalea 
leucorodia"))
data %>%
  mutate(VIS = ifelse(birds == "Tyto alba", 0.0502 ,
                     ifelse(birds == "Branta leucopsis" ,  0.044 ,
                            ifelse(birds == "Ciconia nigra" , 0.002,
                                   ifelse(birds == "Grus grus", 0.001, 
                                          ifelse(birds == "Bubo bubo", 0.004513 , 0))))))
Schilker
  • 505
  • 2
  • 11
  • This is less efficient than the code in OP's question, and IMHO less readable as well. Nested ifelse is awful to write, especially as OP says they have 26 conditions. – Gregor Thomas Oct 08 '19 at 14:12
  • @Gregor Just providing another perspective – Schilker Oct 08 '19 at 14:14
  • But why provide a bad alternative without any commentary about it being bad? Just to trick people without enough experience to recognize it as being bad into using it? – Gregor Thomas Oct 08 '19 at 14:15
  • Bad alternative? It is not bad, its just not as clean. But will get the job done correctly. – Schilker Oct 08 '19 at 14:16
  • It is bad. Write it out for OPs 26 conditions, and then tell me it's not bad. It's more work to write, harder to read, worse to debug, computationally slower, and it doesn't scale. (If OPs problem scales to more than 50 conditions, [the method fails completely](https://stackoverflow.com/a/25063770/903061)). Being worse in *every way* than the option OP already has in the question makes it a bad suggestion. – Gregor Thomas Oct 08 '19 at 14:20