classifying multiple observations within one variable so then i can categorise them in new column . how can i make the code shorter? in R

Question

i'm looking for a shortcut or less labour intensive way of grouping certain observations within the same variable, then output in a new column depending.

axa$type[axa$instrument_type == "CORPORATE BONDS" | axa$instrument_type == "GOVERNMENT BONDS"] <- 'BONDS'
axa$type[axa$instrument_type == "FOREIGN CURRENCY"] <- 'Cash'
axa$type[axa$instrument_type == "FUT-FIXED INCOME"] <- 'Derivatives'
axa$type[axa$instrument_type  ==  "INTEREST RATE SWAP"] <- 'Derivatives'
axa$type[axa$instrument_type == "MUTUAL FUNDS"] <- 'Funds'
axa$type[axa$instrument_type == "SHORT TERMS"] <- 'Cash Equivalent'
axa$type[axa$instrument_type == "CMO"] <- 'Other Fi'
axa$type[axa$instrument_type == "NON-SECY ASSET STOCK"] <- 'Other'

the code searches for certain observations,then will output in column axa$type, with the desired output: "cash", "derivatives".

is there any way of making this code shorter/compact. preferably using the Data table Package

It's easier to help you if you include a simple [reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) with sample input and desired output that can be used to test and verify possible solutions. — MrFlick, Jun 02 '20 at 22:14
btw, its `data.table` with a dot and not `datatable`. the latter usually refers to the `DT` package — chinsoon12, Jun 02 '20 at 22:42

score 1 · Accepted Answer · answered Jun 02 '20 at 22:15

1

An easier option is to create a key/value dataset pair and then do a join. This is extensible and it requires only a single join instead of doing the == multiple times and assignments

library(data.table)
keydat <- data.table(instrument_type = c("CORPORATE_BONDS", "FOREIGN_CURRENCY",
    ...), type = c("GOVERNMENT", "Cash",...))

setDT(axa)[keydat, type := i.type, on = .(instrument_type)]

NOTE: ... is the remaining values in 'instrument_type' and the corresponding 'type' values

answered Jun 02 '20 at 22:15

akrun

874,273
37
540
662

1

sorry i don't really understand this. how would it understand what i want it to assign to and how does it cover multiple observations for one output? – Tykid15 Jun 02 '20 at 22:32
@Tykid15. You have to create the values for each key ie. `c(""FUT-FIXED INCOME", "INTEREST RATE SWAP")` and `c("DERIVATIVES", "DERIVATIVES") i.e. each row for each of the elements you want to replace. The idea is that key is unique while value can be same – akrun Jun 02 '20 at 22:33

score 0 · Answer 2 · answered Jun 03 '20 at 04:42

Not really shorter but using case_when from dplyr would make it cleaner and avoids writing dataframe_name$column_name every time. You can use %in% instead of | to compare multiple values in instrument_type.

library(dplyr)

axa %>%
   mutate(type = case_when(
      instrument_type %in% c("CORPORATE BONDS","GOVERNMENT BONDS") ~ "BONDS", 
      instrument_type == "FOREIGN CURRENCY" ~ "Cash", 
      instrument_type %in% c("FUT-FIXED INCOME", "INTEREST RATE SWAP") ~ "Derivatives", 
      instrument_type == "MUTUAL FUNDS"~"Funds", 
      instrument_type == "SHORT TERMS" ~ "Cash Equivalent", 
      instrument_type == "CMO" ~"Other Fi", 
      instrument_type == "NON-SECY ASSET STOCK" ~"Other"))

If interested in data.table solution similar to case_when there is fcase in data.table which is available in development version of data.table.

it works, but it does not create a column on original dataset. still without the type column when i load it. do you know why that could potentially happen? — Tykid15, Jun 03 '20 at 11:26
@Tykid15 Did you assign the results back? `axa <- axa %>% mutate(type = case_when(....` — Ronak Shah, Jun 03 '20 at 11:26

classifying multiple observations within one variable so then i can categorise them in new column . how can i make the code shorter? in R

2 Answers2