We have brands data in multiple columns with column names like "ID","Gender","Race","Country","VAR01","VAR02","VAR04","VAR05","VAR06","VAR08","VAR09","VAR13","VAR14","VAR15","VAR18","VAR19". Our task is to concatenate those brands columns data to a single column separated by a delimiter (;) which we were able to do with the following code.
R Code:
dataset<-sapply(dataset,as.character)
# Replace "NA" with blank
dataset[is.na(dataset)] <- ""
dataset<-as.data.frame(dataset)
#Concatenate columns data with semicolon(;)
dataset$Var_All<-paste(dataset$Var01,";",dataset$Var02,";",dataset$Var04,";",dataset$Var05,";",dataset$Var06,";",dataset$Var08,";",dataset$Var09,";",dataset$Var13,";",dataset$Var14,";",dataset$Var15,";",dataset$Var18,";",dataset$Var19)
#Remove blank spaces befor and after semicolon
dataset$Var_All <- gsub(" ; ", ";", dataset$Var_All)
dataset$Var_All <- gsub("; ", ";", dataset$Var_All)
dataset$Var_All <- gsub(" ;", ";", dataset$Var_All)
# Replace multiple semicolons with one semicolon step by step
dataset$Var_All <- gsub(";;;;;;;;;;;", ";", dataset$Var_All)
dataset$Var_All <- gsub(";;;;;;;;;;", ";", dataset$Var_All)
dataset$Var_All <- gsub(";;;;;;;;;", ";", dataset$Var_All)
dataset$Var_All <- gsub(";;;;;;;;", ";", dataset$Var_All)
dataset$Var_All <- gsub(";;;;;;;", ";", dataset$Var_All)
dataset$Var_All <- gsub(";;;;;;", ";", dataset$Var_All)
dataset$Var_All <- gsub(";;;;;", ";", dataset$Var_All)
dataset$Var_All <- gsub(";;;;", ";", dataset$Var_All)
dataset$Var_All <- gsub(";;;", ";", dataset$Var_All)
dataset$Var_All <- gsub(";;", ";", dataset$Var_All)
#Remove beginning and ending semicolons if any
dataset$Var_All<-gsub("^;+|;+$", "", dataset$Var_All)
Data set:
ID Gender Race Country VAR01 VAR02 VAR04 VAR05 VAR06 VAR08 VAR09 VAR13 VAR14
1 Male Indian India Brand1 NA Brand2 NA Brand3 NA NA NA Brand4
2 Female Indian India NA NA NA NA NA NA NA NA NA
3 Male Indian India NA Brand2 NA Brand3 NA NA Brand5 NA NA
4 Male Indian India Brand1 NA NA NA Brand3 NA NA NA Brand4
5 Female Indian India NA Brand2 NA Brand4 NA Brand6 NA Brand7 NA
However my question is: Is there any way to pull the substring of the variables names i.e. VAR and concatenate those variables as one variable "VAR_All". Currently we are writing manually in paste() function and removing spaces, replacing multiple semicolons to get the desired output.
We would like to know that is there any possibility to write the code in a dynamic way so that it should calculate the number of variables which are starting with "VAR" and concatenate automatically into "VAR_All".
Desired output should be as follows:
ID VAR_All
1 Brand1;Brand2;Brand3;Brand4
2
3 Brand2;Brand3;Brand5
4 Brand1;Brand3;Brand4
5 Brand2;Brand4;Brand6;Brand7
Thanks for the help in advance.