0

I'm an R newbie. I exported data from a database and am trying to rename the columns.

Example existing names (one site per water quality parameter) are in quotes below. There are the 6 possible parameters at each site and 40 sites; I would like to rename columns based on parameter and site. Site names are 3-7 characters and always occur after the last decimal point. My dataset (AQexport1) has 240 columns and 47,714 rows (rows are time stamps of hourly continuous data). I want to be able to use the code for other exports from this database with same format and parameters, but possibly different sites.

For example:

  1. "Water.Temp.Water.Temp.BUBU" | "Water.Temp.Temperature.BUBU" <--- Temp.BUBU
  2. "Water.Temp.Field.Visits.KNF_DUP" <--- FVTemp.KNF_DUP
  3. "Sp.Cond.TempCorrected_nodrift.LOD_DUP" <---SpCnd.LOD_Dup
  4. "Sp.Cond.TempCorrected.PFM" <--- SpC.PFM
  5. "Sp.Cond.Field.Visits.CC7" <-- FVSpC.CC7
  6. "Cond.Conductivity.TM02Dup"<-- Cond.TM02Dup

I can not figure out how to write the contains() in an if statement (I realize the syntax below is wrong, I'm just trying to show what I'm thinking), or how to extract characters from a string with multiple decimal points and that does not extract the same number of characters from the end of the column name. I am also wondering if a for loop through colnames() is the best solution.

for (i in 1:colnames(AQexport1)){
  if (colnames(AQexport1[i]) contains "Water.Temp.W" | "Water.Temp.T"){
    colnames(AQexport1[i]) <- Temp.insert_site_name_here
  } 
    elseif (colnames(AQexport1[i])) contains "Water.Temp.F") {
      colnames(AQexport1[i]) <- FVTemp.insert_site_name_here
    }
    elseif (colnames(AQexport1[i])) contains "nodrift") {
      colnames(AQexport1[i])<-SpCnd.insert_site_name_here
    }
    elseif (colnames((AQexport1[i])) contains "Sp.Cond.T") {
      colnames(AQexport1[i])<-SpC.insert_site_name_here
    }
#continue elseif statements
} 
M--
  • 25,431
  • 8
  • 61
  • 93
  • 1
    Please consider referring to [this discussion](http://stackoverflow.com/q/5963269/1655567) and making your post reproducible. I think you are looking for **`%in%`** as in: `c("a", "b") %in% c("a", 45)`. Your code does not appear to be syntactically correct, `contains` provided like that would refer to an object `contains`, there also other weird things, what is `AQexport1`,`AQexport1[i]` makes sense if `AQexport1` is a vector but I, kind of think, in the context of your code `AQexport1` corresponds to some data? – Konrad Mar 07 '17 at 17:09
  • In the context of renaming columns you could: `names(mtcars)[names(mtcars) %in% c("mpg", "cyl")] <- c("renameThisColumn")` – Konrad Mar 07 '17 at 17:17
  • Thank you Konrad for the link. I will certainly reference that in future posts. I knew colnames produced a vector that I could index, but as you pointed out, I mistakenly thought that I could replace the value in that vector to change the column names in my dataframe. – Ecologist711 Mar 07 '17 at 19:02
  • Your solution to renaming works. However, with 240 unique column names to change and column names that could change depending on what I export from my database, I was hoping for a solution where I wouldn't need to type every name out. If I use the substring() function to create newnames, Ryan's solution should work. – Ecologist711 Mar 07 '17 at 19:08

2 Answers2

0

As a simplified case, this is how you would reassign names according to whether they contain letters 'a' or 'b', or contain 'c' or 'd', et cetera.

dat <- data.frame(aa = runif(10), ba = runif(10), ct = runif(10), df = runif(10), en = runif(10), zz = runif(10))

names(dat)
##  "aa" "ba" "ct" "df" "en" "zz"

library(magrittr)

# logical, TRUE if names contain a or b
condition1 <- sapply(c('a','b'), grepl, names(dat)) %>% rowSums %>% as.logical

# logical, TRUE if names contain c or d
condition2 <- sapply(c('c','d'), grepl, names(dat)) %>% rowSums %>% as.logical

# logical, TRUE if names contain e
condition3 <- grepl('e', names(dat))



names(dat) <- ifelse(condition1, 'newname1',
              ifelse(condition2, 'newname2',
              ifelse(condition3, 'newname3', 
                                 'newname4')))

names(dat)
## [1] "newname1" "newname1" "newname2" "newname2" "newname3" "newname4"
Ryan
  • 934
  • 8
  • 14
  • Thank you, very helpful! I forget that logicals can be converted to and from 1's or 0s. And, I will use substring() to make the correct newnames. – Ecologist711 Mar 07 '17 at 19:00
0

AQexport 1 is my dataset that needs new column names. This was my solution.

# 6 parameters with 11 syntax's to change column names for (each syntax could have up to 40 variations (.site names):
# "Water.Temp.Water.Temp.BUBU" or 
# "Water.Temp.Temperature.BUBU" or
# "Water.Temp.Temp.BUBU"<--- Temp.BUBU
# "Water.Temp.Field.Visits.KNF_DUP" <--- FVTemp.KNF_DUP
# "Sp.Cond.Sp.Cond.TempCorrected_nodrift.DRBR" or 
# "Sp.Cond.Sp.Cond..TempCorrected_nodrift.BRCD" or 
# "Sp.Cond.SpCond.nodrift.TM01" or
# "Sp.Cond.TempCorrected_nodrift.LOD_DUP" <---SpCnd.site
# "Sp.Cond.TempCorrected.PFM" <--- SpC.PFM
# "Sp.Cond.Field.Visits.CC7" <-- FVSpC.CC7
# "Cond.Conductivity.TM02Dup"<-- Cond.TM02Dup
# nchar("Sp.Cond.TempCorrected_nodrift.DRBR") #number of characters in string

#logical vectors, TRUE if AQexport1 column names match charcter input in grepl
Names<-names(AQexport1)
Temp.siteW<-grepl("Water.Temp.Water.Temp", Names)
Temp.siteT<-grepl("Water.Temp.Temperature", Names)
Temp.siteTT<-grepl("Water.Temp.Temp", Names)
FVTemp.site<-grepl("Temp.Field", Names)
FVSpC.site<-grepl("Sp.Cond.Field", Names)
SpCnd.site1<-grepl("Sp.Cond.Sp.Cond.TempCorrected_nodrift.", Names)
SpCnd.site2<-grepl("Sp.Cond.Sp.Cond..TempCorrected_nodrift..", Names)
SpCnd.site3<-grepl("Sp.Cond.SpCond.nodrift.", Names)
SpCnd.site4<-grepl("Sp.Cond.TempCorrected_nodrift.", Names)
Cond.site<-grepl("Cond.C", Names)
SpC.site<-grepl("Sp.Cond.TempCorrected.", Names) #Sp.Cond.TempCorrected.  the last period is critical to distinguish from nodrift

#Creating new column name: Paste parameter prefix to site name (extracted  from old column name with substring)
#Making parameter always 4 characters
Temp.namesW<-paste("Temp.",substring(Names,23), sep="")
Temp.namesT<-paste("Temp.",substring(Names,24), sep="")
Temp.namesTT<-paste("Temp.",substring(Names,17), sep="")
FVTemp.names<-paste("FTem.",substring(Names,25), sep="")
FVSpC.names<-paste("FSpC.",substring(Names,22), sep="")
SpCnd.names1<-paste("SCnd.",substring(Names,39), sep="")
SpCnd.names2<-paste("SCnd.",substring(Names,40), sep="")
SpCnd.names3<-paste("SCnd.",substring(Names,24), sep="")
SpCnd.names4<-paste("SCnd.",substring(Names,31), sep="")
Cond.names<-paste("Cond.",substring(Names,19), sep="")
SpC.names<-paste("SpCc.",substring(Names,23), sep="")

for (i in 1:ncol(AQexport1)){
  if(Temp.siteW[i]){
    names(AQexport1)[i]<-Temp.namesW[i]
  }
  else if(Temp.siteT[i]){
  names(AQexport1)[i]<-Temp.namesT[i]
  }
  else if(Temp.siteTT[i]){
    names(AQexport1)[i]<-Temp.namesTT[i]
  }
  else if(FVTemp.site[i]){
  names(AQexport1)[i]<-FVTemp.names[i]
  }
  else if(FVSpC.site[i]){
    names(AQexport1)[i]<-FVSpC.names[i]
  }
  else if(SpCnd.site1[i]){
    names(AQexport1)[i]<-SpCnd.names1[i]
  }
  else if(SpCnd.site2[i]){
    names(AQexport1)[i]<-SpCnd.names2[i]
  }
  else if(SpCnd.site3[i]){
    names(AQexport1)[i]<-SpCnd.names3[i]
  }
  else if(SpCnd.site4[i]){
    names(AQexport1)[i]<-SpCnd.names4[i]
  }
  else if(Cond.site[i]){
    names(AQexport1)[i]<-Cond.names[i]
  }
  else if(SpC.site[i]){
    names(AQexport1)[i]<-SpC.names[i]
  }
}