How can I remove a fixed prefix from each of 200+ column names? For example, from "Q1: GOING OUT?" and "Q5: STATE, PROVINCE, COUNTY, ETC" I just want to remove the "Q1: " and the "Q5: ". I have looked around but haven't found an approach that doesn't require renaming each column manually. Are there any functions for this, or a way to do it with the tidyverse? I have only been using R for 2 months.

I don't really have anything to show. I have considered using for loops and possibly using gsub or case_when, but don't really understand how to properly use them.

#probably not correctly written but tried to do it anyways

for ( x in x(0:length) and _:(length(CandyData)-1){
  front -> substring(0:3)
  back -> substring(4:length(CandyData))
  print <- back
}

I don't really have any errors because I haven't been able to make it work properly.

NelsonGon
Felix Chan
  • Do you have 200 _columns_, or a single column with 200 _rows_ ? – Tim Biegeleisen Jun 15 '19 at 07:35
  • If these are indeed (odd) column names you can use something like: `names(df) <- sub("^Q\\d+: ", "", names(df))`. – Ritchie Sacramento Jun 15 '19 at 07:38
  • @TimBiegeleisen I have 200 columns and 2.5k rows. – Felix Chan Jun 15 '19 at 07:43
  • @H1 What does the "\\d+:" mean? I'm not very familiar with using grep. – Felix Chan Jun 15 '19 at 07:44
  • @FelixChan It's regex and that part means match one or more digits followed by a colon. – Ritchie Sacramento Jun 15 '19 at 07:47
  • Please provide sample data with `dput(head(CandyData))` or make up some dummy data that can be used. – NelsonGon Jun 15 '19 at 08:29
  • @H1 Would it look something like this? CandyData <- sub("^Q\\d+: ", "", CandyData) – Felix Chan Jun 15 '19 at 22:23
  • @FelixChan - Look carefully and you'll see that your code does not match the example I gave. Your code performs the action on the entire dataframe rather than only the column names. If you continue to have problems, please post a sample of your data as suggested above. – Ritchie Sacramento Jun 16 '19 at 09:34
  • @H1 I see the problem now. Thank you so much, the code you showed does delete "Q__: ". However, when I try to clean a column named "Q6 | 100 Grand Bar" and change the code to `names(Candy_Hierarchy) <- sub("^Q\\d+|", "", names(Candy_Hierarchy))`, it does not delete the "|" as well and leaves "| 100 Grand Bar". Do you know why that is? – Felix Chan Jun 16 '19 at 10:56
  • See https://stackoverflow.com/questions/27721008/how-do-i-deal-with-special-characters-like-in-my-regex – Ritchie Sacramento Jun 16 '19 at 11:08
  • @NelsonGon Hi, I'm very new to Stack Overflow. How can I do that? – Felix Chan Jun 17 '19 at 04:11
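
For readers following the comment thread, here is a minimal sketch of the two regex points raised above: what `^Q\\d+: ` matches, and why an unescaped `|` does not remove the pipe. The column names are made-up examples modelled on the ones quoted in the thread, and the combined pattern at the end is only one possible way to handle both prefixes.

```r
# Hypothetical column names modelled on those quoted in the thread
nms <- c("Q1: GOING OUT?", "Q5: STATE, PROVINCE, COUNTY, ETC", "Q6 | 100 Grand Bar")

# "^Q\\d+: " = start of string, "Q", one or more digits, a colon, a space
colon_removed <- sub("^Q\\d+: ", "", nms)

# An unescaped "|" means "or" in a regex, so "^Q\\d+|" only strips the "Q6" part
unescaped <- sub("^Q\\d+|", "", nms[3])

# Escaping the pipe ("\\|") makes it match a literal "|"
pipe_removed <- sub("^Q\\d+ \\| ", "", nms[3])

# Both prefix styles in one pattern: ": " or " | " after the digits
both <- sub("^Q\\d+(: | \\| )", "", nms)
print(both)
```

With a real data frame the same pattern would be applied to `names(df)` and assigned back to `names(df)`, as in the comment by Ritchie Sacramento.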

2 Answers

Try this:

    # Example names. If you already have a data frame, use col_all <- names(df) instead.
    col_all <- c("Q1:GOING OUT?", "Q2:STATE", "Q100:PROVINCE", "Q200:COUNTRY", "Q299:ID")

    for (col in seq_along(col_all)) {              # iterate over the character vector col_all
        colname <- col_all[col]                    # current column name
        match <- gregexpr(pattern = ":", colname)  # positions of every ":" in colname (returned as a list)
        index1 <- as.numeric(unlist(match[1])[1])  # position of the first ":", or -1 if there is none
        if (index1 > 0) {
            # keep only the text after the first ":"
            col_all[col] <- substr(colname, index1 + 1, nchar(colname))
        }
    }

    names(df) <- col_all                           # assign the cleaned names back to the data frame
  • I am getting an error: "Error in as.numeric(match[1]) : (list) object cannot be coerced to type 'double'". How can I fix this? – Felix Chan Jun 16 '19 at 01:34
  • Hi Felix. It is probably because some of your column names contain more than one colon. Use this instead: index1=as.numeric(unlist(match[1])[1]) – RIMIL HEMBROM Jun 16 '19 at 03:51
  • Hi Rimil, I still get an error when I try the code: "Error in gregexpr(pattern = ":", colname) : object 'colname' not found". When I change "colname = col_all[col]" to "colname <- col_all[col]" I receive this instead: "the condition has length > 1 and only the first element will be used", followed by "Error in if (type == "package") package <- topic : missing value where TRUE/FALSE needed". What can I do to fix this? – Felix Chan Jun 16 '19 at 08:10
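
As a rough illustration of the first error discussed in these comments, the sketch below shows what `gregexpr()` returns for a name with two colons and why coercing that list directly can fail. The column name is invented for the example, and `regexpr()` is shown as one possible alternative when only the first colon matters; it is not part of the original answer.

```r
colname <- "Q5: STATE: PROVINCE"          # made-up name containing two colons

m <- gregexpr(":", colname)               # a list holding one integer vector of positions
positions <- m[[1]]                       # both colon positions

# Coercing the one-element list fails when it holds more than one position
failed <- inherits(try(as.numeric(m[1]), silent = TRUE), "try-error")

# regexpr() reports only the first match (or -1), so no list handling is needed
first_colon <- as.integer(regexpr(":", colname, fixed = TRUE))
trimmed <- substr(colname, first_colon + 1, nchar(colname))
print(trimmed)
```

Note that the text after the colon keeps its leading space here; `trimws()` could be applied afterwards if that matters.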

H 1's answer (in the comments on the question) is still the best: the sub() or gsub() functions will do the job. And do not fear regex; it is a powerful tool in data management.

Here is the gsub version:

    names(df) <- gsub("^.*:", "", names(df))

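It works this way: for each name, match every character from the start of the name up to and including ":", then replace the matched text with an empty string. Because `.*` is greedy, the match runs to the last ":" in the name, and a space that follows the colon is kept.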
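
To make the behaviour of this pattern concrete, here is a small sketch comparing it with the more specific pattern from the comments. The two column names are invented for the illustration; the second one deliberately contains two colons.

```r
# Hypothetical column names; the second contains a colon inside the label
nms <- c("Q1: GOING OUT?", "Q7: RATING: OVERALL")

# Greedy ".*" runs to the last colon and leaves the following space in place
greedy <- gsub("^.*:", "", nms)

# Anchoring on "Q", digits, colon and space removes only the prefix
anchored <- sub("^Q\\d+: ", "", nms)

print(greedy)
print(anchored)
```

If the labels never contain a colon, both patterns remove the same prefix, and `trimws()` can take care of the leftover leading space.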

Remember to upvote H 1's solution in the comments.

Elie Ker Arno
  • Would it look like this? CandyData <- gsub("^.*:","",CandyData) where CandyData is the df? – Felix Chan Jun 15 '19 at 22:07
  • I've tried using this as CandyData <- gsub("^.*:","",CandyData), and H 1's sub("^Q\\d+: ", "", CandyData), but it turns my data frame into values. How would I go about fixing that? – Felix Chan Jun 16 '19 at 01:31
  • hmm .. Maybe try to replace `names(...)` by `colnames(...)` or `col.names(...)` – Elie Ker Arno Jun 16 '19 at 16:32
  • I found that gsub didn't work as well, so I just used H1's sub method. There was a problem with special characters, but I handled those separately and that solved my problem. Thank you so much for your help! – Felix Chan Jun 17 '19 at 04:11
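
The "data frame turned into values" problem from these comments can be reproduced with a tiny made-up data frame. The sketch below assumes nothing about the real CandyData beyond it being a data frame; the columns and values are invented. It contrasts passing the data frame itself to `sub()` with passing only its names.

```r
# Tiny stand-in for CandyData; columns and values are invented for illustration
CandyData <- data.frame(a = 1:2, b = c("x", "y"))
names(CandyData) <- c("Q1: GOING OUT?", "Q2: AGE")

# Passing the data frame itself: sub() coerces it to a character vector,
# one string per column, so the table structure is lost
flattened <- sub("^Q\\d+: ", "", CandyData)

# Passing only the names and assigning back to names(): the data stay intact
names(CandyData) <- sub("^Q\\d+: ", "", names(CandyData))
print(names(CandyData))
```

The key difference is the `names(...)` on both sides of the assignment, which is what the earlier comment "your code performs the action on the entire dataframe" refers to.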