0

I have a dataframe (d3) which has some column names with "Date_Month.Year", I want to replace those column names with just "Month.Year" so if there are multiple columns with the same "Month.Year" they will just be a summed column.

Below is the code I tried and the output

library(stringr)

print(colnames(d3))
 #below is output of the print statement
 #[1] "ProductCategoryDesc" "RegionDesc"          "SourceDesc"          "variable"           
 #[5] "2019-02-28_Feb.2019" "2019-03-01_Mar.2019" "2019-03-04_Mar.2019" "2019-03-05_Mar.2019"
 #[9] "2019-03-06_Mar.2019" "2019-03-07_Mar.2019" "2019-03-08_Mar.2019" 

d3 <- d3 %>% mutate(col = str_remove(col, '*._'))

Here is the error I get: Evaluation error: argument str should be a character vector (or an object coercible to).

So I got the first part of my problem answered I used to get all column names in Month.Year format but now I am having issues with summing the columns that have the same name, for that I looked at Sum and replace columns with same name R for a data frame containing different classes

colnames(d3) <- gsub('.*_', '', colnames(d3))

Below is the code I used to get the columns summed that have a duplicate name, however with this code it is not necessarily putting the summed values in the correct columns.

indx <- sapply(d3, is.numeric)#check which columns are numeric
nm1 <- which(indx)#get the numeric index of the column
indx2 <- duplicated(names(nm1))|duplicated(names(nm1),fromLast=TRUE)
nm2 <- nm1[indx2]
indx3 <- duplicated(names(nm2))
d3[nm2[!indx3]] <- Map(function(x,y) rowSums(x[y],na.rm = FALSE), 
                        list(d3),split(nm2, names(nm2)))
d3 <- d3[ -nm2[indx3]]
MLS
  • 108
  • 14
  • Maybe you need to think differently. You may want to `gather` all your columns into a column of names and a column of values. Then, `separate` or `extract` the names into a new column, and finally `group_by` + `summarise` your values. These functions are in {tidyr} and {dplyr}. With a reproducible example, it would be easier to show you an answer by the way. – Sébastien Rochette Mar 20 '19 at 21:49

2 Answers2

1

If you want to change the column names, you should be changing colnames:

colnames(d3) <- gsub('.*_', '', colnames(d3))

Note, in your regex, quantifiers (ie *) go after the thing they quantify. So it should be .*_ rather than *._

An example where we remove text before a . in iris:

colnames(iris)
[1] "Sepal.Length" "Sepal.Width"  "Petal.Length" "Petal.Width"  "Species"     

# In regex, . means any character, so to match an actual '.',
#   we need to 'escape' it with \\.
colnames(iris) <- gsub('.*\\.', '', colnames(iris))

colnames(iris)
[1] "Length"  "Width"   "Length"  "Width"   "Species"
divibisan
  • 11,659
  • 11
  • 40
  • 58
  • Now from that how would I make duplicated column names just be one, so sum them, if you know? – MLS Mar 20 '19 at 17:34
  • If I understand correctly, you're looking for this question: [Sum all columns whose names start with a pattern, by group](https://stackoverflow.com/q/32052723/8366499). If you have questions about doing that which the linked question doesn't solve, please ask another question about that problem. We like to keep each question focused on one problem. – divibisan Mar 20 '19 at 17:37
0
colnames(d3) <- sapply(colnames(d3), function(colname){
    return( str_remove(colname, '.*_') )
})

The regex should be ".*_" to match the case you need

  • I get this error Warning: Error in stri_replace_first_regex: Syntax error in regexp pattern. (U_REGEX_RULE_SYNTAX). Maybe there is an issue since some columns would have the same name and those column values have to be summed? – MLS Mar 20 '19 at 17:04
  • 1
    .*_ should be the regex string, you mistyped dot :) – Krzysztof Nowicki Mar 20 '19 at 17:11