Let's assume I have the following data frame:

xx2xx30x4xx <- rep(5,30)
yyyy3yy50y5yyy <- rep(4,30)
zz12zzzz70z8zz <- rep(7,30)
df <- data.frame(xx2xx30x4xx,yyyy3yy50y5yyy,zz12zzzz70z8zz)

I would like to rename the columns so that each name consists only of the biggest number it contains. I thought of doing it with gsub/grep and a loop. For example, this returns the column names:

grep(pattern = "[50-100]", x = colnames(df), value= T )

Now I want each column name to be equal to the pattern it was matched by, that is, the number from 50 to 100 and not the smaller numbers. Is this possible? If not, do you know another generic way to rename the columns as described? Thanks in advance.

Yaahtzeck
  • `sub("\\D+(\\d+)\\D+", "\\1", "xxxxxx30xxxx")` is one method. Take a look at `?regex` for a discussion of the regex syntax available in R. – lmo Oct 02 '17 at 14:26
  • Are you just looking for `names(df) <- gsub("\\D", "", names(df))`? – David Arenburg Oct 02 '17 at 14:26

1 Answer

xxxxxx30xxxx <- rep(5,30)
yyyyyyy50yyyyy <- rep(4,30)
zzzzzzz70zzzz <- rep(7,30)
df <- data.frame(zzzzzzz70zzzz,yyyyyyy50yyyyy,xxxxxx30xxxx)

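# "[0-100]" is a character class that matches only the characters 0 and 1,
# not the range 0 to 100; "[0-9]" matches any digit
grep(pattern = "[0-9]", x = colnames(df), value = TRUE)

# Remove every non-digit character (\\D) from the column names
new_colnames <- gsub("\\D", "", colnames(df))
colnames(df) <- new_colnames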

I hope I understood you correctly. The gsub call removes everything that is not a digit from the column names, so you're left with the numbers in between.
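
To see what that substitution does, here is a small check on made-up names (the vector `nm` is hypothetical, not part of the question). It also shows a limitation: when a name contains several numbers, their digits are simply concatenated.

```r
# Hypothetical column names, for illustration only
nm <- c("xxxxxx30xxxx", "zz2z3z70zzz5z")

# "\\D" matches any non-digit character; gsub() removes every occurrence
digits_only <- gsub("\\D", "", nm)
print(digits_only)
```

The first name reduces to "30" as intended, while the second becomes "23705" rather than "70", so this approach only fits names that contain a single number.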

EDIT:

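This code extracts the first match of a two-digit number between 30 and 70 from each column name. Note that regmatches drops names without a match, so the result lines up with the columns only if every name contains such a number.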

xxxxxx30xxxx <- rep(5,30)
yyyyyyy50yyyyy <- rep(4,30)
zzzzzzz70zzzz <- rep(7,30)
df <- data.frame(zzzzzzz70zzzz,yyyyyyy50yyyyy,xxxxxx30xxxx)

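# "[0-9]" matches any digit ("[0-100]" would only match the characters 0 and 1)
grep(pattern = "[0-9]", x = colnames(df), value = TRUE)

# new_colnames <- gsub("\\D", "", colnames(df))

# Extract the first match of 30-69 ([3-6][0-9]) or 70 from each name
new_colnames <- regmatches(colnames(df), regexpr("([3-6][0-9])|([7][0])", colnames(df)))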

colnames(df) <- new_colnames
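
For names that contain several numbers, a numeric comparison may be easier to adapt than a regex that encodes the range. The following is only a sketch under assumptions: `pick_number` is a hypothetical helper (not a base R function), the range 50 to 100 is taken from the question, and names without a number in that range are left unchanged.

```r
# Data frame from the question
df <- data.frame(xx2xx30x4xx = rep(5, 30),
                 yyyy3yy50y5yyy = rep(4, 30),
                 zz12zzzz70z8zz = rep(7, 30))

# Hypothetical helper: return the largest number in `name` that lies in
# [lo, hi], or the original name if no number falls in that range
pick_number <- function(name, lo = 50, hi = 100) {
  # gregexpr() locates every run of digits; regmatches() extracts them
  nums <- as.numeric(regmatches(name, gregexpr("[0-9]+", name))[[1]])
  in_range <- nums[nums >= lo & nums <= hi]
  if (length(in_range) == 0) name else as.character(max(in_range))
}

new_colnames <- vapply(colnames(df), pick_number, character(1), USE.NAMES = FALSE)
colnames(df) <- new_colnames
print(colnames(df))
```

With the question's data this renames the second and third columns to "50" and "70" and keeps "xx2xx30x4xx", because 30 is outside the range. Calling `pick_number(name, lo = 0)` instead would pick the biggest number in each name regardless of a lower bound.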

Here's some information on regular expressions and string operations:

https://stat.ethz.ch/R-manual/R-devel/library/base/html/regex.html

https://www.regular-expressions.info/rlanguage.html

f.lechleitner
  • Yes, this works (almost) fine! What if a column name contains several numbers, for example zz2z3z70zzz5z, and I want only the number that belongs to a certain range, let's say from 50 to 100? In this case it should also eliminate 2, 3 and 5. Thanks! – Yaahtzeck Oct 02 '17 at 14:33
  • Check out my edited answer :) – f.lechleitner Oct 03 '17 at 06:31