
I work with a large df containing 'sloppy' strings that mix letters, numbers, and punctuation, like this:

cnames <- c("X1_1", "X1_12", "X1_9", "X11_9", "X4_112", "X4_2")

R can't order these strings properly because the numbers lack the required leading zeros.

I used regular expressions to convert them to:

"X01_01", "X01_12", "X01_09", "X11_09", "X04_12", "X04_02"

and this required quite a bit of programming (I was a bit rusty on regex)!

I don't think I am the only one who faces this problem, so I wondered:

Is there a package that:

  • automatically detects which parts of each string consist of numbers
  • detects the maximum length of each numeric part
  • pads each number with the right number of leading zeros
  • returns the strings in a format that can be ordered logically

If no such package exists, maybe I have found a nice case for writing one.
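
For illustration, the four steps listed above could be sketched in base R roughly as follows. Note that `pad_numbers` is a hypothetical helper name, not a function from an existing package, and the sketch assumes that every string contains the same number of digit runs.

```r
# Hypothetical helper (not from any package): zero-pad every run of digits
# so that plain string sorting matches numeric order.
pad_numbers <- function(x) {
  # Step 1: locate every run of digits in each string
  m <- gregexpr("[0-9]+", x)
  nums <- regmatches(x, m)

  # Assumption: all strings have the same number of numeric parts
  n_parts <- unique(lengths(nums))
  stopifnot(length(n_parts) == 1)

  # Step 2: maximum width of each numeric part across all strings
  widths <- vapply(seq_len(n_parts), function(i) {
    max(nchar(vapply(nums, `[`, character(1), i)))
  }, integer(1))

  # Step 3: prepend the missing zeros to each numeric part
  regmatches(x, m) <- lapply(nums, function(p) {
    paste0(strrep("0", widths - nchar(p)), p)
  })

  # Step 4: return the padded strings
  x
}

cnames <- c("X1_1", "X1_12", "X1_9", "X11_9", "X4_112", "X4_2")
padded <- pad_numbers(cnames)
sort(padded)
```

The digits are kept as text and padded with `strrep`, so very long numbers are not converted to integers along the way.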

Harsh Patel
Marlein
  • What's your question: 1. Do you want to find a package that modifies a string; 2. Order string `cnames`? – pogibas Mar 16 '18 at 08:53
  • Looks like the search term you were looking for is Natural Sort. https://en.wikipedia.org/wiki/Natural_sort_order – Dylan Brams Mar 16 '18 at 09:10
  • Thank you both. My question was a bit sloppy, sorry for that! Thank you for the reply. This and the answer below have helped me greatly. – Marlein Mar 22 '18 at 13:49

1 Answer


To convert your strings, you can parse the numeric parts and re-format them with sprintf:

cnames <- c("X1_1", "X1_12", "X1_9", "X11_9", "X4_112", "X4_2")
d <- read.table(text=sub("^X", "", cnames), sep="_")
sprintf("X%02d_%03d", d$V1, d$V2)
# [1] "X01_001" "X01_012" "X01_009" "X11_009" "X04_112" "X04_002"
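
If only the ordering matters, a possible variation (not part of the code above, just a sketch along the lines of the natural sort mentioned in the comments) is to leave the strings untouched and order them by the parsed numeric columns. It assumes the same `X<number>_<number>` layout as in the question.

```r
cnames <- c("X1_1", "X1_12", "X1_9", "X11_9", "X4_112", "X4_2")

# Parse the two numeric parts into columns V1 and V2
d <- read.table(text = sub("^X", "", cnames), sep = "_")

# Order the original strings by the first number, then by the second
sorted <- cnames[order(d$V1, d$V2)]
sorted
# [1] "X1_1"   "X1_9"   "X1_12"  "X4_2"   "X4_112" "X11_9"
```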
jogo
  • Thanks! I delved a bit deeper into stringr and also found a solution. Maybe I was too quick in posting my first question. But thank you all! – Marlein Mar 22 '18 at 13:50