1

In my blog app, a user can enter any text as a title for their entry and then I generate a URL based on the text.

I validate their title to make sure it only contains letters and numbers.

If they enter something like

Lorem 3 ipsum dolor sit amet

how could I generate the more SEO friendly version of this text:

Lorem-3-ipsum-dolor-sit-amet
oms
  • The `[slug]` tag seems to be related to the question. I started to look for a few links in e.g. http://stackoverflow.com/questions/3224419/need-a-simple-regular-expression/3224533#3224533 – polygenelubricants Sep 07 '10 at 22:32

3 Answers

7

In practice it's not as simple as replacing spaces with hyphens. You would usually also want to make it all lowercase and normalize or replace diacritics such as á, ö and è, which are not valid URL characters. The only valid characters are listed as "Unreserved characters" in the 2nd table of this Wikipedia page (letters, digits, hyphen, period, underscore and tilde).

Here's what such a function can look like:

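import java.text.Normalizer;
import java.text.Normalizer.Form;

public static String prettyURL(String string) {
    // NFD splits accented characters into a base letter plus combining marks.
    return Normalizer.normalize(string.toLowerCase(), Form.NFD)
        .replaceAll("\\p{InCombiningDiacriticalMarks}+", "") // drop the marks
        .replaceAll("[^\\p{Alnum}]+", "-"); // each run of other characters becomes one hyphen
}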

It basically does the following:

  • lowercase the string
  • remove combining diacritical marks (after the Normalizer has "extracted" them from the actual chars)
  • replace each run of non-alphanumeric characters with a single hyphen
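
One thing the steps above leave open is a title that starts or ends with a space or punctuation, which would produce a leading or trailing hyphen. Below is a minimal, self-contained sketch of the same steps with one extra trimming pass. The class name `SlugSketch`, the method name `toSlug` and the use of `Locale.ROOT` are my own assumptions for illustration, not part of the function above.

```java
import java.text.Normalizer;
import java.util.Locale;

final class SlugSketch {

    // Hypothetical helper: the three steps listed above, plus hyphen trimming.
    static String toSlug(String title) {
        String slug = Normalizer
            // Locale.ROOT is an assumption; it keeps lowercasing independent of the server locale.
            .normalize(title.toLowerCase(Locale.ROOT), Normalizer.Form.NFD)
            // Drop the combining marks that NFD split off from the base letters.
            .replaceAll("\\p{InCombiningDiacriticalMarks}+", "")
            // Collapse every run of non-alphanumeric characters into one hyphen.
            .replaceAll("[^\\p{Alnum}]+", "-");
        // Strip hyphens left over at either end of the slug.
        return slug.replaceAll("^-+|-+$", "");
    }

    public static void main(String[] args) {
        // prints "lorem-3-ipsum-dolor-sit-amet"
        System.out.println(toSlug("Lorem 3 ipsum dolor sit amet"));
        // Accented input written as Unicode escapes; prints "creme-brulee"
        System.out.println(toSlug("  Cr\u00e8me br\u00fbl\u00e9e!  "));
    }
}
```

Whether you want the trimming step depends on how strictly the title is validated before it reaches this function.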

BalusC
4
String s = "Lorem 3 ipsum dolor sit amet";
s = s.replaceAll(" ", "-");
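
If the title might contain several spaces in a row, or spaces at either end, a slightly more forgiving variant could look like the sketch below. The class name `SpaceToHyphenSketch` and the method name `hyphenate` are hypothetical, and it assumes the title has already been validated to contain only letters, numbers and whitespace.

```java
final class SpaceToHyphenSketch {

    // Hypothetical helper: trims the ends, then turns each run of whitespace into one hyphen.
    static String hyphenate(String title) {
        return title.trim().replaceAll("\\s+", "-");
    }

    public static void main(String[] args) {
        // prints "Lorem-3-ipsum-dolor-sit-amet"
        System.out.println(hyphenate(" Lorem 3  ipsum dolor   sit amet "));
    }
}
```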
theomega
0

Since it won't let me comment, I'm posting this as an answer. I would do:

String s = "Lorem 3 ipsum dolor sit amet";
s = s.replaceAll(" ", "_");

I'm using the underscore character instead because it is a space indicator. It's been a while since I've done Java, but I know there is a function in .NET that will clean up a file name so it's safe for the file system. A lot of the same general rules apply to a URL, so if you can find one in the API it would be worth taking a look.

Cericme