5

How do I write a RegExp to validate that an SMS text contains only keyboard characters (abc, ABC, 123, ~!@#$%^&*()`[]{}|;':',./<>?)?

Thanks...

Ironman

4 Answers

10

The default GSM character set is defined in GSM 03.38. Assuming you're looking at decoded text, not the 7-bit packed format that is actually used for transmission, a regex like the following should limit you to the allowable characters:

"@£$¥èéùìòÇ\fØø\nÅåΔ_ΦΓΛΩΠΨΣΘΞÆæßÉ !\"#¤%&'()*+,-./[0-9]:;<=>\?¡[A-Z]ÄÖÑÜ§¿[a-z]äöñüà\^\{\}\[~\]\|€"

Note, though, that it is possible to send Unicode (UCS-2) messages. In that case the handset receiving the message must have suitable glyphs to present the text to the user; Unicode itself is not the limiting factor.
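
A minimal JavaScript sketch of this check, assuming the text has already been decoded to an ordinary string. The names `GSM_BASIC`, `GSM_EXTENSION`, and `isGsm7` are illustrative and not taken from any library. It covers the basic table of GSM 03.38 (which includes line feed and carriage return) plus the extension table (form feed, `^ { } \ [ ~ ] |` and `€`):

```javascript
// Basic GSM 03.38 table, written as the body of a regex character class.
// String.raw keeps the backslashes so that \n, \r and \- reach the RegExp intact.
const GSM_BASIC = String.raw`@£$¥èéùìòÇ\nØø\rÅåΔ_ΦΓΛΩΠΨΣΘΞÆæßÉ !"#¤%&'()*+,\-./0-9:;<=>?¡A-ZÄÖÑÜ§¿a-zäöñüà`;

// Extension table: form feed, ^ { } \ [ ~ ] | and the euro sign.
const GSM_EXTENSION = String.raw`\f^{}\\\[~\]|€`;

// Anchored so that every character of the input must belong to the class.
const GSM_ONLY = new RegExp("^[" + GSM_BASIC + GSM_EXTENSION + "]*$");

// Hypothetical helper: true when text uses only GSM 7-bit characters.
function isGsm7(text) {
  return GSM_ONLY.test(text);
}

console.log(isGsm7("Price: £5 {approx}"));
console.log(isGsm7("naïve"));
```

Characters from the extension table are sent as two septets each, so a message that passes this check can still hold fewer than 160 characters per segment.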

ptomli
4

I propose to do it manually.

You just have to take care of a few exceptions: [ and ] need escaping, and so may the backquote and the quote, depending on the language you are writing in (since they could end the string that holds the pattern).

^[a-zA-Z0-9~!@#$%^&*()`\[\]{};':,./<>?| ]*$

Maybe it would require a little tuning. I'm pretty sure that - and _ are accepted in SMS texts.
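
As a rough sketch of that tuning in JavaScript, the same hand-written class can be extended with `-` and `_`. The names `KEYBOARD_ONLY` and `isKeyboardText` are made up for illustration:

```javascript
// The character class from above, with "-" and "_" added at the end.
// "/" is escaped because the pattern is a regex literal; "-" is escaped
// so that it is not read as a range operator.
const KEYBOARD_ONLY = /^[a-zA-Z0-9~!@#$%^&*()`\[\]{};':,.\/<>?| \-_]*$/;

// Hypothetical helper: true when text contains only the listed characters.
function isKeyboardText(text) {
  return KEYBOARD_ONLY.test(text);
}

console.log(isKeyboardText("Call me @ 5pm - room_12?"));
console.log(isKeyboardText("café"));
```

The list in the question omits `"`, `+`, `=` and `\`, so this class rejects them; add them to the class if they should be allowed.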

M'vy
1

I searched a lot, and I think the best one is:

function CharecterControl(input) {
    // Matches any single character outside the GSM 7-bit set listed here.
    var nonGsm = /[^A-Za-z0-9 \r\n@£$¥èéùìòÇØøÅå\u0394_\u03A6\u0393\u0027\u0022\u039B\u03A9\u03A0\u03A8\u03A3\u0398\u039EÆæßÉ!#$%&()*+,.\/\-:;<=>?¡ÄÖÑÜ§¿äöñüà^{}\\\[~\]|\u20AC]/;
    // True when no such character is found, i.e. the input is GSM-only.
    return !nonGsm.test(input);
}
zVictor
KnowGe
  • It seems that backslashes have been escaped too many times in this regex. Wouldn't the right one be `[^A-Za-z0-9 \r\n@£$¥èéùìòÇØøÅå\u0394_\u03A6\u0393\u0027\u0022\u039B\u03A9\u03A0\u03A8\u03A3\u0398\u039EÆæßÉ!\#$%&()*+,\./\-:;<=>?¡ÄÖÑÜ§¿äöñüà^{}\\\[~\]|\u20AC]`? – zVictor Sep 24 '15 at 09:39
  • @zVictor is correct - that is the proper character string. Also, this sample threw me off - a better function name is `isGsmEncoded` - or for the reverse (if you want to test for non-GSM characters instead, just remove the `!`), it could then be called `hasUcs2Characters`. – qJake Mar 18 '20 at 20:35
1

I know that I'm a little late to the party, but I've been fighting with this. I recently ran across Twitter's Open Source Project:

https://github.com/twitter/cloudhopper-commons-charset

It provides a great way of cleaning strings based on charsets before sending them. It also supports encoding a string as bytes using an SMS-friendly charset. Here is my example of cleaning an existing string with their libraries before sending it through SMS:

public static String cleanSMS(String msg) {
    Charset charset = CharsetUtil.map(CharsetUtil.NAME_GSM7);
    StringBuilder cleaned  = new StringBuilder(msg);
    log.info("Accent chars replaced: " + MobileTextUtil.replaceAccentedChars(cleaned));
    log.info("Safe chars replaced: " + MobileTextUtil.replaceSafeUnicodeChars(cleaned));
    return CharsetUtil.normalize(cleaned.toString(), charset);
}
bconneen