Is there a way to check if a character belongs to a CharacterSet?

I want to know which CharacterSet I should use for the character -. Do I use symbols?

I've checked this documentation but still have no idea: https://developer.apple.com/documentation/foundation/characterset

When removing extra whitespace at the end of a string, we do it like this:

import Foundation
let someString = " "
// Trims whitespace from both ends only; the " - " in the middle is kept.
print("\(11111) - \(someString)".trimmingCharacters(in: .whitespaces))

But what if I just want to remove the -? Or any special character such as *?

EDIT: I was looking for a complete set of characters per each CharacterSet if it's possible.

rmaddy
Glenn Posadas
  • You mean [this](https://developer.apple.com/documentation/foundation/characterset/2908835-contains)? – user28434'mstep Jan 25 '19 at 16:44
  • That one helped. But I was thinking if there's a complete list for each CharacterSet? – Glenn Posadas Jan 25 '19 at 16:48
  • CharacterSet.symbols.contains("-") returns false. I edited the question. `I was looking for a complete set of characters per each CharacterSet if it's possible` – Glenn Posadas Jan 25 '19 at 16:49
  • You could create an empty character set and use the `insert(charactersIn:)` function to add the characters you need to trim. – EmilioPelaez Jan 25 '19 at 16:55
  • [List of characters in an NSCharacterSet](https://stackoverflow.com/questions/26610931/list-of-characters-in-an-nscharacterset) [NSArray from NSCharacterSet](https://stackoverflow.com/questions/15741631/nsarray-from-nscharacterset) – jscs Jan 25 '19 at 17:38
  • @JoshCaswell FYI - The first link you provided is nothing more than a copy of Martin's answer from the second link you provided. That first one is now closed as a duplicate of the second. – rmaddy Jan 25 '19 at 18:41
  • Thanks for that @rmaddy! I did not look at them carefully enough. – jscs Jan 25 '19 at 18:41

1 Answer

What you want is defined in the Unicode standard, where it is referred to as the Unicode General Category. Each Unicode character belongs to exactly one category.

The Unicode website provides a complete character list showing each character's code, category, and name. A complete list of the Unicode categories is available as well.

The - is U+002D (HYPHEN-MINUS). It is listed as being in the "Pd" (Punctuation, Dash) category.

If you look at the documentation for CharacterSet, you will see punctuationCharacters which is documented as:

Returns a character set containing the characters in Unicode General Category P*.

The "Pd" category is included in "P*" (which means any "P" category).
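
As a rough check of the above, here is a minimal sketch that tests the hyphen against two of the predefined sets using `contains(_:)`, which takes a `Unicode.Scalar`. The last line assumes Swift 5 or later, where `Unicode.Scalar.Properties.generalCategory` is available; the variable names are just for illustration.

```swift
import Foundation

// "-" is U+002D HYPHEN-MINUS, General Category Pd.
let hyphen: Unicode.Scalar = "-"

// punctuationCharacters covers P*, so Pd is included.
let inPunctuation = CharacterSet.punctuationCharacters.contains(hyphen)

// symbols covers S*, which is why it does not match the hyphen.
let inSymbols = CharacterSet.symbols.contains(hyphen)

// Assumes Swift 5+: ask the scalar for its General Category directly.
let isDashPunctuation = hyphen.properties.generalCategory == .dashPunctuation

print(inPunctuation, inSymbols, isDashPunctuation)
```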

I also found https://www.compart.com/en/unicode/category which is a third-party list of each character by category. It's a bit more user-friendly than the Unicode reference.
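
If you would rather see the members of a set locally than browse the charts, one possible approach is to walk a range of code points and keep those the set contains. This is only a sketch: `members(of:in:)` is a hypothetical helper, not a Foundation API, and the range is limited to printable ASCII to keep it quick.

```swift
import Foundation

// Hypothetical helper (not part of Foundation): returns the scalars in
// `range` that `set` contains. Invalid code points are skipped.
func members(of set: CharacterSet, in range: ClosedRange<UInt32>) -> [Unicode.Scalar] {
    return range
        .compactMap { Unicode.Scalar($0) }
        .filter { set.contains($0) }
}

// Printable ASCII only; widen the range to cover more of Unicode.
let asciiPunctuation = members(of: .punctuationCharacters, in: 0x20...0x7E)
print(asciiPunctuation.map { String($0) }.joined(separator: " "))
```

Scanning all the way up to 0x10FFFF works the same way, it just takes longer and produces a much larger list.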

To summarize: if you want to know which CharacterSet to use for a given character, look up the character's category using one of the charts I linked. Once you know its category, look at the documentation for CharacterSet to see which predefined character set applies to that category.
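
Applied to the trimming example from the question, a sketch might look like this. The input string `raw` and the variable names are made up for illustration. Note that `trimmingCharacters(in:)` only strips characters from the two ends of the string, never from the middle.

```swift
import Foundation

let raw = "-*(11111 - value)*-"

// Option 1: trim anything in General Category P* from both ends.
// This also strips the parentheses, since they are punctuation too.
let trimmedPunctuation = raw.trimmingCharacters(in: .punctuationCharacters)
// "11111 - value"

// Option 2: trim only the characters you name explicitly.
let dashAndStar = CharacterSet(charactersIn: "-*")
let trimmedCustom = raw.trimmingCharacters(in: dashAndStar)
// "(11111 - value)"

// Combining sets: strip whitespace plus "-" and "*" in one pass.
let someString = " "
let cleaned = "\(11111) - \(someString)"
    .trimmingCharacters(in: CharacterSet.whitespaces.union(dashAndStar))
// "11111"

print(trimmedPunctuation, trimmedCustom, cleaned, separator: "\n")
```

The custom set is usually the safer choice when you only care about one or two characters, because the predefined sets are much broader than they first appear.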

rmaddy
  • Ah, that last link helped A LOT. `U+002D` is in Pd. Funny, this is new stuff for me. Meta SO says no need for thanks, but still thanks. – Glenn Posadas Jan 25 '19 at 17:03