I'm using Foursquare's API to retrieve some attraction names. The problem is that for certain cities (like Cairo, Moscow, Beijing) the English name of the attraction is combined with the name in the local language, so for example an attraction in Cairo will look like this:

Wekalet Al-Ghouri Arts Center | وكالة السلطان الغوري

For each attraction I use Flickr's API to find a photo, using the name as the query. However, there are almost no results for the string above, while querying just 'Wekalet Al-Ghouri Arts Center' gives a lot of results. So my question is: is there a way of identifying and removing non-English characters from a string? Thanks in advance for any help :)

RodMatveev
  • What is an "English character"? Languages and writing systems (alphabets etc.) are two different things. – matt Jun 03 '15 at 19:07
  • Do you mean just A-Z without any accents or other diacritics? – rmaddy Jun 03 '15 at 19:12
  • To clarify, I want to keep any letters that are part of the English alphabet (a-z) while removing everything else – RodMatveev Jun 03 '15 at 19:23
  • In this case, there's a pipe character `|` separating the two translations, so, assuming that's true for other entries, why not just split the string on that? You don't need to identify the alphabet at all. – jscs Jun 03 '15 at 19:23
  • @JoshCaswell the problem is that the pipe character is only present in the names for Cairo. For something like Beijing it's the Chinese name directly followed by the English name, without any special characters in between... – RodMatveev Jun 03 '15 at 19:29
  • See my answer to this other question: https://stackoverflow.com/questions/27697591/remove-apostrophe-in-cfstringtransform-results/27698313#27698313 – Ken Thomases Jun 03 '15 at 19:31
  • Well, that's a bummer then. – jscs Jun 03 '15 at 19:33
  • Related, maybe duplicate depending on your definition of "English character": [Remove non-ASCII characters from NSString in ObjC](http://stackoverflow.com/q/6361586) – jscs Jun 03 '15 at 19:35

2 Answers

My hacky solution:

NSString *stringWithForeignCharacters = @"Wekalet Al-Ghouri Arts Center | وكالة السلطان الغوري";
NSMutableCharacterSet *englishCharacterSet = [NSMutableCharacterSet characterSetWithCharactersInString:@"abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789-+ "];
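// Add other character sets as needed (symbolCharacterSet also covers non-ASCII Unicode symbols)
[englishCharacterSet formUnionWithCharacterSet:[NSCharacterSet symbolCharacterSet]];
// Everything outside the allowed set is treated as foreign
NSCharacterSet *foreignCharacters = [englishCharacterSet invertedSet];
// Split on the foreign characters and rejoin the remaining pieces
NSString *filteredString = [[stringWithForeignCharacters componentsSeparatedByCharactersInSet:foreignCharacters] componentsJoinedByString:@""];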

Warning: This might be slow for complex strings.

lead_the_zeppelin

Assuming that you want to keep only a set of ASCII characters (the set is very easy to change in the code below), you can do this:

NSString *source = …;
NSMutableString *dest = [source mutableCopy];

NSCharacterSet *validCharacters = [NSCharacterSet characterSetWithCharactersInString:@" -+abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"];
NSCharacterSet *invalidCharacters = [validCharacters invertedSet];

NSRange invalidRange;
// Repeatedly find the next invalid character and delete it until none is left
while ( (invalidRange = [dest rangeOfCharacterFromSet:invalidCharacters]).length != 0)
{
    [dest replaceCharactersInRange:invalidRange withString:@""];
}

Typed in Safari.

Amin Negm-Awad
  • I tried your solution, works really well with all foreign languages and none of the strings seem to impact the speed at all :) – RodMatveev Jun 03 '15 at 19:48
  • 1) It's `rangeOfCharacterFromSet:`, not `rangeOfCharactersFromSet:`. 2) This may be a bit pedantic, but instead of checking if the range length is not zero, you should check if the range location is not `NSNotFound`. – rmaddy Jun 03 '15 at 19:51
  • String length may be short enough here that it doesn't make a practical difference, but scanning once through with `NSScanner` would be more efficient than constantly re-starting from the beginning. Or you might at least use `rangeOfCharacterFromSet:options:range:`, passing the last found range. – jscs Jun 03 '15 at 20:02
  • @rmaddy 1) Sorry for the typo. 2) No, it is not pedantic, it is wrong. From the docs: "Returns a range of {NSNotFound, **0**} if none of the characters in aSet are found." Checking for zero length is pretty well documented. – Amin Negm-Awad Jun 03 '15 at 20:02
  • I agree with @rmaddy that it's better to check `.location` for `NSNotFound` as it's the standard Cocoa pattern to signal no match. – Nikolai Ruhe Jun 04 '15 at 09:10
  • Also, the performance of this solution is terrible (compared with for example @lead_the_zeppelin's). – Nikolai Ruhe Jun 04 '15 at 09:11
  • @NikolaiRuhe Oh, it is a standard Cocoa pattern? For methods returning a range? Really? AppKit's return values are in standard `{ NSNotFound, 0 }`. Foundation's `NSRangeFromString()` returns `{ 0, 0 }`. Foundation's `NSIntersectionRange()` returns `{ $undefined, 0 }`. The standard doesn't seem to be `{ NSNotFound, $somethingElse }`, but `{ $somethingElse, 0 }`. – Amin Negm-Awad Jun 04 '15 at 17:53
  • @NikolaiRuhe I know zeppelin's solution. As you can see in my comment and the link I placed there, I was the first one to post this trick on the dev list 6.5 years ago. Then as now, I say that it is a bit tricky. The semantics of the methods are different, which makes the code less readable. So no one should use it unless it is necessary. Oops, it hasn't been necessary for the OP. Donald Knuth would be proud … – Amin Negm-Awad Jun 04 '15 at 17:58
  • It worked for me. Not sure who downvoted this to -1 or why; I upvoted to balance it out. Thanks, man :) – Chander Shakher Ghorela - Guru Dec 10 '17 at 10:55
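
For reference, here is a minimal sketch of the single-pass approach jscs suggests in the comments above, using `NSScanner` instead of restarting the search from the beginning after every deletion. The helper names `GRStringByKeepingCharacters` and `GRFilteredAttractionName` are hypothetical (they do not come from either answer), and the allowed set is simply the one from the second answer; adjust it to whatever you consider an "English character".

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper (not from the thread): returns a copy of `source` that
// contains only characters from `allowed`, walking the string exactly once.
static NSString *GRStringByKeepingCharacters(NSString *source, NSCharacterSet *allowed)
{
    NSScanner *scanner = [NSScanner scannerWithString:source];
    // NSScanner skips whitespace and newlines by default; turn that off so
    // spaces in the allowed set are preserved.
    scanner.charactersToBeSkipped = nil;

    NSCharacterSet *disallowed = [allowed invertedSet];
    NSMutableString *result = [NSMutableString stringWithCapacity:source.length];

    while (!scanner.isAtEnd) {
        NSString *chunk = nil;
        // Copy the next run of allowed characters, if there is one.
        if ([scanner scanCharactersFromSet:allowed intoString:&chunk]) {
            [result appendString:chunk];
        }
        // Step over the following run of disallowed characters.
        [scanner scanCharactersFromSet:disallowed intoString:NULL];
    }
    return result;
}

// Hypothetical convenience wrapper: applies the allowed set from the answer
// above and trims the spaces left behind at either end.
static NSString *GRFilteredAttractionName(NSString *name)
{
    NSCharacterSet *allowed = [NSCharacterSet characterSetWithCharactersInString:
        @" -+abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"];
    NSString *kept = GRStringByKeepingCharacters(name, allowed);
    return [kept stringByTrimmingCharactersInSet:[NSCharacterSet whitespaceCharacterSet]];
}
```

Calling `GRFilteredAttractionName` on the Cairo example from the question should leave just `Wekalet Al-Ghouri Arts Center`, because the pipe and the Arabic letters are outside the allowed set and the leftover spaces are trimmed. Spaces between removed words in the middle of a name are not collapsed, so a name with the local-language part in the middle may still need extra cleanup.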