
I'm trying to convert some special characters like ä,ö,ü,α,μ,α,ο,ι, and others from a webpage. When I download the page with ASIHTTPRequest, I get escape codes instead of the characters themselves. Examples:
ä = \u00E4
μ = \u03BC
α = \u03B1

This also happens if I use [NSString stringWithContentsOfURL:aNSURL encoding:NSASCIIStringEncoding error:nil]; I have tried the other available encodings, but none of them work for the examples above. For example, with NSUnicodeStringEncoding I get strange, 'Chinese'-looking characters, and with NSASCIIStringEncoding I get these numbers and letters.

The strange thing is that if I view the page's source code in a web browser like Safari, it's all fine, with the normal HTML character entities, like: ä = &auml;

Is there any way to convert these encoded letters back?


Thanks

EDIT
Sorry that I forgot to mention above what the source code looks like in a browser.

I just noticed on this site: link that the hex HTML entity is very similar to what I get with this code. Examples:
ä = &#xe4;
μ = &#x3bc;
α = &#x3b1;

As you can see, they are very similar: the hex digits are lowercase, the leading \u and zeros are replaced with a single x, &# is added at the beginning, and ; at the end. I will just have to write a small piece of code to convert the numbers and letters to hex entities, which is not going to be a big problem. Then I just have to run the result through an HTML entity converter and I'm done.
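
The rewrite described above could be sketched roughly as follows. This is only an illustration in plain C (which also compiles inside an Objective-C file), not code from the thread: the function names escapes_to_entities and is_u_escape are hypothetical, and it assumes every escape has exactly the form \uXXXX with four hex digits.

```c
#include <ctype.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Returns 1 if p points at a backslash, a 'u', and four hex digits. */
static int is_u_escape(const char *p) {
    if (p[0] != '\\' || p[1] != 'u') return 0;
    for (int i = 2; i < 6; i++) {
        /* The loop stops at the terminating NUL, so it never reads past the string. */
        if (!isxdigit((unsigned char)p[i])) return 0;
    }
    return 1;
}

/* Hypothetical helper: rewrites each \uXXXX in `in` as a hex entity (&#x...;).
   Returns a malloc'd string the caller must free, or NULL if allocation fails. */
char *escapes_to_entities(const char *in) {
    /* A 6-character escape becomes at most 8 characters ("&#xffff;"),
       so twice the input length is always enough room. */
    size_t cap = strlen(in) * 2 + 1;
    char *out = malloc(cap);
    if (out == NULL) return NULL;

    size_t o = 0;
    while (*in != '\0') {
        if (is_u_escape(in)) {
            char hex[5];
            memcpy(hex, in + 2, 4);
            hex[4] = '\0';
            unsigned long cp = strtoul(hex, NULL, 16);
            /* %lx prints lowercase hex without leading zeros. */
            o += (size_t)snprintf(out + o, cap - o, "&#x%lx;", cp);
            in += 6;
        } else {
            out[o++] = *in++; /* copy everything else unchanged */
        }
    }
    out[o] = '\0';
    return out;
}
```

The result would still need to go through an HTML entity decoder, so this only covers the first half of the plan.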

Anyway, thanks a lot for helping me out again

Sean

Silicone
  • Are you saying that the string contains the six characters '\', 'u', '0', '0', 'E', and '4', and you want it to contain just the one character 'ä'? Also, did you try `NSUTF8StringEncoding`? – rob mayoff Nov 13 '11 at 20:08
  • @rob Yes, the string contains six chars, and as the result I just want to have that single character. `NSUTF8StringEncoding` didn't make any difference – Silicone Nov 13 '11 at 20:53
  • Check out my answer here: http://stackoverflow.com/questions/7860867/converting-escaped-utf8-characters-back-to-their-original-form-in-ios-objective/7861345#7861345 – rob mayoff Nov 13 '11 at 22:40

3 Answers

You can use the code found at this link. It decodes the entities with the built-in NSXMLParser. The code is below:

#import <Foundation/Foundation.h>

// Uses manual reference counting (pre-ARC), as in the original.
@interface MREntitiesConverter : NSObject <NSXMLParserDelegate> {
    NSMutableString *resultString;
}
@property (nonatomic, retain) NSMutableString *resultString;
- (NSString *)convertEntiesInString:(NSString *)s;
@end

@implementation MREntitiesConverter
@synthesize resultString;

- (id)init
{
    self = [super init];
    if (self) {
        resultString = [[NSMutableString alloc] init];
    }
    return self;
}

// NSXMLParser delegate callback: collects the text with entities already decoded.
- (void)parser:(NSXMLParser *)parser foundCharacters:(NSString *)s {
    [self.resultString appendString:s];
}

- (NSString *)convertEntiesInString:(NSString *)s {
    if (s == nil) {
        NSLog(@"ERROR : Parameter string is nil");
        return nil;
    }
    // Clear any output left over from a previous call.
    [self.resultString setString:@""];
    // Wrap the text in a dummy element so it parses as an XML document.
    NSString *xmlStr = [NSString stringWithFormat:@"<d>%@</d>", s];
    NSData *data = [xmlStr dataUsingEncoding:NSUTF8StringEncoding allowLossyConversion:YES];
    NSXMLParser *xmlParse = [[[NSXMLParser alloc] initWithData:data] autorelease];
    [xmlParse setDelegate:self];
    [xmlParse parse];
    // Return an autoreleased copy so the caller does not have to release it.
    return [NSString stringWithString:self.resultString];
}

- (void)dealloc {
    [resultString release];
    [super dealloc];
}
@end

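Alternatively, you can use NSString* sI = (NSString*)CFXMLCreateStringByUnescapingEntities(NULL, (CFStringRef)s, NULL); whether it is available depends on which OS you are building for. The function follows the Create rule, so release (or autorelease) the returned string when you are done with it.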

Alex Zielenski

You can also use this NSString category: https://github.com/mwaterfall/MWFeedParser/blob/master/Classes/NSString+HTML.m

- (NSString *)stringByConvertingHTMLToPlainText;
- (NSString *)stringByDecodingHTMLEntities;
- (NSString *)stringByEncodingHTMLEntities;
- (NSString *)stringWithNewLinesAsBRs;
- (NSString *)stringByRemovingNewLinesAndWhitespace;
- (NSString *)stringByLinkifyingURLs;

The method you want for this is:

- (NSString *)stringByDecodingHTMLEntities;
Lucas Santos

After another try with rob mayoff's code, it worked! Here is the link to his answer:
Converting escaped UTF8 characters back to their original form
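
For readers who only want to see what such a conversion involves, here is a minimal sketch in plain C (usable from Objective-C as well). It is not the code from the linked answer, just an illustration of the idea: decode_u_escapes, hex4 and put_utf8 are hypothetical names, the output is assumed to be UTF-8, and surrogate pairs (characters outside the Basic Multilingual Plane) are not combined.

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Parses four hex digits at p; returns their value, or -1 if any is not a hex digit. */
static long hex4(const char *p) {
    long v = 0;
    for (int i = 0; i < 4; i++) {
        unsigned char c = (unsigned char)p[i];
        if (!isxdigit(c)) return -1; /* also stops at the terminating NUL */
        v = v * 16 + (isdigit(c) ? c - '0' : tolower(c) - 'a' + 10);
    }
    return v;
}

/* Writes code point cp (at most 0xFFFF) to dst as UTF-8; returns the byte count. */
static size_t put_utf8(unsigned long cp, char *dst) {
    if (cp < 0x80) {
        dst[0] = (char)cp;
        return 1;
    }
    if (cp < 0x800) {
        dst[0] = (char)(0xC0 | (cp >> 6));
        dst[1] = (char)(0x80 | (cp & 0x3F));
        return 2;
    }
    dst[0] = (char)(0xE0 | (cp >> 12));
    dst[1] = (char)(0x80 | ((cp >> 6) & 0x3F));
    dst[2] = (char)(0x80 | (cp & 0x3F));
    return 3;
}

/* Hypothetical helper: replaces each \uXXXX in `in` with the UTF-8 bytes of that
   code point. Returns a malloc'd string the caller must free, or NULL on failure. */
char *decode_u_escapes(const char *in) {
    /* Six input characters shrink to at most three bytes, so the output
       is never longer than the input. */
    char *out = malloc(strlen(in) + 1);
    if (out == NULL) return NULL;

    size_t o = 0;
    while (*in != '\0') {
        long cp = (in[0] == '\\' && in[1] == 'u') ? hex4(in + 2) : -1;
        if (cp >= 0) {
            o += put_utf8((unsigned long)cp, out + o);
            in += 6;
        } else {
            out[o++] = *in++; /* not an escape: copy the byte unchanged */
        }
    }
    out[o] = '\0';
    return out;
}
```

The returned bytes could then be turned into an NSString with [NSString stringWithUTF8String:]. Note that an escape of \u0000 would produce an embedded NUL byte and cut the C string short.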

Silicone