I have a huge NSString with HTML text inside. The length of this string is more than 3,500,000 characters. How can I convert this HTML text to an NSString with plain text inside? I was using a scanner, but it works too slowly. Any ideas?

- Possible duplicate of [Remove HTML Tags from an NSString on the iPhone](http://stackoverflow.com/questions/277055/remove-html-tags-from-an-nsstring-on-the-iphone) – hpique Mar 13 '14 at 14:53
7 Answers
It depends on which iOS version you are targeting. Since iOS 7 there is a built-in method that not only strips the HTML tags but also applies the formatting to the string:
Xcode 9/Swift 4
if let htmlStringData = htmlString.data(using: .utf8),
   let attributedString = try? NSAttributedString(data: htmlStringData, options: [.documentType: NSAttributedString.DocumentType.html], documentAttributes: nil) {
    // .string drops the formatting and gives you the plain text
    print(attributedString.string)
}
You can even create an extension like this:
extension String {
    var htmlToAttributedString: NSAttributedString? {
        guard let data = self.data(using: .utf8) else {
            return nil
        }
        do {
            return try NSAttributedString(data: data, options: [.documentType: NSAttributedString.DocumentType.html, .characterEncoding: String.Encoding.utf8.rawValue], documentAttributes: nil)
        } catch {
            print("Cannot convert html string to attributed string: \(error)")
            return nil
        }
    }
}
Note that this sample code uses UTF-8 encoding. You could also write a function instead of a computed property and pass the encoding as a parameter.
Swift 3
let attributedString = try NSAttributedString(data: htmlString.data(using: .utf8)!,
                                              options: [NSDocumentTypeDocumentAttribute: NSHTMLTextDocumentType],
                                              documentAttributes: nil)
Objective-C
NSAttributedString *attributedString = [[NSAttributedString alloc] initWithData:[htmlString dataUsingEncoding:NSUTF8StringEncoding] options:@{NSDocumentTypeDocumentAttribute: NSHTMLTextDocumentType, NSCharacterEncodingDocumentAttribute: @(NSUTF8StringEncoding)} documentAttributes:nil error:nil];
NSString *plainText = attributedString.string;
If you just need to remove everything between < and > (the dirty way!), which might be problematic if your text itself contains these characters, use this:
// NSString category method (ARC)
- (NSString *)stringByStrippingHTML {
    NSRange r;
    NSString *s = [self copy];
    // Repeatedly find the first <...> run and delete it
    while ((r = [s rangeOfString:@"<[^>]+>" options:NSRegularExpressionSearch]).location != NSNotFound)
        s = [s stringByReplacingCharactersInRange:r withString:@""];
    return s;
}
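The loop above searches the string from the beginning again after every deletion and builds a new string each time, which adds up on a 3.5-million-character input. A minimal Swift sketch of the same "dirty" regex idea, assuming Foundation's `.regularExpression` option, removes every match in a single call instead; `stripTagsWithRegex` is a hypothetical name, and it has the same weakness with literal `<` and `>` in the text.

```swift
import Foundation

// Hypothetical helper: removes everything that looks like <...>.
// replacingOccurrences with .regularExpression replaces *all* matches in
// one call, instead of re-searching from the start after each deletion.
// Caveat: a literal "<" in the text (e.g. "a < b > c") is treated as a tag.
func stripTagsWithRegex(_ html: String) -> String {
    return html.replacingOccurrences(of: "<[^>]+>", with: "", options: .regularExpression)
}

let sample = "<p>Hello <b>world</b></p>"
print(stripTagsWithRegex(sample)) // prints: Hello world
```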

- How do I replace HTML entities like &amp; with their plain-text equivalent, i.e. &? – ThE uSeFuL Feb 02 '15 at 04:01
- @ThEuSeFuL check this answer: http://stackoverflow.com/questions/1105169/html-character-decoding-in-objective-c-cocoa-touch – o15a3d4l11s2 Feb 02 '15 at 11:16
- Keep in mind that using NSHTMLTextDocumentType requires running synchronously on the main thread, which blocks it. – vahotm Mar 06 '18 at 13:28
I solved my problem with a scanner, but I don't run it over the whole text at once: I run it on each 10,000-character part and then concatenate all the parts together. My code is below:
-(NSString *)convertHTML:(NSString *)html {
    NSScanner *myScanner = [NSScanner scannerWithString:html];
    NSString *text = nil;
    while ([myScanner isAtEnd] == NO) {
        text = nil; // don't reuse the previous tag if nothing is scanned
        [myScanner scanUpToString:@"<" intoString:NULL];
        [myScanner scanUpToString:@">" intoString:&text];
        if (text) {
            // text is "<tag ...", so remove every "<tag ...>" occurrence
            html = [html stringByReplacingOccurrencesOfString:[NSString stringWithFormat:@"%@>", text] withString:@""];
        }
    }
    html = [html stringByTrimmingCharactersInSet:[NSCharacterSet whitespaceAndNewlineCharacterSet]];
    return html;
}
Swift 4:
import Foundation

func htmlToString(html: String) -> String {
    var htmlStr = html
    let scanner = Scanner(string: htmlStr)
    var text: NSString? = nil
    while !scanner.isAtEnd {
        text = nil // don't reuse the previous tag if nothing is scanned
        _ = scanner.scanUpTo("<", into: nil)
        _ = scanner.scanUpTo(">", into: &text)
        if let tag = text {
            htmlStr = htmlStr.replacingOccurrences(of: "\(tag)>", with: "")
        }
    }
    return htmlStr.trimmingCharacters(in: .whitespacesAndNewlines)
}
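Both versions above call `replacingOccurrences` on the entire string once per tag, which is a large part of why the scanner approach is slow; splitting the input into fixed 10,000-character parts also risks cutting a tag in two. A minimal sketch of a different technique, a single left-to-right pass that copies only the characters outside `<...>`, assuming the same "anything between < and > is a tag" rule. `stripTagsSinglePass` is a hypothetical name; if you want the same trimming as above, call `trimmingCharacters(in:)` on the result.

```swift
// Hypothetical helper: strips anything between '<' and '>' in one pass,
// so the work grows linearly with the input length.
// Caveat: an unclosed '<' drops the rest of the string.
func stripTagsSinglePass(_ html: String) -> String {
    var result = ""
    result.reserveCapacity(html.utf8.count)
    var insideTag = false
    for ch in html {
        if ch == "<" {
            insideTag = true          // start skipping characters
        } else if ch == ">" && insideTag {
            insideTag = false         // end of tag; resume copying after it
        } else if !insideTag {
            result.append(ch)         // ordinary text (including a stray '>')
        }
    }
    return result
}

print(stripTagsSinglePass("<div>Line <i>one</i></div>")) // prints: Line one
```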

- Add an @autoreleasepool inside the while loop to save memory. – Rafael Gonçalves Jun 14 '15 at 20:43
- Note: this will also remove anything between < and >, so if you have an email address written as "Some Name" followed by the address in angle brackets, it'll strip the address out. That's probably not what you want. It may need to check against a map of known HTML tags. – strangetimes May 10 '18 at 18:00
Objective-C
+ (NSString*)htmlToText:(NSString*)htmlString
{
    htmlString = [htmlString stringByReplacingOccurrencesOfString:@"&quot;" withString:@"\""];
    htmlString = [htmlString stringByReplacingOccurrencesOfString:@"&apos;" withString:@"'"];
    htmlString = [htmlString stringByReplacingOccurrencesOfString:@"&#39;" withString:@"'"];
    htmlString = [htmlString stringByReplacingOccurrencesOfString:@"&lt;" withString:@"<"];
    htmlString = [htmlString stringByReplacingOccurrencesOfString:@"&gt;" withString:@">"];
    // Decode &amp; last so "&amp;lt;" becomes "&lt;" rather than "<"
    htmlString = [htmlString stringByReplacingOccurrencesOfString:@"&amp;" withString:@"&"];
    return htmlString;
}
Hope this helps!
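Chained replacements like these are order-sensitive (decoding `&amp;` before `&lt;` turns `&amp;lt;` into `<` instead of `&lt;`), and they only cover a handful of named entities. A minimal single-pass sketch in Swift that decodes each `&...;` exactly once and also handles numeric references such as `&#39;` and `&#x41;`. `decodeBasicEntities` is a hypothetical name, and it deliberately ignores the many other named entities HTML defines; those would need a full entity table.

```swift
// Hypothetical single-pass decoder: five basic named entities plus
// decimal (&#39;) and hexadecimal (&#x41;) numeric references.
// Unknown or malformed references are copied through unchanged.
func decodeBasicEntities(_ text: String) -> String {
    let named: [String: Character] = ["quot": "\"", "apos": "'", "amp": "&", "lt": "<", "gt": ">"]
    var result = ""
    var i = text.startIndex
    while i < text.endIndex {
        if text[i] == "&", let semi = text[i...].firstIndex(of: ";") {
            let name = text[text.index(after: i)..<semi]
            var decoded = named[String(name)]
            if decoded == nil, name.hasPrefix("#") {
                let digits = name.dropFirst()
                let isHex = digits.first == "x" || digits.first == "X"
                let value = isHex ? UInt32(digits.dropFirst(), radix: 16) : UInt32(digits, radix: 10)
                if let v = value, let scalar = Unicode.Scalar(v) {
                    decoded = Character(scalar)
                }
            }
            if let c = decoded {
                result.append(c)              // emit the decoded character once
                i = text.index(after: semi)   // skip past the ';'
                continue
            }
        }
        result.append(text[i])                // ordinary character, or a lone '&'
        i = text.index(after: i)
    }
    return result
}

print(decodeBasicEntities("Tom &amp; Jerry &lt;3")) // prints: Tom & Jerry <3
```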

For Swift (older, pre-Swift 2 syntax):
NSAttributedString(data: (htmlString as! String).dataUsingEncoding(NSUTF8StringEncoding, allowLossyConversion: true)!, options: [NSDocumentTypeDocumentAttribute: NSHTMLTextDocumentType, NSCharacterEncodingDocumentAttribute: NSNumber(unsignedLong: NSUTF8StringEncoding)], documentAttributes: nil, error: nil)!

- (NSString *)stringByStrippingHTML:(NSString *)inputString
{
    NSMutableString *outString = nil;
    if (inputString)
    {
        outString = [[NSMutableString alloc] initWithString:inputString];
        if ([inputString length] > 0)
        {
            NSRange r;
            // Delete every <...> tag and every &nbsp; entity
            while ((r = [outString rangeOfString:@"<[^>]+>|&nbsp;" options:NSRegularExpressionSearch]).location != NSNotFound)
            {
                [outString deleteCharactersInRange:r];
            }
        }
    }
    return outString;
}
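Like the other answers that delete every `<...>` run, this also removes legitimate angle-bracketed text, such as an email address written in angle brackets. A rough Swift sketch of a stricter variant that only removes tags whose name is on a fixed list; the list and the name `stripKnownTags` are hypothetical, and any real tag missing from the list is left in place.

```swift
import Foundation

// Hypothetical, deliberately short list of tag names to remove; extend as needed.
let knownTags = ["a", "b", "br", "div", "em", "i", "li", "p", "span", "strong", "ul"]

// Removes only <tag ...> / </tag> / <tag/> for names in knownTags.
// (?i) makes the match case-insensitive; the lookahead requires the name to be
// followed by whitespace, '/' or '>', so "<a@example.com>" is not treated as <a>.
func stripKnownTags(_ html: String) -> String {
    let pattern = "(?i)</?(?:" + knownTags.joined(separator: "|") + ")(?=[\\s/>])[^>]*>"
    return html.replacingOccurrences(of: pattern, with: "", options: .regularExpression)
}

print(stripKnownTags("<p>Jane <jane@example.com></p>")) // prints: Jane <jane@example.com>
```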

Swift 4:
do {
    let cleanString = try NSAttributedString(data: htmlContent.data(using: .utf8)!,
                                             options: [.documentType: NSAttributedString.DocumentType.html],
                                             documentAttributes: nil).string
    print(cleanString)
} catch {
    print("Something went wrong: \(error)")
}

It could be made more generic by passing the encoding type as a parameter, but here is an example as a category:
@implementation NSString (CSExtension)
- (NSString *)htmlToText {
    return [[NSAttributedString alloc]
            initWithData:[self dataUsingEncoding:NSUnicodeStringEncoding]
                 options:@{NSDocumentTypeDocumentOption: NSHTMLTextDocumentType}
      documentAttributes:nil
                   error:nil].string;
}
@end

- In this method, where are you passing the string? Maybe on self...? – Raviteja Mathangi Apr 30 '19 at 11:50
- @Raviteja_DevObal Ah, sorry, this was a category; I could have been clearer, will edit... – Renetik May 01 '19 at 00:15
- But I don't believe this answer is correct anymore, as there is a requirement for large HTML and this is terribly slow. I ended up using DTCoreText with some additional modifications for showing images correctly; my solution is public on GitHub, though. – Renetik May 01 '19 at 00:21
- This method is not converting dynamic HTML text from the service, meaning I don't know which HTML content is coming from the service. But replacing with custom methods. – Raviteja Mathangi May 01 '19 at 16:15
- Sorry, that was a typo: But I don't believe this answer is NOT correct anymore, as there is a requirement for large HTML and this is terribly slow. I ended up using DTCoreText with some additional modifications for showing images correctly; my solution is public on GitHub, though. – Renetik May 02 '19 at 00:53
- I don't know what you are talking about... If you want to convert any HTML to text, this works; the downside is that it's slow, so I don't think it will work for large HTML, but maybe it will, depending on where you're going to use it. – Renetik May 02 '19 at 00:56