Below is the HTML code that I want to parse through in Swift:
<td class="pinyin">
<a href="rsc/audio/voice_pinyin_pz/yi1.mp3">
<span class="mpt1">yī</span></a>
<a href="rsc/audio/voice_pinyin_pz/yan3.mp3">
<span class="mpt3">yǎn</span>
</a>
</td>
I have read that regex is not a good way to parse HTML, but nevertheless I have written an expression that captures what I want (the letters between the span tags): yī and yǎn
Regex expression:
/pinyin.+<span.+>(.+)<\/.+<span.+>(.+)<\//Us
I was wondering how to implement it in Swift so that I can capture both yī and yǎn at the same time and save them into an array. Also, I was wondering if there is another way to do this without regex.
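For reference, here is a minimal sketch of the regex route using Foundation's NSRegularExpression. It does not use the exact expression above (the /Us flags are PCRE-style). Instead it assumes a simpler pattern that lazily matches the contents of every span, so it is not limited to exactly two syllables. The function name spanContents and the variable names are made up for illustration, and this is only reasonable for small, predictable snippets like this one:

```swift
import Foundation

// Sample input copied from the HTML above.
let html = """
<td class="pinyin">
<a href="rsc/audio/voice_pinyin_pz/yi1.mp3">
<span class="mpt1">yī</span></a>
<a href="rsc/audio/voice_pinyin_pz/yan3.mp3">
<span class="mpt3">yǎn</span>
</a>
</td>
"""

/// Hypothetical helper: returns the text inside every <span>...</span> in `html`.
func spanContents(in html: String) -> [String] {
    // Assumed pattern (not the one from the question): lazy match between span tags.
    let pattern = "<span[^>]*>(.*?)</span>"
    guard let regex = try? NSRegularExpression(pattern: pattern,
                                               options: [.dotMatchesLineSeparators]) else {
        return []
    }
    let fullRange = NSRange(html.startIndex..<html.endIndex, in: html)
    return regex.matches(in: html, options: [], range: fullRange).compactMap { match in
        // Capture group 1 holds the text between the opening and closing tag.
        guard let range = Range(match.range(at: 1), in: html) else { return nil }
        return String(html[range])
    }
}

// Collect every captured syllable into an array.
let syllables = spanContents(in: html)
print(syllables)
```

Because matches(in:options:range:) returns one result per span, the array grows with the number of syllables instead of being fixed at two capture groups.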
EDIT:
I ended up using TFHpple, as suggested by Rob. It took me a long time to figure out how to import it into Swift, so I thought it would be helpful to post the steps here for convenience:
1. Open your project and drag the TFHpple files into it
2. At this point Xcode will probably prompt you to create a bridging header file if you haven't included any Obj-C code in your current project. In this bridging header file, add:
#import <Foundation/Foundation.h>
#import "TFHpple.h"
#import "TFHppleElement.h"
3. Select the target and, under General, in Linked Frameworks and Libraries (scroll down in the General tab and you will see it), add libxml2.2.dylib and libxml2.dylib
4. Under Build Settings, in Header Search Paths, add $(SDKROOT)/usr/include/libxml2. WARNING: be sure that it is Header Search Paths and not User Header Search Paths, as they are not the same setting
5. Under Build Settings, in Other Linker Flags, add -lxml2
Enjoy!