I have many text files. In each text file, there is a section of interest (below):
<tr>
<td ><b>发起时间</b></td>
<td colspan="2" style="text-align: left">2015-04-08</td>
<td style="width: 25%;"><b>回报机制</b></td>
<td colspan="2" style="text-align: left">使用者付费</td>
</tr>
The only information that varies across files is the date. In this case, the date is 2015-04-08.
I want to extract the date. I am an R user, and I would normally use str_match from the stringr package, specifying the following as the start of the match:
<td ><b>发起时间</b></td>
<td colspan="2" style="text-align: left">
However, I am not sure how to do this given that the string spans two lines. What can I do? (The text also contains Chinese characters, but that's a separate issue.)
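One possible way to handle the line break, sketched under a few assumptions: each file is read into a single string (rather than a vector of lines), the file is UTF-8 encoded, and the two `<td>` lines are separated only by whitespace. The `html` variable below is a hypothetical stand-in for one file's contents. The idea is to put `\\s*` between the two parts of the start string, since `\\s` also matches newline characters.

```r
library(stringr)

# Hypothetical sample standing in for the contents of one file.
# For a real file, the whole text could be collapsed into one string,
# e.g. paste(readLines(path, encoding = "UTF-8"), collapse = "\n").
html <- paste(
  '<tr>',
  '<td ><b>发起时间</b></td>',
  '<td colspan="2" style="text-align: left">2015-04-08</td>',
  '</tr>',
  sep = "\n"
)

# \\s* matches any run of whitespace, including the newline between
# the two <td> lines; the parentheses capture the date itself.
pattern <- paste0(
  '<td ><b>发起时间</b></td>\\s*',
  '<td colspan="2" style="text-align: left">(\\d{4}-\\d{2}-\\d{2})</td>'
)

# str_match returns a matrix: column 1 is the full match,
# column 2 is the first capture group (the date).
date <- str_match(html, pattern)[, 2]
date
```

To run this over many files, the same `pattern` could be applied to each file's collapsed text, for example with `sapply`.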