- You want to retrieve the text of <li ...>....</li> in a Google Document.
- You want to achieve this using Google Apps Script.
If my understanding is correct, how about this answer? Please think of this as just one of several possible answers.
Issue and workaround:
You want to use the pattern <li sheet="[a-zA-Z0-9]*">[\s\S]*?<\/li>. Because the pattern is given as a string literal, the backslashes need to be doubled, so please modify it to <li sheet="[a-zA-Z0-9]*">[\\s\\S]*?<\/li>. Even with this change, in your case <li ...>....</li> spans several paragraphs. (From your sample value, I thought so.) Because of this, when const searchPattern = '<li sheet="[a-zA-Z0-9]*">[\\s\\S]*?<\/li>' is used with body.findText(searchPattern), null is returned. If <li ...>....</li> is put in one paragraph, body.findText(searchPattern) returns <li ...>....</li>.
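The effect of the doubled backslashes can be checked in plain JavaScript. This is only a small sketch: it uses the JavaScript RegExp engine to show how the string literal is read, and body.findText uses its own regular expression engine, so the multi-paragraph behavior is not reproduced here. The variable names are illustrative.

```javascript
// In a string literal, "\s" is read as plain "s", so the character class is lost.
var single = '<li sheet="[a-zA-Z0-9]*">[\s\S]*?<\/li>';
// Doubling the backslash keeps "\s" and "\S" in the resulting pattern text.
var doubled = '<li sheet="[a-zA-Z0-9]*">[\\s\\S]*?<\/li>';

// A single-line sample value (an assumption made for this demonstration).
var sample = '<li sheet="other">{{test}}</li>';

var singleMatches = new RegExp(single).test(sample);
var doubledMatches = new RegExp(doubled).test(sample);

console.log(single);   // the pattern text actually stored in the string
console.log(doubled);
console.log(singleMatches, doubledMatches);
```

Logging the two strings shows what the regular expression engine actually receives in each case.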
To search for <li ...>....</li> that spans several paragraphs, how about the following workaround? The flow of this workaround is as follows.
Flow:
- Use <li sheet= and <\/li> as the search patterns.
- Using the pattern <li sheet=, retrieve the beginning paragraph, which contains <li ...>.
- Using the pattern <\/li>, retrieve the end paragraph, which contains </li>.
- Retrieve the texts of the paragraphs from the beginning paragraph to the end paragraph.
- Repeat this cycle until all <li ...>....</li> values have been found.
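The flow above can be sketched without the Document service by treating the body as a plain array of paragraph strings. This is only an illustration of the scanning logic: collectBlocks and the sample array are hypothetical names and data, not part of the Apps Script API or of your document.

```javascript
// Hypothetical helper: `paragraphs` is a plain array of strings that stands in
// for the paragraphs of the document body.
function collectBlocks(paragraphs) {
  var res = [];
  var i = 0;
  while (i < paragraphs.length) {
    // Step 1: find the paragraph that contains the opening tag.
    if (paragraphs[i].indexOf("<li sheet=") === -1) {
      i++;
      continue;
    }
    var start = i;
    // Step 2: find the paragraph that contains the closing tag.
    var end = start;
    while (end < paragraphs.length && paragraphs[end].indexOf("</li>") === -1) {
      end++;
    }
    if (end === paragraphs.length) break; // unclosed tag: stop scanning
    // Step 3: join the texts from the beginning to the end paragraph.
    res.push({
      startIndex: start,
      endIndex: end + 1, // exclusive, as in the sample script
      texts: paragraphs.slice(start, end + 1).join("")
    });
    // Step 4: continue after the closing tag.
    i = end + 1;
  }
  return res;
}

// Simplified sample data (an assumption, not the real document).
var blocks = collectBlocks([
  "intro", '<li sheet="a">', "x", "y", "</li>", "", '<li sheet="b">z</li>'
]);
console.log(JSON.stringify(blocks, null, 2));
```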
Sample script:
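function parseLists(body) {
  // To run this on the active document, obtain the body as follows:
  // var doc = DocumentApp.getActiveDocument();
  // var body = doc.getBody();
  var pattern1 = "<li sheet=";
  var pattern2 = "<\/li>";
  var range1 = body.findText(pattern1);
  var res = [];
  while (range1) {
    var temp = {};
    // The found element is a Text element; its parent is the paragraph with the opening tag.
    var p1 = range1.getElement().getParent();
    temp.startIndex = body.getChildIndex(p1);
    // Search for the closing tag, starting from the opening tag.
    var range2 = body.findText(pattern2, range1);
    if (!range2) break; // No closing tag was found, so stop instead of failing on null.
    var p2 = range2.getElement().getParent();
    temp.endIndex = body.getChildIndex(p2) + 1; // exclusive end index
    var texts = "";
    // Alternative loop that skips the first and last paragraphs:
    // for (var i = temp.startIndex + 1; i < temp.endIndex - 1; i++) {
    for (var i = temp.startIndex; i < temp.endIndex; i++) {
      // This assumes that every child in the range is a paragraph.
      texts += body.getChild(i).asParagraph().getText();
    }
    temp.texts = texts;
    res.push(temp);
    // Continue with the next opening tag after this closing tag.
    range1 = body.findText(pattern1, range2);
  }
  Logger.log(res);
  return res;
}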
Result:
When your sample values are put into a new Google Document and the script is run, the following result is retrieved.
[
{
"startIndex": 0,
"endIndex": 5,
"texts": "<li sheet=\"experiences\">{{company_name}}, {{job_location}} — {{job_title}}MONTH {{from}} - {{to}}{{description}}</li>"
},
{
"startIndex": 6,
"endIndex": 9,
"texts": "<li sheet=\"other\">{{test}}</li>"
}
]
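For the above result, if you want to retrieve the values {{company_name}}, {{job_location}} — {{job_title}}MONTH {{from}} - {{to}}{{description}} and {{test}} without the tags, please modify the for loop of the above script so that it skips the first and last paragraphs. This assumes that each tag sits in its own paragraph, as the indices in the result suggest.
for (var i = temp.startIndex + 1; i < temp.endIndex - 1; i++) {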
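The tags can also be stripped from the collected texts value after the paragraphs have been joined, which still works when a tag shares a paragraph with its content. The following is a minimal sketch under that assumption; stripLiTags is a hypothetical helper and is not part of the script above.

```javascript
// Hypothetical helper (not part of the original script): removes a leading
// <li sheet="..."> tag and a trailing </li> tag from a collected text.
function stripLiTags(text) {
  return text
    .replace(/^\s*<li sheet="[^"]*">/, "")
    .replace(/<\/li>\s*$/, "");
}

var withTags = '<li sheet="other">{{test}}</li>';
var withoutTags = stripLiTags(withTags);
console.log(withoutTags);
```

In the script, this would be applied where the text is stored, for example temp.texts = stripLiTags(texts);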
If I misunderstood your question and this was not the direction you want, I apologize.