I want to get an array of the values from all the lines that start with `text:` (everything up to the next `asset_performance_label`).
I saw this post, but wasn't sure how to apply it.
Should I convert the proto to string, as I have tried?
text = extract_text_from_proto(r"(\w+)text:(\w+)asset_performance_label:", '''[pinned_field: HEADLINE_1
text: "5 Best Products"
asset_performance_label: PENDING
policy_summary_info
{
review_status: REVIEWED
approval_status: APPROVED
}
, pinned_field: HEADLINE_1
text: "10 Best Products 2021"
asset_performance_label: PENDING
policy_summary_info
{
review_status: REVIEWED
approval_status: APPROVED
}''')
import re

def extract_text_from_proto(regex, proto_string):
    regex = re.escape(regex)
    result_array = [m.group() for m in re.finditer(regex, proto_string)]
    return result_array
    # return [extract_text(each_item, regex) for each_item in proto],

def extract_text(regex, item):
    m = re.match(regex, str(item))
    if m is None:
        # text = "MISSING TEXT"
        raise Exception("Ad is missing text")
    else:
        text = m.group(2)
    return text
Expected result: ["5 Best Products","10 Best Products 2021"]
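For reference, here is a minimal sketch of one way that result might be produced. It assumes the proto has already been converted to text (for example with `str()`), that every `text:` value sits on its own line in double quotes, and that the values contain no escaped quotes. The name `extract_texts` and the shortened sample string are illustrative only.

```python
import re

def extract_texts(proto_string):
    # Capture whatever sits between the double quotes on each line
    # that starts with "text:". The pattern is used as written:
    # passing it through re.escape() would turn its metacharacters
    # into literal characters, so nothing would match.
    return re.findall(r'^\s*text:\s*"([^"]*)"', proto_string, flags=re.MULTILINE)

# Shortened, hypothetical sample in the same shape as the proto text above.
sample = '''[pinned_field: HEADLINE_1
text: "5 Best Products"
asset_performance_label: PENDING
, pinned_field: HEADLINE_1
text: "10 Best Products 2021"
asset_performance_label: PENDING
]'''

print(extract_texts(sample))  # ['5 Best Products', '10 Best Products 2021']
```

Because the quoted value is captured directly, the pattern does not need to look ahead to `asset_performance_label` at all.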
What if I also want to match an optional `pinned_field: (word)`, so that the result could be: `['HEADLINE_1: 5 Best Products', 'HEADLINE_1: 10 Best Products 2021', 'some_text_without_pinned_field']`?
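To make the optional `pinned_field` case concrete, here is a sketch under stated assumptions: the proto is already a string, a `pinned_field:` line (when present) comes immediately before its `text:` line, and values are double-quoted without escaped quotes. `PATTERN`, `extract_labeled_texts`, and the third sample entry are hypothetical names and data added for illustration.

```python
import re

# Optional "pinned_field: WORD" followed by whitespace, then the quoted
# value of "text:". Group 1 is None when pinned_field is absent.
PATTERN = re.compile(r'(?:pinned_field:\s*(\w+)\s+)?\btext:\s*"([^"]*)"')

def extract_labeled_texts(proto_string):
    results = []
    for m in PATTERN.finditer(proto_string):
        pinned, text = m.group(1), m.group(2)
        # Prefix the text with the pinned field only when one was found.
        results.append(f"{pinned}: {text}" if pinned else text)
    return results

# Hypothetical sample; the last entry has no pinned_field line.
sample = '''[pinned_field: HEADLINE_1
text: "5 Best Products"
asset_performance_label: PENDING
, pinned_field: HEADLINE_1
text: "10 Best Products 2021"
asset_performance_label: PENDING
, text: "some_text_without_pinned_field"
asset_performance_label: PENDING
]'''

print(extract_labeled_texts(sample))
```

Wrapping the `pinned_field` part in an optional non-capturing group keeps a single pattern for both cases, at the cost of assuming the two lines are adjacent.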