
I have abstracts of academic articles. Sometimes, the abstract will contain lines like "PurposeThis article explores...." or "Design/methodology/approachThe design of our study....". I call terms like "Purpose" and "Design/methodology/approach" labels. I want the string to look like this: [label][:][space]. For example: "Purpose: This article explores...."

The code below gets me the result I want when the original string has a space between the label and the text (e.g. "Purpose This article explores...."). But I don't understand why it doesn't work when there is no space. What do I need to change in the code below so that the labels are formatted the way I want, even when the original text has no space between the label and the text? Note that I imported sub from the re module.

def clean_abstract(my_abstract):
    labels = ['Purpose', 'Design/methodology/approach', 'Methodology/Approach', 'Methodology/approach' 'Findings', 'Research limitations/implications', 'Research limitations/Implications' 'Practical implications', 'Social implications', 'Originality/value']
    for i in labels:
        cleaned_abstract = sub(i, i + ': ', cleaned_abstract)
    return cleaned_abstract
henrich

1 Answer


Code

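# Commas added after 'Methodology/approach' and 'Research limitations/Implications';
# without them Python concatenates the adjacent string literals into one label.
labels = ['Purpose', 'Design/methodology/approach', 'Methodology/Approach', 'Methodology/approach', 'Findings', 'Research limitations/implications', 'Research limitations/Implications', 'Practical implications', 'Social implications', 'Originality/value']
strings = ['PurposeThis article explores....', 'Design/methodology/approachThe design of our study....']
# For every string that contains a label, rebuild it as "label: text"
print([l + ": " + s.split(l)[1].lstrip() for l in labels for s in strings if l in s])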

Results

[
    'Purpose: This article explores....',
    'Design/methodology/approach: The design of our study....'
]

Explanation

Using the logic from this post.

  • The list comprehension passed to print builds the list of results, which print then outputs
  • l + ": " + s.split(l)[1].lstrip() creates each formatted string
    • l is the current label (explained below)
    • ": " is a literal colon followed by a space
    • s.split(l)[1].lstrip() splits s on l, takes the part after the label, and removes any whitespace from its left side
  • for l in labels loops over labels, setting l to each label in turn
  • for s in strings loops over strings, setting s to each string in turn
  • if l in s keeps only the combinations where the label l is found in the string s
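
If you would rather keep the clean_abstract function from the question and run it over a whole abstract, the same idea can be written with re.sub. This is only a sketch under a few assumptions of mine: that a label may be followed by an optional colon and any amount of whitespace, and that a label never appears as an ordinary word inside the text (if it does, it gets a colon too). The name LABELS is just for illustration.

```python
import re

# Commas added between all labels; adjacent string literals without a
# comma are concatenated by Python into a single (wrong) label.
LABELS = [
    'Purpose', 'Design/methodology/approach', 'Methodology/Approach',
    'Methodology/approach', 'Findings', 'Research limitations/implications',
    'Research limitations/Implications', 'Practical implications',
    'Social implications', 'Originality/value',
]

def clean_abstract(my_abstract):
    cleaned_abstract = my_abstract  # start from the input string
    for label in LABELS:
        # re.escape protects any regex metacharacters in a label.
        # ":?\s*" (an assumption about the input) consumes an existing
        # colon and any whitespace directly after the label.
        pattern = re.escape(label) + r':?\s*'
        # A function replacement avoids backslash handling in the label.
        cleaned_abstract = re.sub(
            pattern, lambda m, label=label: label + ': ', cleaned_abstract)
    return cleaned_abstract

print(clean_abstract('PurposeThis article explores....'))
print(clean_abstract('Purpose This article explores....'))
```

Because the pattern also consumes a colon and whitespace that are already there, running the function twice on the same abstract should leave it unchanged.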
ctwheels