I'm trying to write a regular expression that captures the following:

  1. The question which is one line (begins with "Q:")
  2. An indeterminate number of paragraphs following the initial capture, stopping before the next "Q:"

Here are my attempts so far, neither of which works:

  • (Q:.*?\n){1}(?!Q:)(.+)*
  • (Q:.*?\n){1}(?!Q:)(.+\n+)

Both work for the first two questions in the sample, but as soon as an answer contains line breaks, the subsequent paragraphs aren't captured.

What am I missing?

Q: What are the service limits associated with Amazon Athena?
Please click here to learn more about service limits.
 
Q: What is the underlying technology behind Amazon Athena?
Amazon Athena uses Presto with full standard SQL support and works with a variety of standard data formats, including CSV, JSON, ORC, Avro, and Parquet. Athena can handle complex analysis, including large joins, window functions, and arrays. Because Amazon Athena uses Amazon S3 as the underlying data store, it is highly available and durable with data redundantly stored across multiple facilities and multiple devices in each facility. Learn more about Presto here.
 
Q: How does Amazon Athena store table definitions and schema?
Amazon Athena uses a managed Data Catalog to store information and schemas about the databases and tables that you create for your data stored in Amazon S3. In regions where AWS Glue is available, you can upgrade to using the AWS Glue Data Catalog with Amazon Athena. In regions where AWS Glue is not available, Athena uses an internal Catalog.
You can modify the catalog using DDL statements or via the AWS Management Console. Any schemas you define are automatically saved unless you explicitly delete them. Athena uses schema-on-read technology, which means that your table definitions applied to your data in S3 when queries are being executed. There’s no data loading or transformation required. You can delete table definitions and schema without impacting the underlying data stored on Amazon S3.
Adrian

1 Answer

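You may use the following pattern, with the multiline flag enabled so that ^ matches at the start of every line rather than only at the start of the string: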

^(Q:.*?\n)(?!Q:)([\s\S]+?(?=^Q:|\Z))

Demo.

Breakdown:

^(Q:.*?\n)     # Matches "Q:" at the beginning of the line, followed by
               # some optional text ending with a line-feed.
(?!Q:)         # Not immediately followed by another "Q:".
(              # Start of the second capturing group.
    [\s\S]+?   # Matches one or more characters (including line breaks) - non-greedy.
    (?=^Q:|\Z) # Stop before a line that starts with "Q:", or at the end of the string.
)              # End of the second capturing group.
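
For reference, here is a minimal sketch of how the pattern could be applied in Python. The question doesn't say which regex flavor is in use, so Python's `re` module is an assumption, and `extract_qa` and the shortened sample text are hypothetical names and data made up for illustration. Note the `re.MULTILINE` flag, without which `^Q:` would never match in the middle of the string.

```python
import re

# Same pattern as above. re.MULTILINE lets ^ match at the start of each line.
QA_PATTERN = re.compile(r"^(Q:.*?\n)(?!Q:)([\s\S]+?(?=^Q:|\Z))", re.MULTILINE)


def extract_qa(text):
    """Return a list of (question, answer) tuples with surrounding whitespace removed."""
    return [
        (m.group(1).strip(), m.group(2).strip())
        for m in QA_PATTERN.finditer(text)
    ]


# Shortened, made-up sample in the same shape as the FAQ text in the question.
sample = (
    "Q: First question?\n"
    "Answer one.\n"
    "\n"
    "Q: Second question?\n"
    "Paragraph A.\n"
    "Paragraph B.\n"
)

for question, answer in extract_qa(sample):
    print(question)
    print(answer)
    print("---")
```

The second answer comes back as a single string containing both paragraphs, because `[\s\S]` matches line breaks whereas `.` does not by default. That is also why the `(.+)*` and `(.+\n+)` attempts stop at the end of the first line.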