Trying to extract data from between two sets of characters

Question

I'm trying to extract some data from a text file in a usable way, but I can't quite work out the correct way to do it. The raw text file looks like this:

<!-- @[Hero(super)] -->

# Creating new contexts

<!-- @[UsageExample] -->

## Usage example

```javascript
  Import { ICON_NAME } from 'Icons'
```

<!-- @[/Hero] -->

<!-- @[ArticleSection] -->

I need it to give me some JSON which looks like this:

[
  {
    "name": "Hero",
    "type": "super",
    "h1": "Creating new contexts"
  },
  {
    "name": "UsageExample",
    "h2": "Usage example",
    "codeType": "JavaScript",
    "code": "Import { ICON_NAME } from 'Icons'",
    "parent": "Hero"
  }
]

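I am not expecting help with all of it; I can handle the finer details. The part I'm struggling with is working out how to determine the content between an opening marker such as `<!-- @[Hero(super)] -->` and its closing marker such as `<!-- @[/Hero] -->`.

tl;dr: I'm looking for a way to extract the text between an opening marker such as `<!-- @[Hero(super)] -->` and its closing marker such as `<!-- @[/Hero] -->`.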

What have you already tried? Can you show us an example of that? — doom87er, Nov 02 '18 at 19:12

Pushpesh Kumar Rajwanshi · Answer 1 · 2018-11-02T19:52:22.240

You can use this regex to capture every piece of data mentioned in your post, and then build the JSON you described from the captured groups.

(?s)<!-- @\[(\w+)\((\w+)\)\] -->\s+# ([\w ]+?)\s+<!-- @\[(\w+)\] -->\s+## ([\w ]+?)\s+```(\w+)\s+(.*?)```\s+<!-- @\[\/(\w+)\] -->

Named-group version of the regex above:

(?s)<!-- @\[(?<name>\w+)\((?<type>\w+)\)\] -->\s+# (?<h1>[\w ]+?)\s+<!-- @\[(?<name2>\w+)\] -->\s+## (?<h2>[\w ]+?)\s+```(?<codeType>\w+)\s+(?<code>.*?)```\s+<!-- @\[\/(?<parent>\w+)\] -->

There are two names here, and group names cannot be duplicated, so the second one is called name2.
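As a rough sketch of how the named-group pattern above might be used from JavaScript: the built-in RegExp does not accept the `(?s)` prefix, so this version drops it and uses the `s` flag instead (ES2018, which also introduced named groups). The names `sample`, `pattern` and `parseBlock` are made up for illustration, `` `{3} `` is just another way of writing the three backticks, and the sample text is copied from the question.

```javascript
// Sample input from the question, built line by line.
const fence = '`'.repeat(3); // the code fence marker used in the raw text
const sample = [
  '<!-- @[Hero(super)] -->',
  '',
  '# Creating new contexts',
  '',
  '<!-- @[UsageExample] -->',
  '',
  '## Usage example',
  '',
  fence + 'javascript',
  "  Import { ICON_NAME } from 'Icons'",
  fence,
  '',
  '<!-- @[/Hero] -->',
].join('\n');

// Same pattern as the named-group regex, with the trailing s flag
// standing in for the (?s) prefix.
const pattern = /<!-- @\[(?<name>\w+)\((?<type>\w+)\)\] -->\s+# (?<h1>[\w ]+?)\s+<!-- @\[(?<name2>\w+)\] -->\s+## (?<h2>[\w ]+?)\s+`{3}(?<codeType>\w+)\s+(?<code>.*?)`{3}\s+<!-- @\[\/(?<parent>\w+)\] -->/s;

// Hypothetical helper: turns one matched block into the two objects
// from the question. Returns an empty array when nothing matches.
function parseBlock(text) {
  const match = pattern.exec(text);
  if (match === null) {
    return [];
  }
  const g = match.groups;
  return [
    { name: g.name, type: g.type, h1: g.h1 },
    {
      name: g.name2,
      h2: g.h2,
      codeType: g.codeType,
      code: g.code.trim(), // drop the newline captured before the closing fence
      parent: g.parent,
    },
  ];
}

const result = parseBlock(sample);
console.log(JSON.stringify(result, null, 2));
```

Note that `codeType` comes out exactly as written after the fence (`javascript`), so mapping it to `JavaScript` as in the desired JSON would need an extra step.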

(?s) enables dot-all mode, so a dot also matches a newline, which lets you capture data that spans multiple lines.
The rest of the regex captures the data you want into the groups shown in the regex101 demo.
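To illustrate what dot-all mode changes, here is a small JavaScript sketch that extracts the text between the Hero markers from the question. It assumes an ES2018 engine, where the `s` flag plays the role of `(?s)`; the variable names are made up for the example.

```javascript
// A shortened version of the text from the question.
const doc = '<!-- @[Hero(super)] -->\n\n# Creating new contexts\n\n<!-- @[/Hero] -->';

// Without dot-all mode, "." stops at line breaks, so nothing matches.
const plain = /<!-- @\[Hero\(super\)\] -->(.*?)<!-- @\[\/Hero\] -->/;
const plainMatch = plain.exec(doc);

// Same pattern with the s flag: "." now also matches "\n".
const dotAll = new RegExp(plain.source, 's');
const dotAllMatch = dotAll.exec(doc);

// Alternative for engines without the s flag: [\s\S] matches any character.
const charClass = /<!-- @\[Hero\(super\)\] -->([\s\S]*?)<!-- @\[\/Hero\] -->/;
const charClassMatch = charClass.exec(doc);

console.log(plainMatch); // null
console.log(dotAllMatch[1].trim());
console.log(charClassMatch[1].trim());
```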

Demo:

https://regex101.com/r/VUkRiJ/2

https://regex101.com/r/VUkRiJ/3 (named group version)

edited Nov 02 '18 at 19:52

answered Nov 02 '18 at 19:11

Pushpesh Kumar Rajwanshi


I don't think OP is looking for the raw text between the two tags; I think he's looking for a way to separate out what he wants in the JSON from the text file. I'm not really sure, it isn't super clear – doom87er Nov 02 '18 at 19:19
Well, you could be right, so I am updating my answer so that OP can find every piece of information in the text given in the post. – Pushpesh Kumar Rajwanshi Nov 02 '18 at 19:32
Thanks, @PushpeshKumarRajwanshi I think that the `(?s)` part is probably what I was missing. – Alex Foxleigh Nov 02 '18 at 19:46
@AlexFoxleigh: Yes, could be. I know the regex has gotten a little complex and may not be easy to maintain, so I am updating my answer to include the same regex again, but with named groups that use the keys from your JSON. That way you can capture the data by name instead of referring to group 1, group 2, etc. – Pushpesh Kumar Rajwanshi Nov 02 '18 at 19:48
That is amazing. Thank you so much! – Alex Foxleigh Nov 02 '18 at 23:54
@AlexFoxleigh: Are you still facing any issue? Do let me know if you need help – Pushpesh Kumar Rajwanshi Nov 05 '18 at 12:28
I am, I'm afraid. You pointed me towards a path I think is correct, however, I'm having trouble implementing it in JavaScript. I've made a new question here as it's not exactly the same as the one above: https://stackoverflow.com/questions/53171288/xregexp-unmatched-yet-everything-appears-to-be-balanced – Alex Foxleigh Nov 06 '18 at 11:48
