
For an NLP task, I have several Markdown files that store my training data.

##intent:greet
- Hi
- Hello
- Good Day
- Good Morning
- Good Evening

##intent:say_thank_you
- Thank you
- Thx
- awesome
- great
- thank you very much

I generate new training data while communicating with the bot. After loading, cleaning, and so on, I end up with a dict:

{0 : {
    'intent':'greet',
    'data':'good day sir'
    },
1 : {
    'intent':'greet',
    'data':'good afternoon'
    },
2 : {
    'intent':'say_thank_you',
    'data':'good job'
    }
}

Now I want to add these sentences to my md file. I think the easiest place to insert them is directly after the matching ##intent:<intentname> line.

My first, static approach was the following:

intent = 'greet'
identifier = "##intent:"+intent
with open('<myPath.md>') as myfile:
    if identifier in myfile.read():
        print("found intent")
    else:
        print(f"no intent with name {intent}")

Although I have a valid md file containing the intent greet, the code does not find the line. I assume I can't search for Markdown syntax in a file this way.

Is there a way to search for Markdown in an md file without changing the file? I noticed some suggestions to convert the file to HTML first, but is there an easier way to do this?
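For reference, a Markdown file is plain text, so a substring test like the one above does work in principle. If it reports no match, the header in the file probably differs from the search string in some small way. Assuming the difference is stray whitespace (for example `## intent: greet`), a more tolerant check could look like the sketch below. `has_intent` is a hypothetical helper name, not part of any library.

```python
import re


def has_intent(text: str, intent: str) -> bool:
    """Return True if text contains a header line for the given intent.

    Assumption: the only variation is optional whitespace around '##',
    'intent', ':' and the intent name.
    """
    pattern = rf'^\s*##\s*intent\s*:\s*{re.escape(intent)}\s*$'
    # MULTILINE makes ^ and $ match at every line boundary, not just
    # at the start and end of the whole file content.
    return re.search(pattern, text, flags=re.MULTILINE) is not None
```

Because the pattern is anchored at the end of the line, searching for `greet` does not match a longer name such as `greet_formal`.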

adama

1 Answer


My strategy would be to create a new file and copy the lines over to it. When you reach the section you're looking for, write the new lines right after its header. When you're finished, remove the source file and use the new file in its place.

Something like this should work:

from pathlib import Path

def add_identifier(filename: str, key: str, item: str):
    source_file = Path(filename) # intent.txt
    dest_file = source_file.with_suffix('.replaced' + source_file.suffix) # intent.replaced.txt
    
    found = False
    with open(source_file) as f, open(dest_file, 'w') as rf:
        for line in f:
            # copy line 
            rf.write(line)
            # compare the whole line so 'greet' does not also match 'greet_formal'
            if line.strip() == f'##intent:{key}':
                found = True
                # insert new identifier
                rf.write(f'- {item}\n')
        if not found:
            # create a new section and add the identifier
            rf.write(f'\n\n##intent:{key}\n')
            rf.write(f'- {item}\n')
        
    # remove original and rename new file
    # source_file.unlink()
    # dest_file.rename(source_file)
    
# usage
add_identifier('intent.txt', key='greet', item="hello m'lady")

I've added a check that creates a new section if the intent doesn't exist in the original file.
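To feed in the whole dict from the question in one pass, the same idea could be extended roughly as follows. This is only a sketch: `group_by_intent`, `insert_sentences`, and `update_file` are hypothetical names, and it assumes the file is small enough to read into memory and that headers look exactly like `##intent:<name>`.

```python
from collections import defaultdict
from pathlib import Path


def group_by_intent(records: dict) -> dict:
    """Collect the 'data' sentences of the cleaned dict under their intent."""
    grouped = defaultdict(list)
    for record in records.values():
        grouped[record['intent']].append(record['data'])
    return dict(grouped)


def insert_sentences(text: str, grouped: dict) -> str:
    """Return text with each intent's new sentences placed under its header."""
    pending = dict(grouped)  # intents whose header has not been seen yet
    out = []
    for line in text.splitlines():
        out.append(line)  # copy the line
        stripped = line.strip()
        if stripped.startswith('##intent:'):
            intent = stripped[len('##intent:'):]
            # pop() so a repeated header does not get the sentences twice
            for sentence in pending.pop(intent, []):
                out.append(f'- {sentence}')
    # intents without a header get a new section at the end
    for intent, sentences in pending.items():
        out.append('')
        out.append(f'##intent:{intent}')
        out.extend(f'- {sentence}' for sentence in sentences)
    return '\n'.join(out) + '\n'


def update_file(filename: str, records: dict) -> None:
    """Write the updated content to a sibling file, then swap it in."""
    source_file = Path(filename)
    dest_file = source_file.with_suffix('.replaced' + source_file.suffix)
    new_text = insert_sentences(source_file.read_text(), group_by_intent(records))
    dest_file.write_text(new_text)
    # Path.replace overwrites the target, so no separate unlink() is needed
    dest_file.replace(source_file)
```

Keeping `insert_sentences` free of file access makes it easy to check on a small string before letting `update_file` touch the real training data.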

abdusco
  • Thank you for your suggestion, but this is not as automated as I would like, because I have to replace the file. Do you think there is an option to insert directly into the md file rather than going through a temporary txt file? – adama Aug 25 '20 at 11:51
  • Unfortunately, you cannot insert content in the middle of a file in place (well, you can, but you have to rewrite everything after the insertion point). Unless you're appending new lines to the end of a file, tools that appear to edit in place generally write the modified content to a new file behind the scenes anyway. There's a Python module in the stdlib, https://docs.python.org/3.8/library/fileinput.html#fileinput.FileInput, but it also creates a new file and replaces the original. – abdusco Aug 25 '20 at 12:05
  • See: https://stackoverflow.com/questions/125703/how-to-modify-a-text-file – abdusco Aug 25 '20 at 12:06
  • If the file is very large, I'd use sqlite to create a mini database, which should give you much better performance than a file-based approach. Plus, it's structured data, so you don't have to parse it. – abdusco Aug 25 '20 at 12:09
  • I understand. It looks like I have to make my script semi-automatic, so your answer is perfect. – adama Aug 25 '20 at 12:39
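For completeness, the `fileinput` route mentioned in the comments could look roughly like this. It is a sketch under the same assumption about the header format, and `add_sentence_inplace` is a hypothetical name. As noted above, the module still rewrites the file behind the scenes; it only hides the temporary file from the calling code.

```python
import fileinput


def add_sentence_inplace(filename: str, key: str, item: str) -> bool:
    """Insert '- item' under '##intent:key'; return False if the header is missing."""
    header = f'##intent:{key}'
    found = False
    # With inplace=True, fileinput moves the original to a backup file and
    # redirects print() into a new file that carries the original name.
    with fileinput.input(files=[filename], inplace=True) as lines:
        for line in lines:
            print(line, end='')  # copy the line unchanged
            if line.strip() == header:
                found = True
                if not line.endswith('\n'):
                    print()  # header was the last line and had no newline
                print(f'- {item}')
    return found
```

If the function returns `False`, the section does not exist yet, and the new header and sentence can simply be appended to the end of the file with `open(filename, 'a')`.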