
So, I have a file with various entries enclosed in { } braces. Example:

{
"classname" "waypoint"
"targetname" "w1"
"target" "w2"
"origin" "672 -1432 32"
"target2" "w9"
}
{
"classname" "light"
"light" "500"
"scale" "4"
"origin" "672 -1440 232"
}
{
"classname" "NPC_Tavion"
"angle" "180"
"origin" "860 -1092 -8"
}
{
"classname" "info_player_start"
"angle" "360"
"origin" "312 -1080 216"
}
{
"classname" "light"
"light" "500"
"scale" "4"
"origin" "320 -1304 232"
}

I want to delete a whole { } block, braces included, if the word NPC_ appears anywhere inside it. So the outcome I want is:

{
"classname" "waypoint"
"targetname" "w1"
"target" "w2"
"origin" "672 -1432 32"
"target2" "w9"
}
{
"classname" "light"
"light" "500"
"scale" "4"
"origin" "672 -1440 232"
}
{
"classname" "info_player_start"
"angle" "360"
"origin" "312 -1080 216"
}
{
"classname" "light"
"light" "500"
"scale" "4"
"origin" "320 -1304 232"
}

I've found something that could do it in AWK, but I can only use Windows tools and/or Python. I have come up with something like this (well, I just found another piece of code and modified it):

bad_words = ['NPC_']

with open('oldfile.txt') as oldfile, open('newfile.txt', 'w') as newfile:
    for line in oldfile:
        if not any(bad_word in line for bad_word in bad_words):
            newfile.write(line)
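Because the loop above looks at one line at a time, it only drops the line that contains NPC_; the rest of that block, braces included, survives. A quick check on a cut-down copy of the sample makes this visible. The `sample` string and the `filter_lines` helper are illustrative names, not part of the original code:

```python
bad_words = ['NPC_']

# Cut-down copy of the entity file from the question (illustrative sample)
sample = '''{
"classname" "NPC_Tavion"
"angle" "180"
"origin" "860 -1092 -8"
}
{
"classname" "light"
"light" "500"
}
'''

def filter_lines(text):
    # Hypothetical helper: same rule as the loop above,
    # keep every line that contains none of the bad words
    return ''.join(
        line for line in text.splitlines(keepends=True)
        if not any(bad_word in line for bad_word in bad_words)
    )

print(filter_lines(sample))
```

The output still contains `{`, `"angle" "180"`, `"origin" "860 -1092 -8"` and `}` from the NPC block, so the filter has to know which block each line belongs to.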

However, this only removes the matching line, and I have no idea how to make it remove the rest of the content within the { } too. I have found the question "Remove text between () and []" and figured something like that could work:

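    def remove_braced(test_str):  # function name chosen for illustration
        """Return test_str with every { ... } block removed."""
        ret = ''
        skip1c = 0  # current { } nesting depth
        for i in test_str:
            if i == '{':
                skip1c += 1
            elif i == '}' and skip1c > 0:
                skip1c -= 1
            elif skip1c == 0:
                ret += i  # keep only characters outside any { }
        return ret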

But I have no idea how to mix the two :(
NelsonGon
  • Oh. As I said (`but I can only use Windows tools and/or Python`), I have access to PowerShell too, and I figured it might be an easier way than my terrible Python frankensteins. – kerstoff0mega Jan 24 '22 at 21:37
  • Are you able to change the format of the file? Any particular reason why the file is not saved in any common format where there would be hundreds of existing packages to help? – Iain Shelvington Jan 24 '22 at 21:38
  • It's an entity file used by the Quake 3 Arena (Q3A) engine. No way around that. :( @IainShelvington – kerstoff0mega Jan 24 '22 at 21:41
  • Is this a question about the file's characteristics, or a syntax question? If the latter, why don't they have commas? I'd assume they're syntax errors, since they're in a dict format as is. Could you show what the original file has before interpretation? – Human006 Jan 24 '22 at 21:44
  • @Human006 this is unfortunately just how entity files work in Q3A Engine. This is the original base format of the file extracted from the gamefiles. – kerstoff0mega Jan 24 '22 at 21:47

2 Answers


You can try with this:

r"\{(.*?)\}"

and pass the re.DOTALL flag, which, per the Python docs, makes the '.' special character match any character at all, including a newline; without this flag, '.' will match anything except a newline.
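A small check of what the flag changes, using a two-line block (the `text` string is only an illustration):

```python
import re

text = '{\n"classname" "light"\n}\n'

# Without DOTALL, '.' cannot match '\n', so a block spanning lines never matches
print(re.findall(r"\{(.*?)\}", text))             # []
# With DOTALL, the lazy '.*?' runs across newlines up to the first '}'
print(re.findall(r"\{(.*?)\}", text, re.DOTALL))  # ['\n"classname" "light"\n']
```

Because `.*?` is lazy, each match stops at the nearest `}`, so every block is captured separately rather than as one match from the first `{` to the last `}`.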

Code to keep only the curly-brace blocks that do not contain 'NPC_':

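import re

with open('oldfile.txt') as oldfile:
    content = oldfile.read()

# Capture the text between each { and the next }; DOTALL lets '.' span newlines
res = re.findall(r"\{(.*?)\}", content, re.DOTALL)

with open('newfile.txt', 'w') as newfile:
    for data in res:
        if 'NPC_' not in data:
            # Re-add the braces that the capture group left out
            newfile.write('{' + data + '}\n')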
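If you would rather delete the NPC_ blocks in place than rebuild the file from the kept ones, `re.sub` can do it in one pass. This is a sketch that assumes, as in the sample, that blocks are never nested; `NPC_BLOCK` and `strip_npc_blocks` are hypothetical names used for illustration:

```python
import re

# A { ... } block that mentions NPC_ anywhere inside, plus the newline after '}'.
# [^{}]* cannot cross a brace, so one match never spans two blocks.
NPC_BLOCK = re.compile(r"\{[^{}]*NPC_[^{}]*\}\n?")

def strip_npc_blocks(text):
    """Hypothetical helper: delete every block containing NPC_ (no nesting assumed)."""
    return NPC_BLOCK.sub('', text)

sample = ('{\n"classname" "light"\n}\n'
          '{\n"classname" "NPC_Tavion"\n"angle" "180"\n}\n'
          '{\n"classname" "info_player_start"\n}\n')
print(strip_npc_blocks(sample))
```

For the real file, read `oldfile.txt` into a string as above and write `strip_npc_blocks(content)` to `newfile.txt`. Unlike the `findall` version, any text outside the braces is left untouched.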
ncica

Let s be your string containing the content of the file.

out = ''.join('{' + g for g in s.split('{') if 'NPC_' not in g and len(g) > 1)

A one-liner that doesn't use regex. It divides your string into groups g by splitting it at every {. A group is left out of the output if it contains NPC_ or is an empty or near-empty split (len(g) <= 1, such as the empty string before the first {). Each kept group gets its { back, and ''.join(..) stitches the pieces together.

This Python construct is called a generator expression: it works like a compact for-loop that lazily yields each kept piece to ''.join() instead of building an intermediate list.
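A quick way to try the one-liner on a cut-down sample (the sample string is only for illustration; for the real file, fill `s` with `open('oldfile.txt').read()`). It assumes `{` never appears inside a value, since every `{` starts a new group:

```python
# Cut-down sample in the same format as the entity file (illustrative)
s = ('{\n"classname" "waypoint"\n"targetname" "w1"\n}\n'
     '{\n"classname" "NPC_Tavion"\n"angle" "180"\n}\n'
     '{\n"classname" "light"\n"light" "500"\n}\n')

# s.split('{') starts with '' (the text before the first brace), which fails
# len(g) > 1; the NPC_Tavion group fails the 'NPC_' test; the rest are kept
out = ''.join('{' + g for g in s.split('{') if 'NPC_' not in g and len(g) > 1)
print(out)
```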

Tobi208