
I have a JSON file where the structure looks like the following:

{
    "events": [
        {
            "id": 1,
            "name": "EV001",
            "note": "",
            "pages": [
                {
                    "list": [
                        {
                            "code": 231,
                            "indent": 0,
                            "parameters": [
                                0
                            ]
                        },
                        {
                            "code": 401,
                            "indent": 0,
                            "parameters": [
                                "ひな"
                            ]
                        },
                        {
                            "code": 401,
                            "indent": 0,
                            "parameters": [
                                "ひらがな"
                            ]
                        },
                        {
                            "code": 131,
                            "indent": 0,
                            "parameters": [
                                0
                            ]
                        },...
                    ]
                }
            ]
        }
    ]
}

My goal is to grab the text inside "parameters" for every command where "code" is 401. After I grab this text I translate it, and then I want to put each translation back in the same spot the original came from.

Currently I use the following function to extract the text:

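# Extract 401 text
def extract_401_text(data):
    """Collect the first parameter of every command whose code is 401."""
    untranslatedTextList = []

    for event in data['events']:
        # Skip null entries in the events array
        if event is not None:
            for page in event['pages']:
                for command in page['list']:
                    if command['code'] == 401:
                        untranslatedTextList.append(command['parameters'][0])

    return untranslatedTextList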

This gives me untranslatedTextList, a list of all the strings I need to translate, in file order. I can translate this list using whatever method I like.
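For the write-back half, here is a minimal sketch that assumes the translation comes back as a list with exactly one entry per extracted line. Instead of copying the strings out, it keeps references to the command dicts themselves, so position is never lost. The names `collect_401_commands` and `write_back` are made up for illustration.

```python
def collect_401_commands(data):
    """Return the command dicts (not copies) whose code is 401, in file order."""
    commands = []
    for event in data["events"]:
        if event is None:
            continue
        for page in event["pages"]:
            for command in page["list"]:
                if command["code"] == 401:
                    commands.append(command)
    return commands


def write_back(commands, translated):
    """Overwrite each command's first parameter with the matching translation."""
    if len(commands) != len(translated):
        # Refuse to write anything rather than shift every later line
        raise ValueError(f"expected {len(commands)} lines, got {len(translated)}")
    for command, text in zip(commands, translated):
        command["parameters"][0] = text
```

Because the list holds the same dict objects that live inside `data`, dumping `data` back to JSON after `write_back` would include the translated text. The length check only catches a wrong count; it cannot tell whether the model merged or reordered lines while keeping the count the same.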

My problem starts here. Normally I would translate line by line, so that I could easily keep track of where each piece of raw text came from and write the result back into the same command. However, this has too many drawbacks:

  1. (Main Issue) The translation quality suffers greatly because the machine doesn't have the context. Much of the text is dialogue and requires knowledge of what was just said or what the context is.
  2. The cost is much higher line by line than with one giant batch.
  3. The time taken for translation is much greater due to the larger number of requests.

Therefore my only choice is to translate all of the text in the list in a single request to avoid the above pitfalls. However, afterwards I'm left with a translation blob of a different length, where it's nearly impossible to know which sentences go with which 401 commands. I have tried using delimiters to mark where each group of 401s ends, but GPT-3.5 likes to randomly add or remove these delimiters, which throws everything off.
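One variation on the delimiter idea is to give every line an explicit key rather than a separator, so that a dropped or added line can at least be detected. The sketch below assumes the model can be asked to return a JSON object with the same keys as the one it was sent; that is an assumption about the model's behavior, not a guarantee. `build_payload` and `parse_reply` are hypothetical helper names, and no API call is shown.

```python
import json


def build_payload(texts):
    """Serialize the lines as a JSON object keyed by position: {"0": ..., "1": ...}."""
    return json.dumps({str(i): t for i, t in enumerate(texts)}, ensure_ascii=False)


def parse_reply(reply, expected_count):
    """Return the translations in order, or raise if any key is missing or extra."""
    try:
        obj = json.loads(reply)
    except json.JSONDecodeError as exc:
        raise ValueError("reply is not valid JSON") from exc
    if not isinstance(obj, dict):
        raise ValueError("reply is not a JSON object")

    expected = {str(i) for i in range(expected_count)}
    if set(obj) != expected:
        missing = sorted(expected - set(obj), key=int)
        extra = sorted(set(obj) - expected)
        raise ValueError(f"key mismatch: missing={missing}, extra={extra}")

    # Rebuild the list by key, so the order of keys in the reply does not matter
    return [obj[str(i)] for i in range(expected_count)]
```

The whole batch still goes out in one request, so the model sees the surrounding dialogue. A failed check says which keys are wrong instead of silently shifting every later line.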

Frankly, after thinking about it for a long time, it seems like an impossible task, but maybe someone in the community has a good idea.

I have tried groupings, delimiters, and forcefully matching the two lists. Every approach ends up with a small mismatch at one of the 401 positions, which shifts every line after it in the file and causes bugs.
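One way to stop a single mismatch from shifting everything after it is to check the line count of each batch and, on failure, split the batch in half and retry each half. This is only a sketch: `translate_fn` is a hypothetical callable that takes a list of strings and returns a list of translated strings, standing in for whatever translation request is actually used.

```python
def translate_with_fallback(texts, translate_fn):
    """Translate texts as one batch; if the count comes back wrong, split and retry."""
    if not texts:
        return []

    result = translate_fn(texts)
    if len(result) == len(texts):
        return result

    if len(texts) == 1:
        # A single line cannot be misaligned: join whatever came back,
        # or keep the original text if nothing came back at all
        return [" ".join(result) or texts[0]]

    # Count mismatch: retry each half separately so the error stays local
    mid = len(texts) // 2
    return (translate_with_fallback(texts[:mid], translate_fn)
            + translate_with_fallback(texts[mid:], translate_fn))
```

The trade-off is that a retried half has less context and costs extra requests, but that only happens for batches that failed the check. The returned list always has the same length as the input.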

DazedFury
  • When you translate an entire sentence, parts of the sentences are necessarily moved around because the natural sentence structure is different for each language. Because a good translation will reorder a sentence, it naturally becomes impossible, and undesirable I might add, to return the sentence to the original order. If you want a good translation, it will be out of order. If you want the translation to be in the original order, you'll get a bad translation. – SimonUnderwood Apr 04 '23 at 00:25
  • Yup, that's what I figured. Really, I wish there was a way to rewrite the entire file to fit the new translation, but I have no good ideas on how to do that properly. Interestingly, ChatGPT lets you pass the context of the work with every request using the 'system' message. I did notice improvements in the translation, but I have had mixed results getting it to work consistently and the cost is huge. – DazedFury Apr 04 '23 at 00:31
