I have a text file in this format:

000000.png 712,143,810,307,0
000001.png 599,156,629,189,3 387,181,423,203,1 676,163,688,193,5
000002.png 657,190,700,223,1
000003.png 614,181,727,284,1
000004.png 280,185,344,215,1 365,184,406,205,1

I want to keep only the blocks [number1,number2,number3,number4,number5] whose last number is 1 or 5, remove every other block, and drop any line that has no such block left.

The above text file should look like this in the end:

000001.png 387,181,423,203,1 676,163,688,193,5
000002.png 657,190,700,223,1
000003.png 614,181,727,284,1
000004.png 280,185,344,215,1 365,184,406,205,1

My code:

import os

with open("data.txt", "r") as input:
    with open("newdata.txt", "w") as output:
        # iterate all lines from file
        for line in input:
            # if substring contain in a line then don't write it
            if ",0" or ",2" or ",3" or ",4" or ",6" not in line.strip("\n"):
                output.write(line)

I tried something like this, but it obviously didn't work.
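
For context, here is a minimal sketch of why the condition in that attempt never filters anything. It only illustrates operator precedence and substring matching. The variable names `line` and `condition` are made up for this example.

```python
line = "000000.png 712,143,810,307,0\n"

# `not in` binds tighter than `or`, so Python reads the test as:
#   ",0" or ",2" or ",3" or ",4" or (",6" not in line)
# A non-empty string is truthy, so `or` returns ",0" without looking at `line`.
condition = ",0" or ",2" or ",3" or ",4" or ",6" not in line.strip("\n")
print(repr(condition))  # ',0'

# Even a correctly spelled substring test is unreliable here, because the
# coordinates themselves contain the same patterns (",143" contains ",1").
print(",1" in line)  # True
```

Because the condition is always truthy, every line is written to the output unchanged. Splitting each group on commas and comparing only the last element avoids both problems.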

CreepyRaccoon
Vanq

1 Answer

No need for Regex, this might help you:

with open("data.txt", "r") as input:        # Read all data lines.
    data = input.readlines()
with open("newdata.txt", "w") as output:    # Create output file.
    for line in data:                       # Iterate over data lines.
        line_elements = line.split()        # Split line by spaces.
        line_updated = [line_elements[0]]   # Initialize fixed line (without undesired patterns) with image's name.
        for i in line_elements[1:]:         # Iterate over groups of numbers in current line.
            tmp = i.split(',')              # Split current group by commas.
            if len(tmp) == 5 and (tmp[-1] == '1' or tmp[-1] == '5'):
                line_updated.append(i)      # If the pattern is Ok, append group to fixed line.
        if len(line_updated) > 1:           # If the fixed line is valid, write it to output file.
            output.write(f"{' '.join(line_updated)}\n")
CreepyRaccoon
  • Btw, be careful because `input` is the name of a built-in function, and using it as a variable name shadows it... – CreepyRaccoon Nov 20 '22 at 18:06
  • CreepyRaccoon, it works like a charm, thanks. I actually found out that I want to change the '1' and '5' to '0' and '1'. I tried `if len(tmp) == 5 and tmp[-1] == '1': tmp[-1] = '0'` and another `if` statement for the 5, but the logic seems wrong, because with the `if` we are just checking whether those conditions are met and can't change the values, I guess. – Vanq Nov 22 '22 at 14:48
  • What do you mean, @Vanq? The logic works fine. You can change the values of the current group `tmp` (it is a list) before appending it to `line_updated`. The same can be done in any further `if` statements following the one you just changed. E.g., `if len(tmp) == 5 and (tmp[-1] == '1' or tmp[-1] == '5'):` then `tmp[-1] = '0'` and finally `line_updated.append(','.join(tmp))`. This example would transform "387,181,423,203,1" to "387,181,423,203,0" in line 2 (000001.png). – CreepyRaccoon Nov 22 '22 at 17:38
  • @CreepyRaccoon, I still had `line_updated.append(i)` and not `line_updated.append(','.join(tmp))`. The logic was not really clear to me; I thought changing `tmp[-1]` would update `i`. I was trying to add another `if` statement just for the `line_updated.append(i)` because I thought the values would only be changed in the next iteration or something. – Vanq Nov 22 '22 at 17:53
  • @Vanq, changing `tmp[-1]` **cannot** update `i` because `tmp = i.split(',')` creates a new list, so you have 2 options here: modify `tmp` and then call `line_updated.append(','.join(tmp))`, which is IMHO the best and easiest option, or modify `i` and then call `line_updated.append(i)`. An example of the latter is `i = i.replace('5', '0')`, but this is less robust because it replaces every '5' in the group, coordinates included. – CreepyRaccoon Nov 22 '22 at 20:57
  • I really encourage you to debug the code because the answers to all of your questions are already in this snippet and it will be easier for you to understand it this way :) – CreepyRaccoon Nov 22 '22 at 21:02
  • Yes, thanks. I think I understand everything about it now. I first thought `line_updated.append(','.join(tmp))` would take the image name, put a comma, and then append `tmp`, but the comma is the separator between the elements of `tmp`. At one point I also had `tmp[-1].replace('0','1')` and was wondering why that doesn't work either, but now it's clear. – Vanq Nov 23 '22 at 23:30
  • @CreepyRaccoon hey, I have been modifying your code for my use and it's been working great, but now I might need some help. I now have this case: `0.jpg 1,2,3,4,5` and, on the next line, `0.jpg 2,3,4,5,6`, for example. I want to check whether the first element is the same and, if so, concatenate the lines to get `0.jpg 1,2,3,4,5 2,3,4,5,6`. I thought if someone would know it, you would :) – Vanq Dec 08 '22 at 20:37
  • Yes, but it's better to create a new question. I can't post the code here because it's not a one-liner – CreepyRaccoon Dec 08 '22 at 23:54
  • Thought so, but didn't think you'd see it then. I will create a new one. – Vanq Dec 09 '22 at 02:25
  • https://stackoverflow.com/questions/74738548/concatenate-text-file-lines-with-condition-in-python – Vanq Dec 09 '22 at 03:10
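
The relabeling discussed in the comments above (keep groups ending in 1 or 5, then rewrite them as 0 and 1) can be sketched roughly as follows. This is an illustration under assumptions, not the answerer's code: `LABEL_MAP` and `relabel_line` are hypothetical names, and the mapping 1 to 0, 5 to 1 is inferred from the comment thread.

```python
# Hypothetical mapping inferred from the comments: class 1 becomes 0, class 5 becomes 1.
LABEL_MAP = {"1": "0", "5": "1"}

def relabel_line(line):
    """Keep only groups whose last field is in LABEL_MAP and relabel them.

    Returns the rewritten line, or None if no group survives.
    """
    fields = line.split()
    if not fields:
        return None
    name, groups = fields[0], fields[1:]
    kept = []
    for group in groups:
        parts = group.split(",")             # a new list; changing it does not change `group`
        if len(parts) == 5 and parts[-1] in LABEL_MAP:
            parts[-1] = LABEL_MAP[parts[-1]]  # lists are mutable, so item assignment works
            kept.append(",".join(parts))      # rebuild the group from the modified list
    return " ".join([name] + kept) if kept else None

sample = "000001.png 599,156,629,189,3 387,181,423,203,1 676,163,688,193,5"
print(relabel_line(sample))
# 000001.png 387,181,423,203,0 676,163,688,193,1

# str.replace returns a new string and touches every match, coordinates included:
print("676,163,688,155,5".replace("5", "0"))
# 676,163,688,100,0
```

A single dictionary lookup also sidesteps an ordering trap with two consecutive `if` statements: if 5 were mapped to 1 first and 1 to 0 afterwards, every kept group would end in 0.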