I have a text file that looks like this:

Apple TreeTwo
Banana TreeOne
Juice TreeOne
Pineapple TreeThree
Berries TreeThree

How can I select the rows with the same tree name and write them to separate files, as shown below, in Python?

file1.txt
Banana TreeOne
Juice TreeOne

file2.txt
Apple TreeTwo

file3.txt
Pineapple
Berries

I've tried the approach from "https://stackoverflow.com/questions/72065988/how-to-select-all-rows-with-the-same-name-but-different-values-in-python", but I get a "no attribute groupby" error. My columns don't have headers, so I don't know whether this is the right way to do it or whether there is another way.

f = open('data.txt' , 'r')
f_splits = [v for k, v in f.groupby()]
for f_split in f_splits:
    print(f_split, sep = '\n')
nora job

1 Answer

I wouldn't use groupby here. The file object returned by open() has no groupby method, which is why your attempt raises an attribute error. Simply iterating over the file contents and sorting the lines into lists is easier.

Note that you could condense this into a single for loop, but I've kept the steps separate to make the code easier to follow.

I'm using a dict in the example below because it handles keys that aren't known in advance.

data = """Apple TreeTwo
Banana TreeOne
Juice TreeOne
Pineapple TreeThree
Berries TreeThree"""

result = {}
for line in data.splitlines():
    # get the last word to determine which list to put it in
    sort_key = line.split()[-1]
    # remove the last word, by splitting and rejoining.
    line = " ".join(line.split()[:-1])
    if sort_key not in result:
        # if the key is not already in the dict, create a new list with 
        # line as the first element
        result[sort_key] = [line]
    else:
        # if the key is already there, append line to the list
        result[sort_key].append(line)

# print it out
for key, value in result.items():
    print(f"{key} → {value}")


# write to files
for key, value in result.items():
    with open(f"{key}.txt", "w") as outfile:
        for line in value:
            outfile.write(f"{line}\n")

Output:

TreeTwo → ['Apple']
TreeOne → ['Banana', 'Juice']
TreeThree → ['Pineapple', 'Berries']
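
The grouping and writing steps above can also be packaged as two small functions. This is only a sketch under a few assumptions: the tree name is always the last whitespace-separated word, blank or single-word lines can be skipped, and the names `group_by_last_word` and `write_groups` are hypothetical helpers, not part of the code above. It swaps the plain dict for `collections.defaultdict` and uses `str.rsplit` to cut off the last word in one step.

```python
from collections import defaultdict
from pathlib import Path


def group_by_last_word(lines):
    """Group lines by their last word, dropping that word from each line."""
    groups = defaultdict(list)
    for line in lines:
        line = line.strip()
        # rsplit with maxsplit=1 separates the last word from the rest
        parts = line.rsplit(maxsplit=1)
        if len(parts) < 2:
            continue  # assumption: skip blank lines and single-word lines
        name, key = parts
        groups[key].append(name)
    return dict(groups)


def write_groups(groups, directory="."):
    """Write one file per key (e.g. TreeOne.txt), one name per line."""
    for key, names in groups.items():
        Path(directory, f"{key}.txt").write_text("\n".join(names) + "\n")


sample = ["Apple TreeTwo", "Banana TreeOne", "Juice TreeOne"]
print(group_by_last_word(sample))
# {'TreeTwo': ['Apple'], 'TreeOne': ['Banana', 'Juice']}
```

To run it on the file from the question, an open file can be passed directly, since iterating over a file yields its lines: `with open("data.txt") as infile: write_groups(group_by_last_word(infile))`.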
Edo Akse
  • Is there a way I can outfile.write only the values, i.e. ['Apple'] and ['Banana', 'Juice'], without the tree name? – nora job Nov 11 '22 at 18:51
  • Yes. I had written the full line because that was what you put in the example. If the last word is the only one you want to remove, it's a simple adjustment. See my updated answer, which removes the last word from each line. – Edo Akse Nov 11 '22 at 19:38
  • Note that if every line always has exactly 2 words, you can replace the two lines that define `sort_key` and `line` with a single one: `line, sort_key = line.split()` – Edo Akse Nov 11 '22 at 19:40
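
The two-word shortcut from the last comment can be sketched as follows. It assumes every line has exactly two whitespace-separated words; with any other word count the tuple unpacking raises `ValueError`, so this sketch re-raises it with a clearer message. The function name `split_two_words` is hypothetical.

```python
def split_two_words(line):
    """Return (name, tree) for a line made of exactly two words."""
    try:
        # Unpacking fails with ValueError unless split() yields two items
        name, tree = line.split()
    except ValueError:
        raise ValueError(f"expected exactly two words, got: {line!r}") from None
    return name, tree


print(split_two_words("Banana TreeOne"))  # ('Banana', 'TreeOne')
```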