Combine CSV values into one row then append to new csv

Question

I've currently split my data into CSV files that need all rows of the column "sequence" combined into one string.

Each CSV looks something like this:

1773.csv
ID      Order    Sequence
1773     1        'AAGG'
1773     2        'TTGG'
1773     3        'GGAA'

1775.csv
ID      Order    Sequence
1775     1         'GGTT'
1775     2         'AAGT'
1775     3         'TGAA'

1331.csv
ID      Order    Sequence
1331     1         'CCGT'
1331     2         'CATT'
1331     3         'GTTA'

I need each CSV to merge each sequence row into one value like this:

ID       Sequence
1773   'AAGGTTGGGGAA'

Then make a master CSV of all the combined sequences from each CSV file.

Something like this:

ID        Sequence
1773    'AAGGTTGGGGAA'
1775    'GGTTAAGTTGAA'
1331    'CCGTCATTGTTA'

I wouldn't worry too much about the order column since the rows are already in order. +Each CSV in the folder is just it's ID

I've found this but it seems to combine all data from all csv files into a single cell/value:

def return_contents(file_name):
    with open(file_name) as infile:
        reader = csv.reader(infile)
        return list(reader)

all_files = os.listdir('C:\\Users\\CAAVR\\Desktop\\res_csv')

combined_output = []

for file in all_files:
    data = return_contents('C:\\Users\\CAAVR\\Desktop\\res_csv\\{}'.format(file))
    for row in data:
        combined_output.extend(row)

with open('csv_out.csv', 'w', newline='') as outfile:
    writer = csv.writer(outfile)
    writer.writerow(combined_output)

Thanks ahead of time and let me know if you need more info.

Well, that details what you _need_ but says nothing of what you _tried_ and the issues you faced. It also doesn't give a [mcve] because `"GGTT"` isn't in the input sequences — roganjosh, Mar 01 '20 at 22:00
I've added some code that I tried. I'm not 100% on your second comment though. If you are referring to the master csv that would be another combined sequence — Scott Valentine, Mar 01 '20 at 22:09
Thank you for the edit. The second part of my comment is that examples should be self-consistent. How could you expect `'GGTTAAGTTGAA'` from what's listed in the first code block? `"GGTT"` happened to be a sequence that stood out, but there are other examples. — roganjosh, Mar 01 '20 at 22:12

score 0 · Answer 1 · answered Mar 01 '20 at 22:11

0

I would use pandas for this task

pandas.read_csv("input.dat").groupby("id")['Sequence'].agg('sum').to_csv('output.dat')

I think would do what you ask

answered Mar 01 '20 at 22:11

Joran Beasley

110,522
12
160
179

i tried this on the non-split file and while it got rid of the extra ID's and order values it didn't seem to actually combine the strings. Just add another row of strings. Anyway to do this over the split strings and add them to a new file? – Scott Valentine Mar 01 '20 at 22:48
Any idea why this would give me a parser error on my main dataset? – Scott Valentine Mar 01 '20 at 23:48
you might need to specify delimiter or something – Joran Beasley Mar 02 '20 at 01:02

score 0 · Answer 2 · answered Mar 01 '20 at 23:05

0

I've found this but it seems to combine all data into a single row:
[...]
combined_output.extend(row)

extend adds all items from row individually to combined_output, losing the information that they once belonged together and creating a single flat list.

Instead, you should use append. It adds row to combined_output while keeping the items of row in their own list. The result is a list of lists, where each sublist represents one row in the output CSV.

See What is the difference between Python's list methods append and extend?.

answered Mar 01 '20 at 23:05

mkrieger1

19,194
5
54
65

This seems to give me an output csv of all files combined into one with each row combined into a single column. Instead of combining the rows of each individual csv into one row. Does that make sense? – Scott Valentine Mar 01 '20 at 23:16
"Instead of combining the rows of each individual csv into one row." I thought that was what the code you've shown was doing, which was what you did *not* want. – mkrieger1 Mar 01 '20 at 23:20
sorry for the confusion. From each split csv i need to combine each "sequence" value into one row. Then add that row to the list of all of them as an individual row. The code I had in the question would combine all rows from all csv files into one value. The append version still seems to combine the enire row into one value. Hope this clears it up. – Scott Valentine Mar 01 '20 at 23:31

Combine CSV values into one row then append to new csv

2 Answers2