Combining rows in CSV while removing null values

Question

I'm really new to python. Thanks in advance for the help.

I am the only one on my team that has any scripting experience, so I've been asked to write a script to combine rows in a csv. For more information, we are moving from on prem Jira to Jira Cloud. And we're using csv to transfer the data.

Below is the code I've written. It's clunky to say the least, but it mostly works.

import csv

portal_data = "Pathward Data.csv"
new_portal_data = "Combined Pathward Data.csv"

with open(portal_data, 'r') as f:
    reader = csv.reader(f)
    data = list(reader)

new_data = []
for line in data:
    new_line = line.copy()
    new_data.append(new_line)

i = 0
for line in new_data:
    i += 1
    if new_data[i][0] == "":
        new_data[i-1].extend(new_data[i])
        del new_data[i]
        i -= 1

with open(new_portal_data, 'w') as nf:
    writer = csv.writer(nf)
    writer.writerows(new_data)

print("Done")

This image is what the CSV looks like before the code. With the jira tool we're using, it inserts each new comment into a new row. The first column is the project name, and is blank in each row that is an additional comment for the ticket above it. Each comment on a new ticket has a time stamp on it, each additional comment does not.

After running the code, the csv combines a lot of the rows, though there is a ton of blank values between each comment. I'm not sure how to get rid of those. I've tried a lot of things, so any help would be great.

But this also doesn't work on every row. It goes until about halfway through the csv, then just stops working. I'm not sure why.

I've included a test data set in Replit with the code. https://replit.com/join/wicvgdpemc-stephenbyrd1

While the code you pasted is a clear and short snippet, without sample data, it doesn't really count as a [minimal, *reproducible* example](https://stackoverflow.com/help/minimal-reproducible-example). A basic example of the content from the CSV files you are working with would be helpful for us to be able to answer your question. — dskrypa, Feb 16 '23 at 22:47
Thank you. I have included my test files in a Replit link. Is this enough or do you have any other advice on how to help in this situation? — Stephen Byrd, Feb 16 '23 at 23:16
Include a small data sample, in text format, in your question. Also show us your current and expected results corresponding to that data input. Please read the link in the comment above. — AlexK, Feb 17 '23 at 00:55
Good try so far. I'll show what an MRE is for your question and attempt to address some issues I see in your code so far. — Zach Young, Feb 20 '23 at 21:23

score 0 · Answer 1 · answered Feb 20 '23 at 21:23

In your case, I imagine a Minimal, Reproducible Example might look something like:

I have this input CSV:

Key,Summary,Comments
Issue-1,Foo doesn't Bar,"Alice - 1/Jan/23 - When I Foo I expect Bar, but all I get is Baz."
,,"Blake - 2/Jan/23 - I can reproduce, testing now."
,,"Blake - 3/Jan/23 - Pushed fix. @Alice, please confirm."
,,"Alice - 4/Jan/23 - Confirmed fix, Foo()=Bar, all good!"
Issue-2,1+1=3 ?!,"Charlie - 2/Jan/23 - 1+1=2, right?"
,,"Blake - 3/Jan/23 - Correct, investigating"
,,"Blake - 4/Jan/23 - Pushed fix. @Charlie, please confirm."
,,"Charlie - 5/Jan/23 - Yep, 1+1 now equal 2."
Issue-3,Fix it!,"Daniel - 3/Jan/23 - It's broken."
,,"Blake - 3/Jan/23 - Not enough info to reproduce. Closed."

When I run my code, I want it to look like ...

But, we don't know really know what you expect it to look like. I'm going to infer based on your code and description of the problem that you want to turn any number of rows for a single issue into a single row.

Something like?

Key,Summary,Comments
Issue-1,Foo doesn't Bar,"Alice - 1/Jan/23 - When I Foo I expect Bar, but all I get is Baz.","Blake - 2/Jan/23 - I can reproduce, testing now.","Blake - 3/Jan/23 - Pushed fix. @Alice, please confirm.","Alice - 4/Jan/23 - Confirmed fix, Foo()=Bar, all good!"
Issue-2,1+1=3 ?!,"Charlie - 2/Jan/23 - 1+1=2, right?","Blake - 3/Jan/23 - Correct, investigating","Blake - 4/Jan/23 - Pushed fix. @Charlie, please confirm.","Charlie - 5/Jan/23 - Yep, 1+1 now equal 2."
Issue-3,Fix it!,Daniel - 3/Jan/23 - It's broken.,Blake - 3/Jan/23 - Not enough info to reproduce. Closed.

(or)

Key	Summary	Comments
Issue-1	Foo doesn't Bar	Alice - 1/Jan/23 - When I Foo I expect Bar, but all I get is Baz.	Blake - 2/Jan/23 - I can reproduce, testing now.	Blake - 3/Jan/23 - Pushed fix. @Alice, please confirm.	Alice - 4/Jan/23 - Confirmed fix, Foo()=Bar, all good!
Issue-2	1+1=3 ?!	Charlie - 2/Jan/23 - 1+1=2, right?	Blake - 3/Jan/23 - Correct, investigating	Blake - 4/Jan/23 - Pushed fix. @Charlie, please confirm.	Charlie - 5/Jan/23 - Yep, 1+1 now equal 2.
Issue-3	Fix it!	Daniel - 3/Jan/23 - It's broken.	Blake - 3/Jan/23 - Not enough info to reproduce. Closed.

Even without knowing exactly what you want, and looking at this bit of code, I can address the two issues you brought up:

i = 0
for line in new_data:
    i += 1
    if new_data[i][0] == "":
        new_data[i-1].extend(new_data[i])
        del new_data[i]
        i -= 1

there is a ton of blank values between each comment

All the "follow-on rows" have their leading fields blank up the Comments column. When you extend the "starting row" with each subsequent row, you're adding each additional row as a whole (with all the blank leading fields), like:

l = ['a','b']                    # "starting" row
l.extend(['','','','c','d'])     # 2nd (follow-on) row
l.extend(['','','','e','f'])     # 3rd (follow-on) row
print(l)                         # ['a','b','','','','c','d','','','','e','f']

If you're only after one field in each follow-on row, then just append that single field:

l = ['a','b']
l.append(['','','','c','d'][3])
l.append(['','','','e','f'][3])
print(l)                         # ['a','b','c','e']

Otherwise, narrow down the row to the fields you want (with "slice notation") in the extend method, like:

l = ['a','b']
l.extend(['','','','c','d'][3:])
l.extend(['','','','e','f'][3:])
print(l)                         # ['a','b','c','d','e','f']

It goes until about halfway through the csv, then just stops working

I cannot exactly say what's going on, but your increment/decrement logic is off. Also, mixing the concpets of iterating-by-item then deleting-by-index seems like a recipe for wrongness.

Have you seen this community post, How to remove items from a list while iterating? The top posts advocate for incrementally adding what you want (effectively removing what you don't want); but that doesn't exactly work for you. The post that actually directly answers the question shows iterating backwards and deleting from the end, which also doesn't work for your use case.

Still, you can take the top ideas of "building foward" and apply it by creating a new, empty list and appending a row from data and setting the new index for new data if it's the start row, or add the follow-on row's fields with the index:

with open("input.csv", newline="") as f:
    reader = csv.reader(f)
    header = next(reader)
    data = list(reader)

new_data = []
idx = -1
for row in data:
    if row[0] != "":                  # start row
        new_data.append(row)
        idx += 1
    else:                             # follow-on row
        new_data[idx].append(row[2])  # targeting single field (row[2]), so append, not extend

That gives me (using the sample data from the table, above, as input.csv):

[
    [
        "Issue-1",
        "Foo doesn't Bar",
        "Alice - 1/Jan/23 - When I Foo I expect Bar, but all I get is Baz.",
        "Blake - 2/Jan/23 - I can reproduce, testing now.",
        "Blake - 3/Jan/23 - Pushed fix. @Alice, please confirm.",
        "Alice - 4/Jan/23 - Confirmed fix, Foo()=Bar, all good!",
    ],
    [
        "Issue-2",
        "1+1=3 ?!",
        "Charlie - 2/Jan/23 - 1+1=2, right?",
        "Blake - 3/Jan/23 - Correct, investigating",
        "Blake - 4/Jan/23 - Pushed fix. @Charlie, please confirm.",
        "Charlie - 5/Jan/23 - Yep, 1+1 now equal 2.",
    ],
    [
        "Issue-3",
        "Fix it!",
        "Daniel - 3/Jan/23 - It's broken.",
        "Blake - 3/Jan/23 - Not enough info to reproduce. Closed.",
    ],
]

If I've completely missed the intended outcome, please edit your post to include the sample input CSV and the expected output CSV. Good luck!

Combining rows in CSV while removing null values

1 Answers1