In your case, I imagine a Minimal, Reproducible Example might look something like:
I have this input CSV:
Key,Summary,Comments
Issue-1,Foo doesn't Bar,"Alice - 1/Jan/23 - When I Foo I expect Bar, but all I get is Baz."
,,"Blake - 2/Jan/23 - I can reproduce, testing now."
,,"Blake - 3/Jan/23 - Pushed fix. @Alice, please confirm."
,,"Alice - 4/Jan/23 - Confirmed fix, Foo()=Bar, all good!"
Issue-2,1+1=3 ?!,"Charlie - 2/Jan/23 - 1+1=2, right?"
,,"Blake - 3/Jan/23 - Correct, investigating"
,,"Blake - 4/Jan/23 - Pushed fix. @Charlie, please confirm."
,,"Charlie - 5/Jan/23 - Yep, 1+1 now equal 2."
Issue-3,Fix it!,"Daniel - 3/Jan/23 - It's broken."
,,"Blake - 3/Jan/23 - Not enough info to reproduce. Closed."
When I run my code, I want it to look like ...
But we don't really know what you expect it to look like. Based on your code and your description of the problem, I'm going to infer that you want to turn any number of rows for a single issue into a single row.
Something like?
Key,Summary,Comments
Issue-1,Foo doesn't Bar,"Alice - 1/Jan/23 - When I Foo I expect Bar, but all I get is Baz.","Blake - 2/Jan/23 - I can reproduce, testing now.","Blake - 3/Jan/23 - Pushed fix. @Alice, please confirm.","Alice - 4/Jan/23 - Confirmed fix, Foo()=Bar, all good!"
Issue-2,1+1=3 ?!,"Charlie - 2/Jan/23 - 1+1=2, right?","Blake - 3/Jan/23 - Correct, investigating","Blake - 4/Jan/23 - Pushed fix. @Charlie, please confirm.","Charlie - 5/Jan/23 - Yep, 1+1 now equal 2."
Issue-3,Fix it!,Daniel - 3/Jan/23 - It's broken.,Blake - 3/Jan/23 - Not enough info to reproduce. Closed.
(or)
| Key | Summary | Comments | | | |
|---|---|---|---|---|---|
| Issue-1 | Foo doesn't Bar | Alice - 1/Jan/23 - When I Foo I expect Bar, but all I get is Baz. | Blake - 2/Jan/23 - I can reproduce, testing now. | Blake - 3/Jan/23 - Pushed fix. @Alice, please confirm. | Alice - 4/Jan/23 - Confirmed fix, Foo()=Bar, all good! |
| Issue-2 | 1+1=3 ?! | Charlie - 2/Jan/23 - 1+1=2, right? | Blake - 3/Jan/23 - Correct, investigating | Blake - 4/Jan/23 - Pushed fix. @Charlie, please confirm. | Charlie - 5/Jan/23 - Yep, 1+1 now equal 2. |
| Issue-3 | Fix it! | Daniel - 3/Jan/23 - It's broken. | Blake - 3/Jan/23 - Not enough info to reproduce. Closed. | | |
Even without knowing exactly what you want, and looking at this bit of code, I can address the two issues you brought up:
i = 0
for line in new_data:
    i += 1
    if new_data[i][0] == "":
        new_data[i-1].extend(new_data[i])
        del new_data[i]
        i -= 1
> there is a ton of blank values between each comment
All the "follow-on rows" have their leading fields blank up to the Comments column. When you extend the "starting row" with each subsequent row, you're adding each additional row as a whole (with all the blank leading fields), like:
l = ['a','b'] # "starting" row
l.extend(['','','','c','d']) # 2nd (follow-on) row
l.extend(['','','','e','f']) # 3rd (follow-on) row
print(l) # ['a','b','','','','c','d','','','','e','f']
If you're only after one field in each follow-on row, then just append that single field:
l = ['a','b']
l.append(['','','','c','d'][3])
l.append(['','','','e','f'][3])
print(l) # ['a','b','c','e']
Otherwise, narrow down the row to the fields you want (with "slice notation") in the extend method, like:
l = ['a','b']
l.extend(['','','','c','d'][3:])
l.extend(['','','','e','f'][3:])
print(l) # ['a','b','c','d','e','f']
> It goes until about halfway through the csv, then just stops working
I can't say exactly what's going on, but your increment/decrement logic is off. Also, mixing the concepts of iterating by item and deleting by index seems like a recipe for wrongness.
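One likely contributor can be sketched on made-up data (the `rows` list and the `visited` counter below are illustrative, not from your code). Deleting from a list inside a `for` loop over that same list shortens the list while the loop's internal position keeps advancing, so the loop can finish before it has looked at every row:

```python
# Made-up rows: two "start" rows, each followed by two follow-on rows.
rows = [["A", "1"], ["", "2"], ["", "3"], ["B", "4"], ["", "5"], ["", "6"]]

visited = 0  # how many times the loop body actually runs
i = 0
for line in rows:  # the for loop tracks its own hidden position in rows
    visited += 1
    i += 1
    if rows[i][0] == "":
        rows[i - 1].extend(rows[i])
        del rows[i]  # rows shrinks, but the loop's hidden position does not back up
        i -= 1

print(visited)   # 4, even though rows started with 6 items
print(rows[-1])  # ['', '6'] -- the last follow-on row was never merged
```

With this data the loop body runs four times for six rows and leaves the final follow-on row untouched, which looks a lot like "stops partway through".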
Have you seen this community post, How to remove items from a list while iterating? The top posts advocate for incrementally adding what you want (effectively removing what you don't want); but that doesn't exactly work for you. The post that actually directly answers the question shows iterating backwards and deleting from the end, which also doesn't work for your use case.
Still, you can take the top idea of "building forward" and apply it here: create a new, empty list, then walk the original rows. If a row is a start row, append it to the new list and advance the index; if it's a follow-on row, add its field(s) to the row at the current index:
import csv

with open("input.csv", newline="") as f:
    reader = csv.reader(f)
    header = next(reader)
    data = list(reader)

new_data = []
idx = -1
for row in data:
    if row[0] != "":  # start row
        new_data.append(row)
        idx += 1
    else:  # follow-on row
        new_data[idx].append(row[2])  # targeting single field (row[2]), so append, not extend
That gives me (using the sample input CSV above, saved as input.csv):
[
    [
        "Issue-1",
        "Foo doesn't Bar",
        "Alice - 1/Jan/23 - When I Foo I expect Bar, but all I get is Baz.",
        "Blake - 2/Jan/23 - I can reproduce, testing now.",
        "Blake - 3/Jan/23 - Pushed fix. @Alice, please confirm.",
        "Alice - 4/Jan/23 - Confirmed fix, Foo()=Bar, all good!",
    ],
    [
        "Issue-2",
        "1+1=3 ?!",
        "Charlie - 2/Jan/23 - 1+1=2, right?",
        "Blake - 3/Jan/23 - Correct, investigating",
        "Blake - 4/Jan/23 - Pushed fix. @Charlie, please confirm.",
        "Charlie - 5/Jan/23 - Yep, 1+1 now equal 2.",
    ],
    [
        "Issue-3",
        "Fix it!",
        "Daniel - 3/Jan/23 - It's broken.",
        "Blake - 3/Jan/23 - Not enough info to reproduce. Closed.",
    ],
]
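If the goal really is one CSV row per issue, the merged rows can be written back out with `csv.writer`. This is only a minimal sketch under that assumption: `merge_rows` is an illustrative name for the same loop (using `new_data[-1]` in place of a separate index), the sample is a trimmed copy of the data above held in memory so the snippet runs on its own, and `output.csv` is a hypothetical file name.

```python
import csv
import io

# Trimmed copy of the sample input, kept in memory so the sketch is self-contained.
SAMPLE = """Key,Summary,Comments
Issue-2,1+1=3 ?!,"Charlie - 2/Jan/23 - 1+1=2, right?"
,,"Blake - 3/Jan/23 - Correct, investigating"
Issue-3,Fix it!,"Daniel - 3/Jan/23 - It's broken."
,,"Blake - 3/Jan/23 - Not enough info to reproduce. Closed."
"""

def merge_rows(rows):
    """Fold each follow-on row's Comments field into the preceding start row."""
    merged = []
    for row in rows:
        if row[0] != "":  # start row
            merged.append(list(row))
        else:  # follow-on row (assumes a start row has already been seen)
            merged[-1].append(row[2])
    return merged

reader = csv.reader(io.StringIO(SAMPLE))
header = next(reader)
new_data = merge_rows(reader)

# Written to an in-memory buffer here; for a real file, use
# open("output.csv", "w", newline="") instead (hypothetical file name).
out = io.StringIO()
writer = csv.writer(out)
writer.writerow(header)
writer.writerows(new_data)
print(out.getvalue())
```

Note that the header still names only three columns, so the data rows end up wider than the header. Whether that matters depends on whatever reads the file next.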
If I've completely missed the intended outcome, please edit your post to include the sample input CSV and the expected output CSV. Good luck!