I have a CSV file that I am trying to parse, but one of the cells contains blocks of data full of nulls and line breaks. I need to enclose each row inside an array and merge all the content from this particular cell into its corresponding row. I recently posted a similar question and the answer solved my problem partially, but I am having trouble building a loop that handles every line that does not meet a certain start condition. The code that I have merges only the first line that does not meet that condition, and it breaks after that.
I have:
file = "myfile.csv"
condition = "DAT"
data = open(file).read().split("\n")
for i, line in enumerate(data):
    if not line.startswith(condition):
        data[i-1] = data[i-1]+line
        data.pop(i)
print data
For a CSV that looks like this:
Case | Info
-------------------
DAT1 single line
DAT2 "Berns, 17, died Friday of complications from Hutchinson-Gilford progeria syndrome, commonly known as progeria. He was diagnosed with progeria when he was 22 months old. His physician parents founded the nonprofit Progeria Research Foundation after his diagnosis.
Berns became the subject of an HBO documentary, ""Life According to Sam."" The exposure has brought greater recognition to the condition, which causes musculoskeletal degeneration, cardiovascular problems and other symptoms associated with aging.
Kraft met the young sports fan and attended the HBO premiere of the documentary in New York in October. Kraft made a $500,000 matching pledge to the foundation.
The Boston Globe reported that Berns was invited to a Patriots practice that month, and gave the players an impromptu motivational speech.
DAT3 single line
DAT4 YWYWQIDOWCOOXXOXOOOOOOOOOOO
The code does join the first continuation line to the previous row. But when it hits a double space or a double line break, it fails and registers the text that follows as a new row. For example, if I print:
data[0]
The output is:
DAT1 single line
If I print:
data[1]
The output is:
DAT2 "Berns, 17, died Friday of complications from Hutchinson-Gilford progeria syndrome, commonly known as progeria. He was diagnosed with progeria when he was 22 months old. His physician parents founded the nonprofit Progeria Research Foundation after his diagnosis.
But if I print:
data[2]
The output is:
Berns became the subject of an HBO documentary, ""Life According to Sam."" The exposure has brought greater recognition to the condition, which causes musculoskeletal degeneration, cardiovascular problems and other symptoms associated with aging.
Instead of:
DAT3 single line
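The same behavior shows up with a much smaller list. This is only a reduced sketch: the strings below are made-up placeholders, not my real data, and I am assuming the blank entry stands in for the double line break in the file. The loop itself is unchanged from my code above.

```python
# Placeholder rows standing in for the lines of myfile.csv (not real data).
condition = "DAT"
data = ["DAT1 a", "DAT2 b", "", "c", "d", "DAT3 e"]

for i, line in enumerate(data):
    if not line.startswith(condition):
        # Merge the continuation line into the previous entry...
        data[i - 1] = data[i - 1] + line
        # ...then remove it, which shifts every later item left by one,
        # so the item right after the removed one is never visited.
        data.pop(i)

print(data)
```

This prints ['DAT1 a', 'DAT2 b', 'cd', 'DAT3 e'], so data[2] is 'cd' rather than 'DAT3 e'. In this reduced case the empty string is merged and removed, "c" is then skipped by the loop, and "d" gets attached to "c" instead of to the DAT2 row.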
How do I merge that full block of text in the "Info" column so that it always stays with its corresponding DAT row instead of popping up as a new row, regardless of null or newline characters?