I have a large TXT dataset delimited by |, but one field allows paragraph text, which can contain line breaks and blank lines. Every line that is not part of the paragraph text starts with AA|. When I try to import the file into R via readr, these values become NA because the continuation lines don't follow the delimited structure.
Is there a way to use sed or awk to take each line that doesn't start with AA| and append it, separated by a space, to the preceding line that does?
Input:
AA|5904060|9001084471200270|9000263372600200|Result Comment:
No (1, 3) Beta-D-Glucan detected.
This assay does not detect certain fungi, including
Cryptococcus species, which produce very low levels of (1,
3) Beta-D-Glucan (BDG) and the Mucorales (e.g., Lichthemia,
Mucor and Rhizopus), which are not known to produce BDG.
Additionally, the yeast phase of Blastomyces dermatitidis
produces little BDG and may not be detected by this assay.
|North Building|0|0
Goal Output:
AA|5904060|9001084471200270|9000263372600200|Result Comment: No (1, 3) Beta-D-Glucan detected. This assay does not detect certain fungi, including Cryptococcus species, which produce very low levels of (1, 3) Beta-D-Glucan (BDG) and the Mucorales (e.g., Lichthemia, Mucor and Rhizopus), which are not known to produce BDG. Additionally, the yeast phase of Blastomyces dermatitidis produces little BDG and may not be detected by this assay.|North Building|0|0
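To make the intended transformation concrete, here is a minimal sketch of the joining logic, written in Python rather than sed or awk. The function name join_records is hypothetical. It rests on three assumptions that go beyond the description above: blank lines inside the paragraph are dropped, a continuation line that begins with | is appended without a space (this is what the goal output shows before "|North Building"), and any text before the first AA| line is passed through unchanged.

```python
from typing import Iterable, Iterator, Optional


def join_records(lines: Iterable[str], prefix: str = "AA|") -> Iterator[str]:
    """Yield one string per record, folding continuation lines into the
    most recent line that starts with `prefix`."""
    current: Optional[str] = None
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line.startswith(prefix):
            # A new record begins: emit the one collected so far.
            if current is not None:
                yield current
            current = line
        elif not line.strip():
            # Assumption: blank lines are dropped.
            continue
        elif current is None:
            # Assumption: text before the first record (e.g. a header row)
            # passes through unchanged.
            yield line
        elif line.startswith("|"):
            # Assumption: a continuation that starts with the delimiter ends
            # the paragraph field, so it is joined without a space.
            current += line
        else:
            # Ordinary paragraph text: join with a single space.
            current += " " + line.strip()
    if current is not None:
        yield current
```

Since a file object iterates line by line, passing an open file to join_records and writing each yielded record followed by a newline to a second file should produce something readr can parse as one row per record.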