I have a tab delimited file with \n EOL characters that looks something like this:
User Name\tCode\tTrack\tColor\tNote\n\nUser Name2\tCode2\tTrack2\tColor2\tNote2\n
I am taking this input file and reformatting it into a nested list using split('\t'). The list should look like this:
[['User Name','Code','Track','Color','Note'],
['User Name2','Code2','Track2','Color2','Note2']]
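To make the target concrete, here is a minimal sketch of how the nested list comes out when the input is already clean, that is, exactly one "\n" per record. The variable names clean and nested are only illustrative.

```python
# Hypothetical, already-clean input: one "\n" after each record.
clean = ('User Name\tCode\tTrack\tColor\tNote\n'
         'User Name2\tCode2\tTrack2\tColor2\tNote2\n')

# splitlines() drops the line endings; split('\t') breaks each line into fields.
nested = [line.split('\t') for line in clean.splitlines()]
```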
The software that generates the file allows the user to press the "enter" key any number of times while filling out the "Note" field. It also allows the user to press "enter" any number of times without entering any visible text, leaving nothing but newlines in the "Note" field.
Lastly, the user may press "enter" any number of times in the middle of the "Note", creating multiple paragraphs. This would be such a rare occurrence in practice that I am willing to leave it unaddressed if handling it complicates the code much. It is very low priority.
As seen in the sample above, these actions can result in a run of "\n\n..." of any length preceding, trailing, or replacing the "Note" field. To put it another way, the following replacements are required before I can place the file contents into a list:
\t\n\n... preceding "Note" must become \t
\n\n... trailing "Note" must become \n
\n\n... in place of "Note" must become \n
\n\n... in the middle of the "Note" text must become a single space, if easy to do
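The four rules above could be sketched without regular expressions roughly as follows. This is only one possible approach, and it rests on assumptions not stated in the requirements: every record has its first four fields on a single line, a "Note" never contains a tab, and therefore any non-blank line without a tab is a continuation of the previous "Note". The function name parse_records is hypothetical.

```python
def parse_records(text):
    """Build the nested list; tab-free lines are treated as Note continuations."""
    records = []
    for line in text.split('\n'):
        if not line.strip():
            continue  # blank line left behind by a stray "enter"
        if '\t' in line or not records:
            # A line containing tabs starts a new record.
            records.append(line.split('\t'))
        else:
            # Assumption: no tab means this line belongs to the previous Note.
            note = records[-1][-1]
            records[-1][-1] = (note + ' ' + line.strip()).strip()
    return records


sample = 'User Name\tCode\tTrack\tColor\tNote\n\nUser Name2\tCode2\tTrack2\tColor2\tNote2\n'
result = parse_records(sample)
```

With the sample from the top of the question, result is the two-row nested list shown earlier. Note that the blankness check uses strip() only as a test; the line itself is split unstripped, so an empty "Note" still yields a fifth, empty field.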
I have tried using the strip() and replace() methods without success. Does the file object need to be copied into something else before the replace() method can be used on it?
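For context, here is a small sketch of the distinction being asked about: replace() is a string method, so it applies to what read() returns rather than to the file object itself. The temporary file and its contents are made up purely so the example runs on its own.

```python
import os
import tempfile

# Write a tiny made-up sample file so the sketch is self-contained.
fd, path = tempfile.mkstemp(suffix='.txt')
with os.fdopen(fd, 'w') as f:
    f.write('A\tB\n\n\nC\tD\n')

with open(path) as f:
    has_replace = hasattr(f, 'replace')  # False: file objects have no replace()
    text = f.read()                      # read() returns the contents as a string

# replace() returns a new string; a single pass does not collapse longer runs.
collapsed = text.replace('\n\n', '\n')
os.remove(path)
```

Here three consecutive newlines still leave two behind after one replace() call, which may be part of why a single replace() appears not to work on runs of arbitrary length.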
I have experience with Awk, but I am hoping Regular Expressions are not needed for this as I am very new to Python. This is the code that I need to improve in order to address multiple newlines:
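# Read every line, stripping surrounding whitespace (including the trailing "\n")
marker = [i.strip() for i in open('SomeFile.txt', 'r')]
marker_array = []
for i in marker:
    # Split each line into its tab-separated fields
    marker_array.append(i.split('\t'))
for i in marker_array:
    print(i)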