I'm having a problem with newline and carriage-return characters. This is hard for me to explain, but I'll try.
I have data in a list. The members of the list contain newline characters, for example:
example_list = ["I've always loved jumping\n\n"]
To tokenize this sentence with NLTK, I need it to be a string. According to some tests I ran and the NLTK tutorial, NLTK ignores newline characters and other escape characters when it tokenizes.
The problem is that when I try to convert example_list to a string, I get this output:
str(example_list)
'["I\'ve always loved jumping\\n\\n"]'
Notice that every newline character now appears as \\n, with an escaped backslash (not a forward slash). Tokenizing this gives a terrible result: NLTK treats jumping\n\n as one big word, because it reads the backslash-escaped newline characters as literal text.
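To make the symptom concrete, here is a minimal reproduction that uses only the standard library (no NLTK). The names as_str and joined are just for illustration. It also compares str() on the list with joining the elements, which seems to keep the real newlines, though I'm not sure that is the recommended approach:

```python
# Minimal reproduction; as_str and joined are illustrative names.
example_list = ["I've always loved jumping\n\n"]

as_str = str(example_list)       # text representation of the whole list
joined = " ".join(example_list)  # the elements themselves, glued together

# str() on a list shows each element's repr, so each newline turns into
# the two characters backslash + "n" instead of a real newline.
print("\\n" in as_str)   # True: literal backslash-n is present
print("\n" in as_str)    # False: no real newline left
print("\n" in joined)    # True: real newlines survive the join

# Plain whitespace splitting (a stand-in for a tokenizer) drops them.
print(joined.split())    # ["I've", 'always', 'loved', 'jumping']
```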
Does anyone know any tricks or good practices to ensure that newline characters never exist in my lists, or that they are disregarded or not "double escaped" when converting to a string?
Lastly, does anyone have suggestions for learning material on how Python processes newline characters and how they interact with different data types? I find it very confusing.
Thanks a ton!