EDIT: See bottom of post for the entire code

I am new to this forum and would be grateful for any help with the following issue.

Situation and goal:
- I have a list of strings. Each string is one word, like this: ['WORD', 'LINKS', 'QUOTE' ...] and so on.
- I would like to write this list of words (strings) on separate lines in a new text file.
- One would think the way to do this would be by appending the '\n' to every item in the list, but when I do that, I get a blank line between every list item. WHY?

Please have a look at this simple function:

def write_new_file(input_list):
    with open('TEKST\\TEKST_ny.txt', mode='wt') as output_file: 
        for linje in input_list:
            output_file.write(linje + '\n')

This produces a file that looks like this:

WORD

LINKS

QUOTE

If I remove the '\n', then the file looks like this:

WORDLINKSQUOTE 

Instead, the file should look like this:

WORD   
LINKS   
QUOTE

I am obviously doing something wrong, but after a lot of experimenting and reading around the web, I can't seem to get it right.

Any help would be deeply appreciated, thank you!

Response to the linked thread about write() vs. writelines(): writelines() doesn't fix this by itself; it produces the same result as write() without the '\n', unless I add a newline to every list item before passing the list to writelines(). But then we're back at the first option and the blank lines...

I tried to use one of the answers in the linked thread, using '\n'.join() and then write(), but I still get the blank lines.

It comes down to this: For some reason, I get two newlines for every '\n', no matter how I use it. I am calling .strip() on the list items to remove any newline characters, just to be sure, and without the '\n' everything is just one massive block of text anyway.

On using another editor: I tried opening the .txt file in Windows Notepad and in Notepad++. Is there any reason why these programs wouldn't display it correctly?

EDIT: This is the entire code. Sorry for the Norwegian naming. The purpose of the program is to read and clean up a text file and return the words first as a list and ultimately as a new file with each word on a new line. The text file is a list of Scrabble words, so it's rather big (about 9 MB). PS: I don't advocate Scrabble cheating, this is just a programming exercise :)

def renskriv(opprinnelig_ord):
    nytt_ord = ''
    for bokstav in opprinnelig_ord:
        if bokstav.isupper():
            nytt_ord = nytt_ord + bokstav
    return nytt_ord

def skriv_ny_fil(ny_liste):
    with open('NSF\\NSF_ny.txt', 'w') as f: 
        for linje in ny_liste: 
            f.write(linje + '\n')


def behandle_kildefil():
    # Read the whole source file; the with-block closes it automatically.
    with open('NSF\\NSF_full.txt', 'r') as innfil:
        kildeliste = innfil.read().split()
    ny_liste = []
    for item in kildeliste:
        nytt_ord = renskriv(item)
        nytt_ord = nytt_ord.strip('\n')
        ny_liste.append(nytt_ord)
    skriv_ny_fil(ny_liste)

def main():
    behandle_kildefil()

if __name__ == '__main__':
    main()
editionagenda
  • use `writelines` and see [this question](http://stackoverflow.com/questions/12377473/python-write-versus-writelines-and-concatenated-strings). – sobolevn May 21 '15 at 11:11
  • try opening `TEKST_ny.txt` in some other editor ? – ZdaR May 21 '15 at 11:13
  • The \n jumps to the next line and a new line is then started from there giving you a gap – kezzos May 21 '15 at 11:15
  • What OS, and what text editor are you using? It looks like you are on Windows and Python is writing `"\r\n"` for each newline character (because that's what a newline is on Windows), but your text editor is Unix-based and is interpreting both `\r` and `\n` as newline characters. – Dunes May 21 '15 at 11:30
  • if you do `with open('TEKST\\TEKST_ny.txt') as f: print(repr(f.read()))`, what is printed? – Dunes May 21 '15 at 11:33
  • Dunes: That would make sense. Thank you. I am on Windows 7 Enterprise, using Notepad++. Do you have any suggestion what I could do to compensate this? – editionagenda May 21 '15 at 11:34
  • I tried your code on Linux / Python 2.7.x and I *of course* don't have any added newlines. I think Dunes' comment is on the right track... – bruno desthuilliers May 21 '15 at 11:35
  • Thank you bruno, and Dunes: I get a syntax error on the next part of the code. Can't understand why, I will try to sort it out first – editionagenda May 21 '15 at 11:40
  • Notepad++ is usually very good at figuring out whether a text file uses Unix-style or Windows-style newlines. I'm guessing the file may use mixed newline types. In Notepad++ go to View -> Show Symbol -> Show End of Line. Windows newlines look like [CR][LF], and Unix newlines will just be [LF]. If you see a mix of these then Notepad++ is confused and has opted to treat both [CR] and [LF] as newlines (as [CR] on its own was the newline on classic Mac OS). – Dunes May 21 '15 at 11:43
  • linje = linje.strip() does the trick – jester112358 May 21 '15 at 11:50
  • Thank you, Dunes. Running your code prints the list of words with an à between them in the MS console, but more importantly: when I inspect the txt files with visible end-of-line symbols in N++, there appear to be TWO [CR] between every entry, and none when I skip the '\n' in the code. – editionagenda May 21 '15 at 11:52
  • I must be doing something wrong. The code in the original post is cleaned up a bit for posting-purpose. I will update the original post with the entire code, a lot of the names in Norwegian (sorry), to be sure that I am not leaving out anything – editionagenda May 21 '15 at 11:54
  • Unless you need non-ASCII characters, a quick fix would be to open the file in binary mode rather than text mode, e.g. `open(filename, "wb")`. This will prevent Python from interpreting newline characters, so your strings are written as is. I have no idea what would be causing Python to write two [CR] (carriage return / `\r`) characters for each newline. I suggest you edit your question title or ask a new question to get more specialist help, as this is beyond my knowledge. – Dunes May 21 '15 at 12:00
  • Ok, but thank you anyway for taking the time to look at this! – editionagenda May 21 '15 at 12:13
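
The diagnostic suggested in the comments (looking at what is really in the file rather than at what an editor displays) can be sketched as follows. This is only a minimal illustration: the helper name `count_line_endings` and the throwaway demo file are assumptions made for the example and are not part of the original code. Reading in binary mode means Python performs no newline translation, so the counts reflect the raw bytes on disk.

```python
import os
import tempfile


def count_line_endings(path):
    """Count CRLF pairs, lone CRs and lone LFs in the raw bytes of a file."""
    with open(path, "rb") as f:  # binary mode: no newline translation on read
        data = f.read()
    crlf = data.count(b"\r\n")
    return {
        "CRLF": crlf,
        "lone CR": data.count(b"\r") - crlf,  # CRs not followed by an LF
        "lone LF": data.count(b"\n") - crlf,  # LFs not preceded by a CR
    }


# Demo on a throwaway file that deliberately contains a doubled carriage return.
fd, demo_path = tempfile.mkstemp(suffix=".txt")
with os.fdopen(fd, "wb") as f:
    f.write(b"WORD\r\r\nLINKS\r\nQUOTE\n")

print(count_line_endings(demo_path))
os.remove(demo_path)
```

A non-zero "lone CR" count on the real output file would confirm that extra carriage returns are being written, as opposed to the editor merely displaying the file oddly.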

2 Answers

I think there must be some '\n' among your lines, so try skipping empty lines. I suggest this code:

def write_new_file(input_list):
    with open('TEKST\\TEKST_ny.txt', 'w') as output_file: 
        for linje in input_list:
            if not linje.startswith('\n'):
                output_file.write(linje.strip() + '\n')
alec_djinn

You've said in the comments that Python is writing two carriage return ('\r') characters for each line feed ('\n') character you write. It's a bit bizarre that each line feed ends up as two carriage returns, but newline translation is a feature of opening a file in text mode (normally, on Windows, each '\n' is translated to the more useful '\r\n'). If you open your file in binary mode instead, this translation will not be done and the file should display as you wish in Notepad++. NB: using binary mode may cause problems if you need characters outside the ASCII range. ASCII is basically just Latin letters (no accents), digits and a few symbols.

For python 2 try:

filename = "somefile.txt"
with open(filename, mode="wb") as outfile:
    outfile.write("first line")
    outfile.write("\n")
    outfile.write("second line")

Python 3 will be a bit more tricky. Each string literal you wish to write must be prefixed with a b (for bytes). Each string you don't have immediate access to, or don't wish to change to a bytes literal, must be encoded using the encode() method on the string, e.g.

filename = "somefile.txt"
with open(filename, mode="wb") as outfile:
    outfile.write(b"first line")
    outfile.write(b"\n")
    some_text = "second line"
    outfile.write(some_text.encode())
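
As an alternative sketch for Python 3 only: the built-in `open()` accepts a `newline` argument that controls how '\n' is translated when writing in text mode, so it may be possible to stay in text mode and avoid encoding by hand. The helper name `write_words` and the choice of UTF-8 are assumptions made for this illustration, not something taken from the question.

```python
import os
import tempfile


def write_words(path, words, newline="\n"):
    """Write one word per line in text mode with explicit newline handling."""
    # newline="\n" (or "") writes each '\n' untranslated;
    # newline="\r\n" turns each '\n' into a Windows-style CRLF.
    # encoding="utf-8" is an assumption; pick whatever the consumer expects.
    with open(path, "w", encoding="utf-8", newline=newline) as f:
        for word in words:
            f.write(word + "\n")


# Demo: write to a throwaway file and look at the raw bytes that were produced.
fd, demo_path = tempfile.mkstemp(suffix=".txt")
os.close(fd)
write_words(demo_path, ["WORD", "LINKS", "QUOTE"])
with open(demo_path, "rb") as f:
    print(repr(f.read()))
os.remove(demo_path)
```

Because the file stays in text mode, non-ASCII letters are encoded by Python rather than by explicit `encode()` calls.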
Dunes
  • Thank you! This is really interesting and I will absolutely look into it. I know it's a bit off the topic of your reply, but do you know if I can identify the carriage return in the code? `'\r'`? Perhaps I could find a way to strip every second occurrence in the file. There are several hundred thousand entries so I can't do it manually. Again, thank you for taking the time with this – editionagenda May 21 '15 at 13:02
  • Yes, `'\r'` is a carriage return. A carriage return is whitespace, so `str.strip` will remove it from the ends of a string. You can remove the doubled carriage returns in Notepad++ by choosing "Search -> Replace...", selecting the "Extended" search mode, setting "Find what" to `\r\r`, and setting "Replace with" to either `\r\n` (for Windows-style newlines) or `\n` (for Unix-style newlines). – Dunes May 21 '15 at 14:09
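
For a file with several hundred thousand entries, the same clean-up could also be done in Python instead of in the editor. The following is a rough sketch under one stated assumption: the word list contains no intentional blank lines, so any run of carriage returns (optionally followed by a line feed) stands for a single line break. The function name `collapse_carriage_returns` is hypothetical.

```python
import re


def collapse_carriage_returns(data, newline=b"\n"):
    """Turn every run of CRs (optionally followed by one LF) into one newline.

    Assumes the data has no intentional blank lines encoded as repeated CRs.
    """
    # Step 1: unify "\r\r\n", "\r\r", "\r\n" and lone "\r" to a single "\n".
    unified = re.sub(rb"\r+\n?", lambda match: b"\n", data)
    # Step 2: convert to the requested style (a no-op when newline is b"\n").
    return unified.replace(b"\n", newline)


damaged = b"WORD\r\r\nLINKS\r\rQUOTE\r\n"
print(collapse_carriage_returns(damaged))
print(collapse_carriage_returns(damaged, newline=b"\r\n"))
```

To apply it to a file, read the file with `open(path, "rb")`, pass the bytes through the function, and write the result back with `open(path, "wb")`, so that no further newline translation takes place.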