I have two .txt files in a directory, each with one word per line, for many lines. I need to merge them and then alphabetize the new file.

I've done this in PHP, but how can I do it in Python 2.7?

<?php
$files = glob("./files/*.??");
$out = fopen("listTogether.txt", "w");
foreach($files as $file){
    fwrite($out, file_get_contents($file));
}
fclose($out);
?>
daiuto
Django Johnson

1 Answer

Read all input files into one list, sort the result, and write the lines out again:

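from itertools import chain
from glob import glob

# Note: '*.??' only matches two-character extensions; use '*.txt' for .txt files.
# Iterating an open file yields its lines; chain.from_iterable() joins them all.
lines = list(chain.from_iterable(open(f, 'r') for f in glob('./files/*.??')))
lines.sort()  # sorts in place; uppercase letters order before lowercase

with open('listTogether.txt', 'w') as out:
    out.writelines(lines)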

If your files are large, however, you want to sort each file separately, write out the sorted results, then merge the sorted files into the new output file, line by line, using a merge generator function.
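
A minimal sketch of that sort-then-merge approach, using `heapq.merge()` from the standard library as the merge generator. The helper names `sort_file` and `merge_sorted_files` are hypothetical, and the sketch assumes every line (including the last one in each file) ends with a newline:

```python
import heapq
import os
import tempfile


def sort_file(path, tmp_dir):
    """Sort one input file on its own and return the path of the sorted copy."""
    with open(path, 'r') as src:
        lines = sorted(src)  # only this one file is held in memory
    fd, sorted_path = tempfile.mkstemp(suffix='.sorted', dir=tmp_dir)
    with os.fdopen(fd, 'w') as dst:
        dst.writelines(lines)
    return sorted_path


def merge_sorted_files(paths, out_path):
    """Merge already-sorted files into out_path, one line at a time."""
    handles = [open(p, 'r') for p in paths]
    try:
        with open(out_path, 'w') as out:
            # heapq.merge() is lazy: it keeps one pending line per input file.
            out.writelines(heapq.merge(*handles))
    finally:
        for handle in handles:
            handle.close()
```

You would call `sort_file()` for each path returned by `glob()` (with a scratch directory from `tempfile.mkdtemp()`), then pass the resulting paths to `merge_sorted_files()` along with `'listTogether.txt'`.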

You appear to be working with Windows files, which use \r\n (carriage return plus linefeed) line endings; you could use universal line-ending support and open the files with 'rU' mode so that you always get \n line endings:

lines = list(chain.from_iterable(open(f, 'rU') for f in glob('./files/*.??')))
lines.sort()

with open('listTogether.txt', 'w') as out:
    out.writelines(lines)

For more details on the U mode character, see the documentation for the open() function.
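
As an alternative sketch, `io.open()` (available in Python 2.7) gives the same translation through its `newline` argument, and it also runs on Python 3, where the `'U'` mode flag was eventually removed. The function names below are hypothetical, and the sketch assumes the files decode with the platform's default encoding; on Python 2, `io.open()` returns `unicode` strings rather than `str`:

```python
import io


def read_lines_universal(path):
    """Return the lines of path with CRLF and CR endings translated to LF."""
    # newline=None enables universal newlines when reading.
    with io.open(path, 'r', newline=None) as handle:
        return handle.readlines()


def merge_with_lf_endings(paths, out_path):
    """Merge and sort the lines of all paths, writing LF-terminated lines."""
    lines = []
    for path in paths:
        lines.extend(read_lines_universal(path))
    lines.sort()
    # newline='\n' writes LF as-is instead of the platform's native ending.
    with io.open(out_path, 'w', newline='\n') as out:
        out.writelines(lines)
```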

To remove any duplicates, you'd create a set instead of a list, then use sorted() to write out a sorted sequence again:

lines = set(chain.from_iterable(open(f, 'rU') for f in glob('./files/*.??')))

with open('listTogether.txt', 'w') as out:
    out.writelines(sorted(lines))
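
One caveat with the set: it compares whole lines, so 'amazing\n', 'amazing\r\n' and a final 'amazing' with no trailing newline all count as different values. A hedged sketch that strips the line ending before de-duplicating; the function name `unique_sorted_words` is hypothetical, and skipping blank lines is my own assumption about what you want:

```python
def unique_sorted_words(line_iterables):
    """Collect the unique words from several iterables of lines, sorted."""
    words = set()
    for lines in line_iterables:
        for line in lines:
            word = line.rstrip('\r\n')  # drop LF or CRLF, keep other whitespace
            if word:  # assumption: blank lines are not wanted in the output
                words.add(word)
    return sorted(words)
```

You would then write the result with `out.writelines(word + '\n' for word in unique_sorted_words(...))`, passing in the open file objects.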
Martijn Pieters
  • Where does ``glob`` come from? – Fabian Jun 08 '13 at 00:43
  • @Fabian: from the import that was there all along, I swear! :-P – Martijn Pieters Jun 08 '13 at 00:44
  • I get a blank text file called listTogether.txt. Where are the contents of the text files? – Django Johnson Jun 08 '13 at 01:06
  • @DjangoJohnson: Make sure the `glob()` call actually matched files; you are using a relative path; are you certain that you are running the code in the same directory as those files? – Martijn Pieters Jun 08 '13 at 01:08
  • Ah, yes. Now it is working. But I am getting `\r` at the end of each line. I believe that stands for carriage return. How do I get rid of those? – Django Johnson Jun 08 '13 at 01:19
  • @DjangoJohnson: What are you using to open the files? Are you running this on Windows? Python does use automatic line-end translations based on the current platform, so on Windows it'd write `\r\n` line endings. – Martijn Pieters Jun 08 '13 at 01:22
  • @DjangoJohnson: you can open the files in binary mode, `'rb'` and `'wb'` to not have line-endings translated for you. – Martijn Pieters Jun 08 '13 at 01:23
  • Thank you. It works. Can you edit the answer for others who read it, please. – Django Johnson Jun 08 '13 at 01:28
  • @DjangoJohnson: that is rather a corner case, where you are processing text files on the Windows platform but your input files are not using native line endings. – Martijn Pieters Jun 08 '13 at 01:29
  • It is actually still printing `\r` at the end of each line. I thought it wasn't because I was viewing it in Preview, but when I opened listTogether.txt in a text editor, I saw the `\r`s. I am doing this on Mac OS X, not Windows... – Django Johnson Jun 08 '13 at 01:39
  • @DjangoJohnson: check your *input* files then. Python is not adding those `\r` characters, certainly not on OS X. – Martijn Pieters Jun 08 '13 at 01:41
  • Nope, no `\r`s in the input files. [Here](http://pastebin.com/EC9WKUpm) is the Python file I am running in the terminal as python edit.py, and there are only two files in /files/: [names.txt](http://the-irf.com/names.txt) and [american.txt](http://the-irf.com/american.txt) – Django Johnson Jun 08 '13 at 01:49
  • @DjangoJohnson: The input files **do** have `\r\n` line endings. In your terminal, run `file american.txt` and see: *american.txt: ASCII text, with CRLF line terminators*. – Martijn Pieters Jun 08 '13 at 01:51
  • @DjangoJohnson: My code simply processes the lines *as read*; Python has no reason to add carriage returns on OS X, where the native line ending is `\n`. – Martijn Pieters Jun 08 '13 at 01:53
  • Oh, they weren't showing up in my text editor, so I didn't think they were in either of the files. How can I remove them from american.txt then? – Django Johnson Jun 08 '13 at 01:59
  • @DjangoJohnson: See http://blog.shvetsov.com/2012/04/covert-unix-windows-mac-line-endings.html or [Converting newline formatting from Mac to Windows](http://stackoverflow.com/q/6373888). – Martijn Pieters Jun 08 '13 at 02:01
  • @DjangoJohnson: You could also read the files with universal line-ending support; use `'rU'` to open the files. – Martijn Pieters Jun 08 '13 at 02:15
  • Thank you. One last request: how can I remove duplicates from the lines? For example, if one line has amazing and a line in one of the other files also has amazing, they will be next to each other when merged and alphabetized. How can I make it so that there is only one? In PHP, I would do this with `array_unique()` before writing to the file. – Django Johnson Jun 08 '13 at 02:20
  • @DjangoJohnson: added that to my answer; `set()` creates a set type; these can only hold unique values. – Martijn Pieters Jun 08 '13 at 02:28
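
The two checks from this thread (did `glob()` match anything, and do the input files contain carriage returns) can be sketched in Python as well. The helper names `count_line_endings` and `check_inputs` are hypothetical, not part of any library:

```python
from glob import glob


def count_line_endings(path):
    """Count CRLF and bare LF line endings in a file, read as raw bytes."""
    with open(path, 'rb') as handle:  # binary mode: no newline translation
        data = handle.read()
    crlf = data.count(b'\r\n')
    lf = data.count(b'\n') - crlf  # LFs that are not part of a CRLF pair
    return {'crlf': crlf, 'lf': lf}


def check_inputs(pattern):
    """Return (path, counts) pairs for every file the pattern matches."""
    paths = sorted(glob(pattern))
    if not paths:
        # An empty match is why the output file ends up blank.
        print('no files matched %r' % pattern)
    return [(path, count_line_endings(path)) for path in paths]
```

A non-zero `'crlf'` count for a file means its lines end in `\r\n`, which is what `file american.txt` reported above.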