I have a list of keywords in a keywords.txt file. I have another file, list.txt, in which each line begins with one of those keywords. How can I sort the lines in list.txt into the same order in which the keywords appear in keywords.txt?

keywords.txt

house
car
tree
woods
mailbox

list.txt

car bbdfbdfbdfbdf
tree gdfgvsgsgs
mailbox gsgsdfsdf
woods gsgsdgsdgsdgsdgsddsd
house gsdgfsdgsdgsdgsdg

final result in list.txt

house gsdgfsdgsdgsdgsdg
car bbdfbdfbdfbdf
tree gdfgvsgsgs
woods gsgsdgsdgsdgsdgsddsd
mailbox gsgsdfsdf
Blainer
  • Are we talking Windows batch file scripting here? Or what scripting languages are okay (Python, Perl, Ruby, etc.)? – kiswa Apr 04 '12 at 17:05
  • I don't even know how to go about this. Windows batch would be fine if possible. – Blainer Apr 04 '12 at 17:06

3 Answers

$ join -1 2 -2 1 <(cat -n keywords.txt | sort -k2) <(sort list.txt) | sort -k2n | cut -d ' ' -f 1,3-
house gsdgfsdgsdgsdgsdg
car bbdfbdfbdfbdf
tree gdfgvsgsgs
woods gsgsdgsdgsdgsdgsddsd
mailbox gsgsdfsdf
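
For anyone without `join` available, the idea behind the pipeline above (number the keywords, attach each number to the matching line, sort by that number, then drop it) could be sketched in Python roughly like this. The name `sort_by_keyword_rank` is made up for illustration, and the sketch assumes the keyword is the first whitespace-separated field of each line. Like `join`, it drops lines whose keyword is not listed, but it is not a line-for-line port of the pipeline.

```python
def sort_by_keyword_rank(keywords, lines):
    """Return lines ordered by the position of their first field in keywords."""
    # Number the keywords, much as `cat -n keywords.txt` does in the pipeline.
    rank = {}
    for position, keyword in enumerate(keywords):
        rank.setdefault(keyword, position)  # assumption: first occurrence wins

    def first_field(line):
        parts = line.split()
        return parts[0] if parts else None

    # Keep only lines whose first field is a known keyword.
    matched = [line for line in lines if first_field(line) in rank]
    # sorted() is stable, so lines sharing a keyword keep their input order.
    return sorted(matched, key=lambda line: rank[first_field(line)])


keywords = ["house", "car", "tree", "woods", "mailbox"]
lines = [
    "car bbdfbdfbdfbdf",
    "tree gdfgvsgsgs",
    "mailbox gsgsdfsdf",
    "woods gsgsdgsdgsdgsdgsddsd",
    "house gsdgfsdgsdgsdgsdg",
]
for line in sort_by_keyword_rank(keywords, lines):
    print(line)
```

To use it on the real files, read keywords.txt and list.txt with `splitlines()` and write the returned list back out.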
kev
  • You should test it in `linux`. (You didn't mention `Windows`) – kev Apr 04 '12 at 17:13
  • I have an Ubuntu virtual machine. I'm about to test this. – Blainer Apr 04 '12 at 17:14
  • I made a .sh and a .bsh with your script and I get this: root@ubuntu:~/Desktop/sort# bash script.sh script.sh: line 1: $: command not found cut: invalid byte or field list Try `cut --help' for more information. root@ubuntu:~/Desktop/sort# bash script.bsh cut: script.bsh: line 1: $: command not found invalid byte or field list Try `cut --help' for more information. – Blainer Apr 04 '12 at 17:24
  • You don't need the `$` (it's the bash prompt, not part of the command). – kev Apr 05 '12 at 00:39

Here is an improved and simplified version of kiswa's answer.

@echo off
(
  for /f "usebackq" %%A in ("keywords.txt") do findstr /bl "%%A" list.txt
)>sorted.txt
REM move /y sorted.txt list.txt

The FINDSTR command only matches lines that begin with the keyword, and it forces the search to be a literal search. (FINDSTR could give the wrong result if the /L option is not specified and the keyword happens to contain a regex meta-character.)

The code to replace the original file with the sorted file is commented out. Remove the leading REM to activate the MOVE statement. (The later examples use :: as the comment marker instead; remove it the same way.)

As with kiswa's answer, the above will only output lines from list.txt that match a keyword in keywords.txt.
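
To make the loop's behaviour concrete, here is a rough Python equivalent of the first batch file: for each keyword in turn, emit every line that starts with it. This is only a sketch. `lines_in_keyword_order` is a hypothetical name, and `str.startswith` stands in for `findstr /b /l`. Like FINDSTR with those options, it is a plain prefix test, so a keyword `car` would also pick up a line that starts with `carpet`.

```python
def lines_in_keyword_order(keywords, lines):
    """For each keyword in order, collect the lines that begin with it."""
    ordered = []
    for keyword in keywords:
        if not keyword:
            continue  # FOR /F skips empty lines as well
        # Literal prefix test; nothing is interpreted as a regular expression.
        ordered.extend(line for line in lines if line.startswith(keyword))
    return ordered


print(lines_in_keyword_order(
    ["house", "car"],
    ["carpet red", "car blue", "house green", "boat grey"],
))
# prints ['house green', 'carpet red', 'car blue']
```

The `boat grey` line is dropped because it matches no keyword, which is the limitation described above.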

You might have lines in list.txt that do not match a keyword. If you want to preserve those lines at the bottom of the sorted output, then use:

@echo off
(
  for /f "usebackq" %%A in ("keywords.txt") do findstr /bli "%%A" "list.txt"
  findstr /vblig:"keywords.txt" "list.txt"
)>sorted.txt
::move /y sorted.txt list.txt

Note that the /I (case insensitive) option must be used because of a FINDSTR bug dealing with multiple literal search strings of different lengths. The /I option avoids the bug, but it would cause problems if your keywords are case sensitive. See the Stack Overflow question "What are the undocumented features and limitations of the Windows FINDSTR command?" for details.
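
The effect of the second FINDSTR call, and the cost of /I, can be illustrated with a small Python sketch. Everything here is an assumption made for illustration (the name `order_with_leftovers`, the `ignore_case` flag); it mirrors the idea of the batch file, not FINDSTR's exact matching rules.

```python
def order_with_leftovers(keywords, lines, ignore_case=True):
    """Keyword-ordered lines first, then the lines that match no keyword."""
    fold = str.casefold if ignore_case else (lambda text: text)
    folded_keywords = [fold(keyword) for keyword in keywords if keyword]

    def starts_with(line, keyword):
        return fold(line).startswith(keyword)

    # Part 1: the per-keyword loop.
    ordered = [line
               for keyword in folded_keywords
               for line in lines
               if starts_with(line, keyword)]
    # Part 2: lines that begin with none of the keywords (the /V pass).
    leftovers = [line for line in lines
                 if not any(starts_with(line, keyword) for keyword in folded_keywords)]
    return ordered + leftovers


lines = ["Car one", "car two", "boat three"]
print(order_with_leftovers(["car"], lines))
# prints ['Car one', 'car two', 'boat three']
print(order_with_leftovers(["car"], lines, ignore_case=False))
# prints ['car two', 'Car one', 'boat three']
```

With case-insensitive matching, `Car one` is treated as belonging to the keyword `car`; with case-sensitive matching it falls through to the leftovers at the bottom. That is the trade-off the /I option forces.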

You might have keywords that are missing from list.txt. If you want to include those keywords without any data following them, then use:

@echo off
(
  for /f "usebackq" %%A in ("keywords.txt") do findstr /bl "%%A" "list.txt" || echo %%A
)>sorted.txt
::move /y sorted.txt list.txt
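
The `|| echo %%A` part runs only when FINDSTR finds nothing for that keyword. A Python sketch of the same fallback, using the hypothetical name `order_with_placeholders`:

```python
def order_with_placeholders(keywords, lines):
    """Keyword-ordered lines; a keyword with no matching line is emitted bare."""
    ordered = []
    for keyword in keywords:
        if not keyword:
            continue  # ignore blank keyword lines
        matches = [line for line in lines if line.startswith(keyword)]
        # Same role as `|| echo %%A`: fall back to the keyword itself.
        ordered.extend(matches or [keyword])
    return ordered


print(order_with_placeholders(["house", "car", "tree"], ["tree t1", "house h1"]))
# prints ['house h1', 'car', 'tree t1']
```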

You can combine both techniques to preserve the union of both files:

@echo off
(
  for /f "usebackq" %%A in ("keywords.txt") do findstr /bli "%%A" "list.txt" || echo %%A
  findstr /vblig:"keywords.txt" "list.txt"
)>sorted.txt
::move /y sorted.txt list.txt

All of the above assume the keywords do not contain space or tab characters. If they do, then FOR /F needs "delims=" so that it reads the whole line, and FINDSTR needs the /C: option so that the keyword is treated as a single search string:

@echo off
(
  for /f "usebackq delims=" %%A in ("keywords.txt") do findstr /bic:"%%A" "list.txt" || echo %%A
  findstr /vblig:"keywords.txt" "list.txt"
)>sorted.txt
::move /y sorted.txt list.txt
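
Why the options have to change: with its default delimiters, FOR /F keeps only the first space- or tab-separated token of each line, while "delims=" keeps the whole line. A small Python sketch of the difference follows. The keyword `mail box`, the sample lines, and both function names are made up for the example, and the functions only approximate what FOR /F does.

```python
def first_token(line):
    """Roughly what FOR /F yields with default delimiters: the first token."""
    parts = line.split()
    return parts[0] if parts else ""


def whole_line(line):
    """Roughly what FOR /F yields with "delims=": the line minus its newline."""
    return line.rstrip("\r\n")


raw = "mail box\n"  # one line of a hypothetical keywords.txt
lines = ["mail order", "mail box 42"]

for read in (first_token, whole_line):
    keyword = read(raw)
    print(repr(keyword), [line for line in lines if line.startswith(keyword)])
# prints 'mail' ['mail order', 'mail box 42']
# prints 'mail box' ['mail box 42']
```

Reading only the first token turns the keyword into `mail`, which matches too much; reading the whole line keeps the match exact.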
dbenham

Here's a Windows batch file. It's probably not the most efficient, but I think it's nicely readable.

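@echo off

REM Start from an empty sorted.txt so repeated runs do not append to old output.
if exist sorted.txt del sorted.txt

REM For each keyword, in order, append the lines of list.txt that contain it.
for /F "tokens=*" %%A in (keywords.txt) do (
    for /F "tokens=*" %%B in ('findstr /i /C:"%%A" list.txt') do (
        >>sorted.txt echo %%B
    )
)

del list.txt

rename sorted.txt list.txt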

This creates a sorted file, then removes the list file and renames the sorted file.

kiswa
  • This deletes some of my lines in the final sorted file. I start with 46 lines and end with 38. I can send you my list and keywords files if need be. – Blainer Apr 04 '12 at 18:06
  • It will only work if the lines all match a keyword in the sort. If you want unsorted items to stay in the list, that's a different thing than you asked originally. Also, empty lines will be removed. – kiswa Apr 04 '12 at 18:08
  • All the lines match a keyword. Empty lines and unmatched lines should not be an issue. I have 46 lines and 46 keywords. Here are my files: http://www.mediafire.com/?xbuzhe245i8nimo – Blainer Apr 04 '12 at 18:11
  • For some reason the `findstr` command in Windows will not locate these lines in your `list.txt` file: `ZZZYYYJesus-Christ.html ZZZYYYDeity-Christ.html ZZZYYYwhy-believe-resurrection.html ZZZYYYJesus-crucified.html ZZZYYYJesus-Jew.html ZZZYYYJesus-myth.html ZZZYYYnames-Jesus-Christ.html ZZZYYYstations-cross.html` – kiswa Apr 04 '12 at 18:38
  • Yeah, do you have any idea why this is? – Blainer Apr 04 '12 at 18:42
  • I can't access the mediafire site from my current location, so I can't be sure. But perhaps there are spaces (or other hidden characters) in the keywords.txt file that are preventing the match. If so, then removing tokens=* from the 1st FOR statement should solve the problem. – dbenham Apr 04 '12 at 19:21