Remove orphan lines from file

Question

I use fdupes to list duplicate files. For example:

./topic/org-batch/.svn/pristine/45/45e578cf6c4723c6853b788e6ae35c1705fe9b19.svn-base
./topic/org-batch/Makefile

./topic/org-batch/lisp/sword-mode.el
./home/.emacs.d/lisp/sword-mode.el
./home/.emacs.d/lisp/sword-mode-bak.el

./topic/org-batch/.svn/pristine/2a/2a87de13f3959748235f2a9735b0d7da40ef8545.svn-base
./topic/org-batch/bin/orgmk-stow-orgmk.mk

./home/.emacs.d/elpa/auctex-12.1.1/COPYING
./home/.emacs.d/elpa/org-plus-contrib-20180813/COPYING

./topic/org-batch/.svn/pristine/1e/1ebac4e8f3174f5da74469ad0bf5714ed901233e.svn-base
./topic/org-batch/bin/orgmk-init

However, some of the above (the copies in SVN) are normal duplicate files.

Hence, I grep out the legitimate copies inside .git or .svn directories:

fdupes -r . \
    | grep -v "/\.svn/" \
    | grep -v "/\.git/" \
    | uniq
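A minimal way to reproduce the issue without running fdupes, assuming a POSIX shell: the hypothetical `sample` function prints a hand-written excerpt of the fdupes output above (each group followed by a blank line), which is then piped through the same filter. The dots are escaped so that `.svn` and `.git` match literally; in a grep pattern a bare `.` matches any character.

```shell
# Hypothetical stand-in for `fdupes -r .`: an excerpt of the output shown
# above, one group per block, each block followed by a blank line.
sample() {
  printf '%s\n' \
    './topic/org-batch/.svn/pristine/45/45e578cf6c4723c6853b788e6ae35c1705fe9b19.svn-base' \
    './topic/org-batch/Makefile' \
    '' \
    './topic/org-batch/lisp/sword-mode.el' \
    './home/.emacs.d/lisp/sword-mode.el' \
    './home/.emacs.d/lisp/sword-mode-bak.el' \
    '' \
    './home/.emacs.d/elpa/auctex-12.1.1/COPYING' \
    './home/.emacs.d/elpa/org-plus-contrib-20180813/COPYING' \
    ''
}

# Same filter as in the question (dots escaped to match literally).
sample | grep -v '/\.svn/' | grep -v '/\.git/' | uniq
```

The Makefile line survives as a one-line block, because only its .svn partner was removed.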

But I end up with some isolated lines in the report:

./topic/org-batch/Makefile

./topic/org-batch/lisp/sword-mode.el
./home/.emacs.d/lisp/sword-mode.el
./home/.emacs.d/lisp/sword-mode-bak.el

./topic/org-batch/bin/orgmk-stow-orgmk.mk

./home/.emacs.d/elpa/auctex-12.1.1/COPYING
./home/.emacs.d/elpa/org-plus-contrib-20180813/COPYING

./topic/org-batch/bin/orgmk-init

which I don't have to care about, as they're not copies I would have to delete.

How can I remove those blocks made up of just one line?

Target report:

./topic/org-batch/lisp/sword-mode.el
./home/.emacs.d/lisp/sword-mode.el
./home/.emacs.d/lisp/sword-mode-bak.el

./home/.emacs.d/elpa/auctex-12.1.1/COPYING
./home/.emacs.d/elpa/org-plus-contrib-20180813/COPYING

Can you please show the exact output you need (before and after versions)? – winux, Aug 17 '18 at 07:49

score 1 · Accepted Answer · answered Aug 17 '18 at 12:00


awk might help. You can redefine what separates records (lines) and fields (within lines) by setting the input record separator (RS) and field separator (FS), as well as the output record separator (ORS). If you set these so that a double newline (\n\n) separates records and a single newline (\n) separates fields, every record that spans more than one line has more than one field, which you can test with NF>1. These are exactly your blocks with more than one line:

awk 'BEGIN {RS="\n\n";ORS="\n\n";FS="\n"}  {if(NF>1) print}'
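To try the one-liner on the report from the question, here is a sketch in which a hypothetical `report` function reproduces that report. It assumes an awk that treats a multi-character RS as a regular expression, as GNU awk and mawk do; POSIX leaves a multi-character RS unspecified, so other awks may behave differently. It also assumes every block, including the last, is followed by a blank line, so the final orgmk-init block is dropped as well.

```shell
# Hypothetical stand-in for the filtered fdupes report from the question;
# every block, including the last, is followed by a blank line.
report() {
  printf '%s\n' \
    './topic/org-batch/Makefile' \
    '' \
    './topic/org-batch/lisp/sword-mode.el' \
    './home/.emacs.d/lisp/sword-mode.el' \
    './home/.emacs.d/lisp/sword-mode-bak.el' \
    '' \
    './topic/org-batch/bin/orgmk-stow-orgmk.mk' \
    '' \
    './home/.emacs.d/elpa/auctex-12.1.1/COPYING' \
    './home/.emacs.d/elpa/org-plus-contrib-20180813/COPYING' \
    '' \
    './topic/org-batch/bin/orgmk-init' \
    ''
}

# The answer's one-liner: records are blocks, fields are lines,
# and only blocks with more than one line are printed.
report | awk 'BEGIN {RS="\n\n"; ORS="\n\n"; FS="\n"} {if (NF > 1) print}'
```

Only the sword-mode block and the COPYING block should remain, each followed by a blank line.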

Have a look here for examples of awk variables.

PS: The last single line might be a problem if it has a \n at the end.
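One way around the trailing-newline issue, offered as an alternative to the answer's RS setting rather than as part of it, is awk's paragraph mode. POSIX specifies that with RS set to the empty string, records are separated by one or more blank lines, leading or trailing newlines do not produce empty records, and a newline always separates fields; with FS="\n" every line is exactly one field, even if a path contains spaces. The sketch below feeds it a report ending in a single newline, which is the case the PS warns about; `report` is again a hypothetical stand-in for the filtered fdupes output.

```shell
# Hypothetical stand-in for the filtered report, this time ending with
# a single newline after the last path (no trailing blank line).
report() {
  printf '%s\n' \
    './topic/org-batch/Makefile' \
    '' \
    './topic/org-batch/lisp/sword-mode.el' \
    './home/.emacs.d/lisp/sword-mode.el' \
    './home/.emacs.d/lisp/sword-mode-bak.el' \
    '' \
    './topic/org-batch/bin/orgmk-stow-orgmk.mk' \
    '' \
    './home/.emacs.d/elpa/auctex-12.1.1/COPYING' \
    './home/.emacs.d/elpa/org-plus-contrib-20180813/COPYING' \
    '' \
    './topic/org-batch/bin/orgmk-init'
}

# RS="" (paragraph mode): blank lines separate records and the trailing
# newline does not turn the last one-line block into a two-field record.
report | awk 'BEGIN {RS=""; FS="\n"; ORS="\n\n"} NF > 1'
```

Because runs of blank lines count as a single separator in this mode, the `uniq` step should no longer matter for this part of the pipeline.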

eraenderer

What would the problem be? That it gets printed although it's useless, or that it doesn't get printed although it's useful? – user3341592 Aug 17 '18 at 12:52
Does the rest work for you? If your last line `./topic/org-batch/bin/orgmk-init` contains a newline (\n) at the end it will not be filtered out, because it is interpreted as a record with two fields. You can avoid this by using `perl -pe 'chomp if eof'` as described [here](https://stackoverflow.com/questions/1654021/how-can-i-delete-a-newline-if-it-is-the-last-character-in-a-file) on your input before. – eraenderer Aug 17 '18 at 13:24
Having one block too many is no problem; that's much better than one that would not be printed when it should. Thanks! – user3341592 Aug 17 '18 at 20:01
