Extract each line from a text file and compare it with all the lines in the file to avoid duplication in shell script

Question

The content of the text file text.txt:

DS/Apple
DS/Banana
DS/Strawberry
DS/Apple
DS/Orange

The code that I wrote:

for i in $(cat text.txt)
do
    count = 0
    for j in $(cat text.txt)
    do
    if [ i == j ]; then
    count = count + 1
    fi
    done
    if [ count <  2 ]; then
    $i >> Final.txt
done

The error I am getting:

$ Trial.sh
./Trial.sh: line 12: syntax error near unexpected token `done'
./Trial.sh: line 12: `done'

I want the output, without any duplicate lines, in a new text file Final.txt.

Where am I going wrong?

Yes, I missed the fi. When I correct that, I get another error. — AmrutaMV, Dec 19 '16 at 13:17
./Trial.sh: line 3: count: command not found
./Trial.sh: line 10: 2: No such file or directory
./Trial.sh: line 3: count: command not found
./Trial.sh: line 10: 2: No such file or directory
./Trial.sh: line 3: count: command not found
./Trial.sh: line 10: 2: No such file or directory
./Trial.sh: line 3: count: command not found
./Trial.sh: line 10: 2: No such file or directory
./Trial.sh: line 3: count: command not found
./Trial.sh: line 10: 2: No such file or directory
— AmrutaMV, Dec 19 '16 at 13:17
Now you have many other issues... `count = 0` tries to run a command named `count`. (http://stackoverflow.com/questions/2268104/bash-script-variable-declaration-command-not-found/2268117#2268117) `$i >> Final.txt` tries to run a command named whatever is in the variable `i`. `count = count + 1` again tries to run a command named `count`. etc. — William Pursell, Dec 19 '16 at 13:19
I want to use the count variable to track the number of times a line matches. If a line matches more than once, I want to discard the duplicates and write a new text file without duplicates. — AmrutaMV, Dec 19 '16 at 13:20
I am new to shell scripting and am not sure about the usage of variables. — AmrutaMV, Dec 19 '16 at 13:22
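
The points raised in the comments above (assignments without spaces, `$` to expand variables, `$(( ))` for arithmetic, a numeric test operator instead of `<`, `echo`/`printf` to write a line, and the missing `fi`) can be folded into a working loop. The following is only a sketch, not the asker's final solution. It makes one deliberate change in logic: counting matches across the whole of text.txt and keeping lines with a count below 2 would drop DS/Apple entirely, so this version instead compares each line with the lines already written to Final.txt. The variable names `line` and `kept` are illustrative, and the first command merely recreates the sample file from the question.

```shell
#!/bin/sh
# Recreate the sample input from the question.
printf '%s\n' DS/Apple DS/Banana DS/Strawberry DS/Apple DS/Orange > text.txt

: > Final.txt                       # start with an empty output file

while IFS= read -r line; do         # read text.txt one whole line at a time
    count=0                         # no spaces around '=' in an assignment
    while IFS= read -r kept; do     # compare with the lines kept so far
        if [ "$line" = "$kept" ]; then
            count=$((count + 1))    # arithmetic needs $(( ))
        fi
    done < Final.txt
    if [ "$count" -lt 1 ]; then     # -lt compares numbers; '<' would redirect
        printf '%s\n' "$line" >> Final.txt
    fi
done < text.txt
```

With the sample input, Final.txt ends up holding DS/Apple, DS/Banana, DS/Strawberry and DS/Orange, in that order. The nested loop still rereads Final.txt for every input line, so it is useful mainly for learning the syntax.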

score 1 · Answer 1 · answered Dec 19 '16 at 13:38

awk '!seen[$0]++' text.txt > Result.txt

This solved my problem. It removed the duplicates.

answered Dec 19 '16 at 13:38

AmrutaMV

score 0 · Answer 2 · answered Dec 19 '16 at 13:16

If you want to remove all duplicates, you probably don't want to scan the file multiple times. You can easily do it in one pass:

awk '!a[$0]++' text.txt
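
For readers unfamiliar with the idiom, here is a long-hand sketch of what that one-liner does. `a` is an associative array keyed by the full text of each line (`$0`); a line is printed only while its counter is still zero, and the counter is then incremented, so every later copy is skipped. The `printf` line only recreates the sample file from the question, and writing to Final.txt is an assumption based on the file name the question asks for.

```shell
# Recreate the sample input from the question.
printf '%s\n' DS/Apple DS/Banana DS/Strawberry DS/Apple DS/Orange > text.txt

# Long-hand equivalent of: awk '!a[$0]++' text.txt
awk '{
    if (a[$0] == 0) print $0   # first occurrence of this exact line
    a[$0]++                    # count it, so later copies are skipped
}' text.txt > Final.txt
```

The file is read once and the original order of first occurrences is preserved, which is why this scales better than comparing every line with every other line.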

answered Dec 19 '16 at 13:16

William Pursell

This is just an example file. The duplicated line need not always be the same one, so a regular expression would not work. Moreover, even this command gives the following output: – AmrutaMV Dec 19 '16 at 13:30
$ Trial.sh
DS/Apple
DS/Banana
DS/Strawberry
DS/Orange
DS/Apple
DS/Banana
DS/Strawberry
DS/Apple
DS/Orange
– AmrutaMV Dec 19 '16 at 13:31
@AmrutaMV I'm a bit confused by your comments, given that you indicate that this "solved my problem". – William Pursell Dec 20 '16 at 00:28
What was confusing? – AmrutaMV Jan 10 '17 at 06:41
@AmrutaMV Are you saying that running this command (which is identical to the solution which you say "solved my problem") generates output in which the line "DS/Apple" appears twice? That seems to be what your comments indicate, but that doesn't make any sense. Hence, the confusion. – William Pursell Jan 10 '17 at 06:51
