1

I have an input file called input.txt like this:

powerOf|creating new file|failure
creatEd|new file creating|failure
powerAp|powerof server|failureof file

I extract the text up to just before the first capital letter in the first field and store those snippets in output.txt:

power
creat

I used the sed command to separate out the values and it's working fine.

Using each entry in the output file (output.txt), I need to grep the first field of input.txt, and the output should look like this:

power
power:powerOf|creating new file|failure,powerAp|powerof server|failureof file
creat
creat:creatEd|new file creating|failure

I have tried a few ways but I'm not getting the expected output.

I tried the following but I'm getting duplicate entries:

cat input.txt | cut -d '|' -f1 >> input1.txt
cat input1.txt | sed 's/\([a-z]\)\([A-Z]\)/\1 \2/g' >> output.txt
while read -r line; do
  echo $line
  cat input.txt | cut -d '|' -f1 | grep $line >> output1.txt
done < "output.txt"

I have 20000 lines in the input file. I don't know why I am getting duplicates in the output. What am I doing wrong?

tripleee
Mohan
    I closed [the old question](/questions/54456808/grep-word-from-a-file-and-kept-the-output-as-below) as a duplicate of this one. – tripleee Jan 31 '19 at 09:46

2 Answers

2

Bash solution:

#!/bin/bash
keys=()
declare -A map
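while IFS= read -r line; do
    # key: first field, truncated at its first capital letter
    key=$(echo "${line}" | cut -d '|' -f1 | sed -e 's/[[:upper:]].*$//')
    if [[ -z "${map[$key]}" ]]; then
        keys+=("${key}")
        map[$key]="${line}"
    else
        map[$key]+=",${line}"
    fi
done

# print each key, then the key with its comma-joined lines, in first-seen order
for key in "${keys[@]}"; do
    echo "${key}"
    echo "${key}:${map[$key]}"
done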

exit 0

Maybe a Perl solution is acceptable for OP too:

#!/usr/bin/perl
use strict;
use warnings;

my @keys;
my %map;
while (<>) {
    chomp;
    my($key) = /^([[:lower:]]+)/;
    if (not exists $map{$key}) {
        push(@keys, $key);
        $map{$key} = [];
    }
    push(@{ $map{$key} }, $_);
}

foreach my $key (@keys) {
    print "$key\n";
    print "$key:", join(",", @{ $map{$key} }), "\n";
}


exit 0;

Test with your given input:

$ perl dummy.pl <dummy.txt
power
power:powerOf|creating new file|failure,powerAp|powerof server|failureof file
creat
creat:creatEd|new file creating|failure

UPDATE after the OP restated the original problem: replace the body of the first loop with the following, so that only the 2nd column of the input is stored instead of the whole line:

    message=$(echo "${line}" | cut -d '|' -f2)
    if [[ -z "${map[$key]}" ]]; then
        keys+=("${key}")
        map[$key]="${message}"
    else
        map[$key]+=",${message}"
    fi

Test with your given input:

$ perl dummy.pl <dummy.txt
power
power:creating new file,powerof server
creat
creat:new file creating
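
For reference, here is a minimal sketch of how that fragment slots into the complete Bash loop. It is not the exact script from this answer: the function name `group_second_field` is hypothetical, the `cut`/`sed` pipeline is swapped for parameter expansion so that no extra processes are started per line, and lines without a lowercase prefix are skipped. It assumes bash 4 or later for associative arrays.

```shell
#!/usr/bin/env bash
# Sketch: group the 2nd field of each line by the lowercase prefix of the 1st.
# Assumes bash 4+ (associative arrays); the function name is hypothetical.

group_second_field() {
    local -a keys=()      # keys in first-seen order
    local -A map=()       # key -> comma-joined 2nd fields
    local line first rest key message
    while IFS= read -r line; do
        first=${line%%|*}              # 1st field: text before the first '|'
        key=${first%%[[:upper:]]*}     # prefix before the first capital letter
        rest=${line#*|}                # everything after the first '|'
        message=${rest%%|*}            # 2nd field
        [[ -n $key ]] || continue      # skip lines with no lowercase prefix
        if [[ -z ${map[$key]+set} ]]; then
            keys+=("$key")
            map[$key]=$message
        else
            map[$key]+=",$message"
        fi
    done
    for key in "${keys[@]}"; do
        printf '%s\n%s:%s\n' "$key" "$key" "${map[$key]}"
    done
}

group_second_field <<'EOF'
powerOf|creating new file|failure
creatEd|new file creating|failure
powerAp|powerof server|failureof file
EOF
```

With the sample input from the question this prints the same four lines as the test output above. To process a file, redirect it into the function instead of the here-document, e.g. `group_second_field <input.txt >output1.txt`.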
Stefan Becker
  • Hmm, seems like my bash skills aren't as rusty as I thought. Bash solution added... – Stefan Becker Jan 31 '19 at 10:24
  • @Mohan my solutions do not create `output.txt`, because it is unnecessary to do so to create the final output. – Stefan Becker Jan 31 '19 at 10:27
  • Thank you for the script, Becker. What is the correct way to specify the input/output files? Should the input file go after the end of the while loop and the output file after the end of the for loop? Also, what does `declare -A map` do? – Mohan Jan 31 '19 at 12:43
  • @Mohan are you saying that you need the intermediate `output.txt` with the list of keys? Why? Not that it is difficult to dump the key list to another file in the provided solution. Please consult the section `Arrays` in the `bash` man page about "associative arrays" (AKA hashes, maps or dictionaries in other programming languages) and `declare -A`. – Stefan Becker Jan 31 '19 at 12:53
  • I don't want output of keys(output.txt,) but i need output of output1.txt file, as i am expecting output like below power power:powerOf|creating new file|failure,powerAp|powerof server|failureof file creat creat:creatEd|new file creating|failure still i didn't test the bash script, come back to you if any error, thanks for your help!! – Mohan Jan 31 '19 at 13:08
  • The bash script prints the exact same output as the test output I provided for the Perl script, that's why I didn't repeat it. The output looks exactly like you specified in your question (unless I'm blind about some minor detail). FYI: command line would be `bash script.sh <input.txt >output1.txt` – Stefan Becker Jan 31 '19 at 13:12
  • Thanks Becker, the script working fine as expected adding one question :- is there a any way to get only second field on output1.txt file (mentioned in below) cat output1.txt power power:creating new file,powerof server creat creat:new file creating – Mohan Jan 31 '19 at 22:32
  • Yes, it is possible. Simply change the code to extract the second field from the input line and add that value to the map instead of the complete input line. – Stefan Becker Feb 01 '19 at 07:11
  • Thanks, could you check the below will work or not. Lock=$(echo ${line} | cut -d \| -f2 and this should be add to below map[$key]="${Lock}” else map[$key]+=",${Lock} – Mohan Feb 01 '19 at 10:12
  • Looks OK. Please remember to upvote correct answers and accept the one you like best. – Stefan Becker Feb 01 '19 at 10:25
  • Hi Becker, i was new to stackoverflow, that's the reason i didn't clicked on upvote now i have done it I have tried with below script, and getting only last output – Mohan Feb 04 '19 at 13:26
  • #!/bin/bash keys=() declare -A map while read line; do key=$(echo ${line} | cut -d \| -f1 | sed -e 's/[[:upper:]].*$//') lock=$(echo ${line} | cut -d \| -f2) if [[ -z "${map[$key]}" ]] && [[ -z "${map[$lock]}" ]]; then lock+=(${lock}) keys+=(${key}) map[$key]="${key}" map[$lock]="${lock}" else map[$key]+=",${key}" map[$lock]+=",${lock}" fi done for key in ${keys[*],${lock[*]}}; do echo "${key}" echo "${key}:${map[$lock]}" done exit 0 – Mohan Feb 04 '19 at 13:34
  • Sorry but the code you added to the comment makes no sense. I've updated my answer with an alternative that uses the 2nd field instead of the whole line. – Stefan Becker Feb 04 '19 at 13:45
  • thanks! i tried below one. but the lock value not updating properly, it's coming as null: lock=() set -vx declare -A map while read line; do key=$(echo ${line} | cut -d \| -f1 | sed -e 's/[[:upper:]].*$//') lock=$(echo ${line} | cut -d \| -f2) if [[ -z "${map[$lock]}" ]]; then lock+=(${lock}) map[$lock]="${lock}" else map[$lock]+=",${lock}" fi done for lock in ${lock[*]}; do echo "${key}" echo "${key}:${map[$lock]}" done exit 0, please help me on it – Mohan Feb 05 '19 at 10:25
  • It seems that you are making changes to the code without understanding what you are changing. I would recommend taking a shell tutorial or read a book about shell programming, e.g. [Learning the bash Shell, 3rd Edition](http://shop.oreilly.com/product/9780596009656.do). Please do have the courtesy to accept one of the answers, because your original problem was solved. – Stefan Becker Feb 05 '19 at 10:31
2

Factoring out the useless uses of cat and other antipatterns, you are basically doing

# XXX not a solution, just a refactoring of your code
sed 's/\([a-z]\)\([A-Z]\).*/\1/' input.txt | grep -f - input.txt

which extracts the lines just fine, but does nothing to join them. If you want to merge lines with the same prefix values, a simple Awk script will probably do what you need.

awk '{ key=$1; sub(/[A-Z].*/, "", key)
      b[key] = (key in b ? b[key] "," : key ":" ) $0 }
    END { for(k in b) print b[k] }' input.txt

We extract the prefix into key. If it's a key we have seen before (in which case it already exists in the associative array b), we append a comma and the current line to the previous value; otherwise we initialize the array value to the key itself, a colon, and the current line. When we are done, we loop through the accumulated keys and print the value we have stored for each.
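
Two details of the script above may matter for the exact output requested in the question: Awk does not specify the iteration order of `for (k in b)`, and only the joined `key:...` line is printed, not the separate key line. A hedged variant that records keys in first-seen order and prints both lines could look like the sketch below. The helper name `group_by_prefix` and the array names `order` and `lines` are assumptions made for this illustration.

```shell
# Sketch: same grouping idea, but keys are emitted in first-seen order
# and each group is preceded by a line holding just the key.
# group_by_prefix is a hypothetical helper name.
group_by_prefix() {
    awk -F'|' '
    {
        key = $1
        sub(/[A-Z].*/, "", key)        # keep the text before the first capital
        if (!(key in lines)) {
            order[++n] = key           # remember first-seen order
            lines[key] = $0
        } else {
            lines[key] = lines[key] "," $0
        }
    }
    END {
        for (i = 1; i <= n; i++) {
            print order[i]
            print order[i] ":" lines[order[i]]
        }
    }' "$@"
}

printf '%s\n' \
    'powerOf|creating new file|failure' \
    'creatEd|new file creating|failure' \
    'powerAp|powerof server|failureof file' | group_by_prefix
```

On the three sample lines this prints `power`, the joined `power:` line, `creat`, and the joined `creat:` line, in that order. Passing a file name, as in `group_by_prefix input.txt`, reads from the file instead of standard input.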

If the lines are long, 20,000 lines might not fit into memory at once, but if your example is representative, it should be an unremarkable task on even modest hardware.
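
If memory ever did become a concern, one possible workaround is to let `sort` bring equal keys together and have Awk hold only one group at a time. The sketch below is an illustration under assumptions, not a tested production script: it assumes the input contains no tab characters, that `sort` supports the `-s` (stable) option as GNU and BSD sort do, and that it is acceptable for groups to come out in sorted key order rather than first-seen order. The helper name `group_sorted` is hypothetical.

```shell
# Sketch: decorate each line with its key, sort on the key, then stream.
# Assumes no tab characters in the input; group_sorted is a hypothetical name.
group_sorted() {
    tab=$(printf '\t')
    awk -F'|' '{ key = $1; sub(/[A-Z].*/, "", key); print key "\t" $0 }' "$@" |
    LC_ALL=C sort -s -t "$tab" -k1,1 |
    awk -F'\t' '
    {
        line = substr($0, length($1) + 2)   # strip the key and the tab
        if (NR == 1 || $1 != prev) {
            if (NR > 1) {                   # key changed: flush finished group
                print prev
                print prev ":" acc
            }
            acc = line
        } else {
            acc = acc "," line
        }
        prev = $1
    }
    END { if (NR > 0) { print prev; print prev ":" acc } }'
}

printf '%s\n' \
    'powerOf|creating new file|failure' \
    'creatEd|new file creating|failure' \
    'powerAp|powerof server|failureof file' | group_sorted
```

Because the sort is stable, lines within a group keep their original relative order; on the sample input the `creat` group is printed before the `power` group.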

tripleee