Generate a new file based on a condition + column matching of two files

Question

First of all, apologies if I didn't come across a similar answer before posting.

I'm trying to create a third file based on several conditions.

I have two input files

file1 (tab separated):-

X_ID1 y_id11 num1
X_ID2 y_id31 num2  
X_ID3 y_id34 num3 
X_ID4 y_id23 num4
X_ID5 y_id2  num5 
...  
...

file 2:-

BIOTIC AND ABIOTIC STRESS
x_id2
REGULATION OF TRANSCRIPTION
x_id1
x_id4
HORMONES
x_id5
REGULATION
x_id6
x_id13
...
...

Please note that column 1 of file1 is UPPERCASE, while the data in file2 is lowercase.

What I want is to have an output file (file3) as follows:-

BIOTIC AND ABIOTIC STRESS
y_id31
REGULATION OF TRANSCRIPTION
y_id11
y_id23
HORMONES
y_id2
...
...

Basically, as "pseudo code" it goes as follows:-

while read $line from file2; do
 if [[ $line != x_* ]]; then
    print $line
 else
    match $line (case insensitively) with column 1 of file1 and print respective column2 of file1
 fi
done

Would you please be able to help me solve this problem?

Thanks a lot in advance!

James Brown · Accepted Answer · 2017-07-27T08:10:33.350

4

In awk:

$ awk 'NR==FNR{a[tolower($1)]=$2;next}{print ($1 in a?a[$1]:$0)}' file1 file2
BIOTIC AND ABIOTIC STRESS
y_id31
REGULATION OF TRANSCRIPTION
y_id11
y_id23
HORMONES
y_id2
REGULATION
x_id6
x_id13

Explained:

$ awk '
NR==FNR {                    # first file
    a[tolower($1)]=$2        # hash to a: key is lowercased $1, value is $2
    next                     # skip to next record
}
{                            # second file
    print ($1 in a?a[$1]:$0) # if $1 is a key in a, print its value, else print the record
}' file1 file2               # mind the order

On @Sundeep's suggestion, this is a good intro to two-file processing in awk.
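The same lookup can be wrapped in a small shell function, which may be handy when the result should go straight into file3. This is only a sketch under a few assumptions: the name map_ids is made up for illustration, file1 is taken to be strictly tab-separated (hence -F'\t'), and the whole line of file2 is lowercased before the lookup so that IDs in either case are matched.

```shell
# Hypothetical helper (name is illustrative): map_ids MAPFILE LISTFILE
# MAPFILE  - tab-separated, column 1 = ID (any case), column 2 = replacement
# LISTFILE - one header or ID per line
map_ids() {
    awk -F'\t' '
        NR == FNR {                    # first file: build the lookup table
            map[tolower($1)] = $2      # key: lowercased column 1, value: column 2
            next
        }
        {                              # second file
            key = tolower($0)          # compare case-insensitively on the whole line
            print (key in map ? map[key] : $0)
        }
    ' "$1" "$2"
}
```

It would be called as map_ids file1 file2 > file3. Lines of file2 without a counterpart in file1 are printed unchanged, matching the output shown above.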

edited Jul 27 '17 at 08:10

answered Jul 27 '17 at 08:07


1

Thank you so much for this and the explanation. It is great when you can understand what's in the explanation without having to blindly run the code. And yes, it works! :) – Shani A. Jul 27 '17 at 08:19

Samy · Answer 2 · 2017-07-31T10:40:34.390

1

OLD_IFS="${IFS}"
IFS=$'\n'
for line in `cat file2`
do
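        # quote and anchor the pattern: an unquoted x_* is subject to globbing
        if [[ -z `echo "${line}" | grep '^x_'`  ]]
        then
                echo "${line}"
        else
                # -w matches whole words only, so x_id1 does not also match X_ID11
                grep -iw "${line}" file1 | awk -F ' ' '{print $2}'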
        fi
done
IFS="${OLD_IFS}"

edited Jul 31 '17 at 10:40

answered Jul 27 '17 at 08:09


full of bad practices... see http://mywiki.wooledge.org/DontReadLinesWithFor , http://mywiki.wooledge.org/BashFAQ/082 , https://unix.stackexchange.com/questions/169716/why-is-using-a-shell-loop-to-process-text-considered-bad-practice – Sundeep Jul 27 '17 at 08:14
Thank you for this. Yes it does work. However I prefer the awk answer as it is quite straightforward. But according to my tags, yours work perfectly. :) – Shani A. Jul 27 '17 at 08:20
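For readers who still want a shell loop, the asker's pseudocode can be sketched with while read instead of for over cat, which is the pattern the linked pages recommend. The function name map_ids_loop is hypothetical, file1 is assumed to be tab-separated, and printing an unmatched ID unchanged is a choice made here to agree with the awk output, not something the question specifies. It still starts one awk process per ID line, so the pure awk answer remains the better fit for large files.

```shell
# Hypothetical helper (name is illustrative): map_ids_loop MAPFILE LISTFILE
map_ids_loop() {
    # IFS= and -r keep each line intact; the || clause handles a last line
    # that lacks a trailing newline
    while IFS= read -r line || [ -n "$line" ]; do
        case $line in
            x_*)
                # exact, case-insensitive match on column 1; print column 2
                hit=$(awk -F'\t' -v id="$line" \
                    'tolower($1) == tolower(id) { print $2; exit }' "$1")
                # assumption: fall back to the original line if nothing matched
                printf '%s\n' "${hit:-$line}"
                ;;
            *)
                # header line: print as is
                printf '%s\n' "$line"
                ;;
        esac
    done < "$2"
}
```

Usage would be map_ids_loop file1 file2 > file3. Because the comparison is on the whole of column 1, x_id1 does not pick up X_ID11 or X_ID13.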

score 0 · Answer 3 · edited Jun 20 '20 at 09:12

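This can be done with a single while loop:-

while IFS= read -r line
do
   # uppercase the line so it can match column 1 of file1
   var=$(printf '%s\n' "$line" | tr '[:lower:]' '[:upper:]')
   # -F: fixed string, -w: whole words only (X_ID1 must not match X_ID11);
   # file1 is tab-separated, and tab is cut's default delimiter
   col2=$(grep -Fw -- "$var" file1 | cut -f2)
   if [[ -z "$col2" ]] ; then
        echo "$line" >> file3
   else
        echo "$col2" >> file3
   fi
done < file2

Explanation:-

var=$(printf '%s\n' "$line" | tr '[:lower:]' '[:upper:]') - converts lowercase to UPPERCASE.

col2=$(grep -Fw -- "$var" file1 | cut -f2) - looks up the uppercased line in file1 and extracts column 2. If there is no match, i.e. the variable col2 is empty, the original line is written to file3; otherwise col2 is written to file3.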
