
I have two big .csv files. I want to extract the lines of file 2 whose first column matches a value in the second column of file 1.

For example:

file 1:

A1BG    1
NAT1    9
NAT2    10
SERPINA3    12
AAMP    14
AANAT   15
AARS1   16

file 2:

1 10422
1 10549
1 2232
1 23198
1 23352
1 284403
1 368
1 51035
1 80854
1 9923
2 10053
2 10376
2 10724
2 2026
2 2193
2 22976
2 23154
2 24138
2 2639
2 284207
2 285203
2 3337
2 3437
2 348
2 348
2 348
2 351
4 7689

Expected output:

1 10422
1 10549
1 2232
1 23198
1 23352
1 284403
1 368
1 51035
1 80854
1 9923

This is my code:

awk 'NR==FNR{FS=" ";a[$2];next}{FS=" ";if ($1 in a) print $0}' <file1.csv <file2.csv >output.csv

But it produces no output.

edited by Cyrus, asked by real name

2 Answers


I believe you are simply looking for this solution in awk.

awk 'FNR==NR{a[$2];next} ($1 in a)' Input_file1  Input_file2

Explanation: here is a detailed, commented version of the above.

awk '                       ##Starting awk program from here.
FNR==NR{                    ##Checking condition FNR==NR which will be TRUE when Input_file1 is being read.
  a[$2]                     ##Creating array a with index of 2nd field of current line.
  next                      ##next will skip all further statements from here.
}
($1 in a)                   ##Checking condition if 1st field is present in array a then print that line from Input_file2
' Input_file1  Input_file2  ##Mentioning Input_file names here.
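To check the command against the question's sample, here is a minimal sketch that writes a subset of the sample rows to two hypothetical files, file1.txt and file2.txt (names chosen for illustration), and runs the same awk program on them:

```shell
#!/bin/sh
# Hypothetical test files holding a subset of the question's sample data.
cat > file1.txt <<'EOF'
A1BG    1
NAT1    9
NAT2    10
SERPINA3    12
AAMP    14
AANAT   15
AARS1   16
EOF

cat > file2.txt <<'EOF'
1 10422
1 10549
2 10053
2 348
4 7689
EOF

# First file (FNR==NR): store each 2nd field as a key of array a.
# Second file: the bare pattern ($1 in a) prints lines whose 1st field is a key.
awk 'FNR==NR{a[$2];next} ($1 in a)' file1.txt file2.txt > output.txt
cat output.txt
```

Only the two rows starting with 1 are printed: of file 2's first-column values (1, 2, 4), only 1 appears in file 1's second column.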
answered by RavinderSingh13
  • Yes, I know this solution. I checked many other solutions besides it, but none of them worked. I tried them on two small files and got the correct result, but on my big data they didn't work. – real name Oct 13 '20 at 20:17
  • "didn't work" is completely meaningless. Read [How to create a Minimal, Reproducible Example](https://stackoverflow.com/help/minimal-reproducible-example), edit your question with the new example, and you should be able to re-open the question. – glenn jackman Oct 13 '20 at 22:10

Let's look at your code:

awk 'NR==FNR{FS=" ";a[$2];next}{FS=" ";if ($1 in a) print $0}' <file1.csv <file2.csv >output.csv

You are redirecting input from 2 files. The shell can only have a single source of data for each file descriptor: the shell processes redirections from left to right as they are seen on the command line. Try this:

awk 'NR==FNR' <file1.csv <file2.csv

and you'll probably be surprised at what awk considers the "first file".

awk is fully capable of reading files itself; you don't need the shell to do that:

awk 'NR==FNR' file1.csv file2.csv
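A minimal sketch of the redirection pitfall, assuming a POSIX shell such as sh or bash. The one-line files f1.txt and f2.txt are hypothetical stand-ins:

```shell
#!/bin/sh
# Two hypothetical one-line input files.
printf 'first-file line\n'  > f1.txt
printf 'second-file line\n' > f2.txt

# Redirections are applied left to right: stdin is opened on f1.txt, then
# reopened on f2.txt, so awk only ever reads f2.txt and NR==FNR holds for
# every line it sees.
redirected=$(awk 'NR==FNR' <f1.txt <f2.txt)
echo "with redirections: $redirected"

# As arguments, awk reads both files in order; NR==FNR holds only while
# it is reading the first one, f1.txt.
as_args=$(awk 'NR==FNR' f1.txt f2.txt)
echo "as arguments:      $as_args"
```

The redirected run prints the line from f2.txt, not f1.txt. Applied to the question's command, awk only ever read file2.csv, so every line matched NR==FNR, was stored in the array and skipped by `next`, and nothing was printed.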
answered by glenn jackman
  • I tried many other commands like the one you mentioned. I get the correct result on two small files, but on my own big files they don't work. Our files are in .csv format with a comma separator, and I set FS to a comma. I changed them to .txt and got no result.
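Since the real files are comma-separated, here is a sketch of the same approach with the separator set on the command line. The rows are hypothetical comma-separated versions of the sample. The carriage returns model an assumption that nobody in the thread confirmed: that the big files have Windows (CRLF) line endings, which is one common reason a command works on hand-typed test files but not on exported data.

```shell
#!/bin/sh
# Hypothetical comma-separated inputs; the \r\n endings are an ASSUMED
# Windows export, not something stated in the question.
printf 'A1BG,1\r\nNAT1,9\r\nNAT2,10\r\n' > file1.csv
printf '1,10422\r\n1,10549\r\n2,10053\r\n' > file2.csv

# -F, sets FS before any input is read. Assigning FS inside a rule, as the
# question's code does, only takes effect from the next record onward.
# sub(/\r$/, "") strips a trailing carriage return (assumed CRLF input) and
# re-splits the fields; without it, file1's last field would be "1\r",
# which never equals file2's first field "1".
awk -F, '{ sub(/\r$/, "") } FNR==NR { a[$2]; next } ($1 in a)' file1.csv file2.csv > output.csv
cat output.csv
```

If the files turn out to use plain `\n` endings, the `sub()` call simply does nothing, so it is harmless to keep.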