0

Suppose I have this file:

var1 300
var1 400
var3 600
var1 200

How can I compare $1 from line 1 with $1 from line 2?

Basically, I just want to add up the second column for all lines whose name (first column) is equal.

Only with awk.

The output should be:

var1 900
var3 600
user1755071

2 Answers

1

Using awk:

$ awk '{a[$1]+=$2;}END{for (i in a)print i, a[i];}' file
var1 900
var3 600
Guru
  • I want to know: if I have a file that is 5 GB in size, can I use the above method? – user1755071 Jan 16 '13 at 02:57
  • @user55711, awk won't have trouble processing a huge file because it basically reads a line at a time. Since this method creates an associative array with column 1 as keys, the question is how many unique keys are there, and what is their size? – glenn jackman Jan 16 '13 at 04:52
  • @glennjackman Maybe 1 million. Will it work with that amount? – user1755071 Jan 16 '13 at 05:13
  • If your computer has only a tiny amount of memory, it might be slow. Try it and let us know. – glenn jackman Jan 16 '13 at 12:04
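As a minimal sketch of the associative-array approach discussed above: the array holds one entry per distinct key, so memory depends on the number of unique names rather than on the file size. The order of a `for (key in sum)` loop is unspecified in awk, so this version pipes the result through `sort` to get stable output. The file name `sample.txt` and the function name `sum_by_key` are made up for illustration.

```shell
# Recreate the sample input from the question (the file name is an assumption).
printf '%s\n' 'var1 300' 'var1 400' 'var3 600' 'var1 200' > sample.txt

# Sum column 2 per key; one array entry is kept per distinct key.
# The for-in traversal order is unspecified, so sort the result afterwards.
sum_by_key() {
    awk '{ sum[$1] += $2 } END { for (key in sum) print key, sum[key] }' "$1" | sort
}

sum_by_key sample.txt
```

If the output order does not matter, the trailing `sort` can be dropped and the command stays awk-only.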
1

Use this instead of @Guru's solution if your file is large and/or you care about preserving the input order, provided your input is sorted on the first field:

$ awk '(NR>1) && ($1!=p){print p, s; s=0} {p=$1; s+=$2} END{print p, s}' file
var1 700
var3 600
var1 200
Ed Morton
  • Thanks, buddy, but I am confused about two things. 1) Does awk process the file line by line and in order? 2) What if `var1 = 200` also appears after var2; what will happen then? – user1755071 Jan 16 '13 at 03:24
  • 1) Yes. 2) Then you'll get a 2nd instance of var1 in your output, with its own count that started after var2. The logic is to compare the current line's $1 with the previous line's $1 and, if they're different, print the sum for that previous $1 and reset the sum to zero. – Ed Morton Jan 16 '13 at 04:43
  • @user55711, at least this method should filter out the bulk of your huge file. Then you can sort it and pass it through Ed's program for a second pass. – glenn jackman Jan 16 '13 at 04:53
  • @user55711, don't. Use `sort`, that's what it's for. – glenn jackman Jan 16 '13 at 12:04
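Putting the comments above together, here is a rough sketch of the "sort first, then stream" approach: sorting on the first field makes equal names adjacent, so the previous-line comparison only has to remember one key and one running sum. The file name `sample.txt`, the function name `sum_sorted`, and the variable names `prev` and `sum` are illustrative choices, not part of the original answer.

```shell
# Recreate the sample input from the question (the file name is an assumption).
printf '%s\n' 'var1 300' 'var1 400' 'var3 600' 'var1 200' > sample.txt

# Sort on the first field so equal keys are adjacent, then sum each run.
sum_sorted() {
    sort -k1,1 "$1" | awk '
        NR > 1 && $1 != prev { print prev, sum; sum = 0 }  # key changed: flush the finished run
        { prev = $1; sum += $2 }                           # remember the key, add to the running sum
        END { if (NR > 0) print prev, sum }                # flush the last run, if any input was read
    '
}

sum_sorted sample.txt
```

Unlike the array version, this keeps only the current key in awk's memory; the cost moves to `sort`, which is designed to handle inputs larger than RAM.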