
I am looking for ideas, not a complete solution, for the following problem in the shell (Linux). What is the best tool for it (awk, a while loop, sed, ...)?

I have two files with the same line structure: key-value-value. I want to merge these two files. If a key does not exist in the output yet, the script should insert a new line; if it does exist, the script should update the values (by adding them up).

Example:
File 1:

john-15-40
doo-10-91
mary-14-19
foo-11-0

File 2:

foo-110-10
john-22-11
ghost-1000-1000

Result:
foo-121-10
john-37-51
ghost-1000-1000
doo-10-91
mary-14-19

How can I do this?

flatronka

5 Answers


Simple with awk

awk '
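  # accumulate the 2nd and 3rd fields per key ($1) across both files,
  # then print one merged record per key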
  BEGIN {FS = OFS = "-"}
  {v1[$1] += $2; v2[$1] += $3}
  END {for (key in v1) {print key, v1[key], v2[key]}}
' F1 F2
glenn jackman

You need a language with associative arrays. Your task is easy in any scripting language, but Perl and awk are especially good at processing text files line by line.

Pseudocode:

read each line from file1 and file2
split the line into key and values
if the key is not in the hash
     add the key and values
else
     add the values to the existing entry
finally, print every key with its values
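
For example, a minimal awk sketch of that idea, assuming the same dash-separated format as above (file1 and file2 are placeholders for your real file names):

awk '
  BEGIN {FS = OFS = "-"}
  {
    if ($1 in v1) {                 # key already seen: add the values up
      v1[$1] += $2; v2[$1] += $3
    } else {                        # new key: remember its values
      v1[$1] = $2; v2[$1] = $3
    }
  }
  END {for (key in v1) print key, v1[key], v2[key]}
' file1 file2

As with the answer above, the output order of "for (key in ...)" is unspecified; pipe the result through sort if you need a fixed order.
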
yazu

I know you didn't ask for PHP, but it might help. There's probably something similar in whichever language you'd prefer:

<?PHP

$file_handle = fopen("file1", "r");

while (($line_of_text = fgets($file_handle)) !== false) {
    $line_of_text = trim($line_of_text);                    // drop the trailing newline
    if ($line_of_text === '') continue;                     // skip blank lines
    list($name, $value1, $value2) = explode('-', $line_of_text);
    $file1[$name] = array($value1, $value2);
}
fclose($file_handle);
// repeat for file2
// then use the 2 arrays, $file1[] and $file2[], to rewrite the file as 'file3' or whatever,
// checking for duplicates and doing the math.
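// A rough sketch of that merge step (assuming $file2 was filled the same way as $file1):
// foreach ($file2 as $name => $vals) {
//     if (isset($file1[$name])) {
//         $file1[$name][0] += $vals[0];
//         $file1[$name][1] += $vals[1];
//     } else {
//         $file1[$name] = $vals;
//     }
// }
// foreach ($file1 as $name => $vals) {
//     echo "$name-$vals[0]-$vals[1]\n";
// }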
?>
TecBrat

This can be done natively in Bash 4:

#!/bin/bash
declare -A vals_one vals_two
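# read key-value1-value2 records from both files, accumulating the two value columns per key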
while IFS=- read key val1 val2; do
  if [[ ${vals_one["$key"]} ]] ; then
    vals_one["$key"]=$(( ${vals_one["$key"]} + val1 ))
    vals_two["$key"]=$(( ${vals_two["$key"]} + val2 ))
  else
    vals_one["$key"]=$val1
    vals_two["$key"]=$val2
  fi
done < <(cat input1.txt input2.txt)
for key in "${!vals_one[@]}"; do
  printf '%s-%s-%s\n' "$key" "${vals_one[$key]}" "${vals_two[$key]}"
done

Note that this approach keeps every key in memory. A more memory-efficient approach is to sort the files before merging them (GNU sort spills to temporary files when its input doesn't fit in memory, so it copes with large inputs better than any reasonable script we would write); the merge itself then only needs to hold two lines in memory at a time:

#!/bin/bash

function merge_inputs {
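    # Prime with the first record, then fold in each following record while its
    # key matches; when the key changes, emit the finished record and start over.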
    IFS=- read key val1 val2
    while IFS=- read new_key new_val1 new_val2; do
      if [[ $key = "$new_key" ]] ; then
        val1=$(( val1 + new_val1 ))
        val2=$(( val2 + new_val2 ))
      else
        printf '%s-%s-%s\n' "$key" "$val1" "$val2"
        key=$new_key
        val1=$new_val1
        val2=$new_val2
      fi
    done
    printf '%s-%s-%s\n' "$key" "$val1" "$val2"
}
sort input1.txt input2.txt | merge_inputs

Also, this latter form does not require associative arrays, and will work with older versions of bash (or, with some adaptation, other shells).

Charles Duffy

I like glenn's short, fat solution. Here's a tall, thin one.

Suppose your two files are 1.txt and 2.txt:

sort {1,2}.txt |
awk -F- -vOFS=- '
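# input is sorted, so equal keys are adjacent: sum while the key repeats,
# print the accumulated record whenever the key changes (and once more at the end)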
NR==1{
    x=$1
}
x==$1{
    y+=$2
    z+=$3
    next
}
{
    print x,y,z;
    x=$1
    y=$2
    z=$3
}
END{
    print x,y,z
}'
kev