
I am looking for ideas, not a complete solution, for the following problem in the shell (Linux). What is the best tool for it (awk, a while loop, sed, ...)?

I have two files with the same line structure: key-value-value. I want to merge these two files. If a key does not exist in the output yet, the script should insert a new line; if it does exist, the script should update the values (by adding them up).

Example:
File 1:

john-15-40
doo-10-91
mary-14-19
foo-11-0

File 2:

foo-110-10
john-22-11
ghost-1000-1000

Result:
foo-121-10
john-37-51
ghost-1000-1000
doo-10-91
mary-14-19

How can I do this?

flatronka

5 Answers


Simple with awk

awk '
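  # accumulate the 2nd and 3rd fields per key ($1) across both files,
  # then print one merged record per key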
  BEGIN {FS = OFS = "-"}
  {v1[$1] += $2; v2[$1] += $3}
  END {for (key in v1) {print key, v1[key], v2[key]}}
' F1 F2
glenn jackman

You need a language with associative arrays. Your task is easy in any scripting language, but Perl and awk are especially good at processing text files line by line.

Pseudocode:

read each line from file1 and file2
split the line into key and values
if the key is not in the hash
     add the key and values
else
     add the values to the existing entry
finally, print every key with its values
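
For example, a minimal awk sketch of that idea, assuming the same dash-separated format as above (file1 and file2 are placeholders for your real file names):

awk '
  BEGIN {FS = OFS = "-"}
  {
    if ($1 in v1) {                 # key already seen: add the values up
      v1[$1] += $2; v2[$1] += $3
    } else {                        # new key: remember its values
      v1[$1] = $2; v2[$1] = $3
    }
  }
  END {for (key in v1) print key, v1[key], v2[key]}
' file1 file2

As with the answer above, the output order of "for (key in ...)" is unspecified; pipe the result through sort if you need a fixed order.
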
yazu

I know you didn't ask for PHP, but it might help. There's probably something similar in whichever language you'd prefer:

<?PHP

$file_handle = fopen("file1", "r");

while (($line_of_text = fgets($file_handle)) !== false) {
    $line_of_text = trim($line_of_text);                    // drop the trailing newline
    if ($line_of_text === '') continue;                     // skip blank lines
    list($name, $value1, $value2) = explode('-', $line_of_text);
    $file1[$name] = array($value1, $value2);
}
fclose($file_handle);
// repeat for file2
// then use the 2 arrays, $file1[] and $file2[], to rewrite the file as 'file3' or whatever,
// checking for duplicates and doing the math.
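// A rough sketch of that merge step (assuming $file2 was filled the same way as $file1):
// foreach ($file2 as $name => $vals) {
//     if (isset($file1[$name])) {
//         $file1[$name][0] += $vals[0];
//         $file1[$name][1] += $vals[1];
//     } else {
//         $file1[$name] = $vals;
//     }
// }
// foreach ($file1 as $name => $vals) {
//     echo "$name-$vals[0]-$vals[1]\n";
// }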
?>
TecBrat

This can be done natively in Bash 4:

#!/bin/bash
declare -A vals_one vals_two
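# read key-value1-value2 records from both files, accumulating the two value columns per key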
while IFS=- read key val1 val2; do
  if [[ ${vals_one["$key"]} ]] ; then
    vals_one["$key"]=$(( ${vals_one["$key"]} + val1 ))
    vals_two["$key"]=$(( ${vals_two["$key"]} + val2 ))
  else
    vals_one["$key"]=$val1
    vals_two["$key"]=$val2
  fi
done < <(cat input1.txt input2.txt)
for key in "${!vals_one[@]}"; do
  printf '%s-%s-%s\n' "$key" "${vals_one[$key]}" "${vals_two[$key]}"
done

Note that this approach keeps every key in memory. A more memory-efficient approach is to sort the files before merging them (GNU sort spills to temporary files when its input doesn't fit in memory, so it copes with large inputs better than any reasonable script we would write); the merge itself then only needs to hold two lines in memory at a time:

#!/bin/bash

function merge_inputs {
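    # Prime with the first record, then fold in each following record while its
    # key matches; when the key changes, emit the finished record and start over.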
    IFS=- read key val1 val2
    while IFS=- read new_key new_val1 new_val2; do
      if [[ $key = "$new_key" ]] ; then
        val1=$(( val1 + new_val1 ))
        val2=$(( val2 + new_val2 ))
      else
        printf '%s-%s-%s\n' "$key" "$val1" "$val2"
        key=$new_key
        val1=$new_val1
        val2=$new_val2
      fi
    done
    printf '%s-%s-%s\n' "$key" "$val1" "$val2"
}
sort input1.txt input2.txt | merge_inputs

Also, this latter form does not require associative arrays, and will work with older versions of bash (or, with some adaptation, other shells).

Charles Duffy

I like glenn's short, fat solution. Here's a tall, thin one.

Suppose your two files are 1.txt and 2.txt:

sort {1,2}.txt |
awk -F- -vOFS=- '
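# input is sorted, so equal keys are adjacent: sum while the key repeats,
# print the accumulated record whenever the key changes (and once more at the end)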
NR==1{
    x=$1
}
x==$1{
    y+=$2
    z+=$3
    next
}
{
    print x,y,z;
    x=$1
    y=$2
    z=$3
}
END{
    print x,y,z
}'
kev