Using only bash, I am trying to extract the lines that are new in file B compared to file A (present in B but not in A), as efficiently as possible.

I can't seem to get it right using diff or comm. What would you recommend instead?

Example:

file A:

1
2
3

file B:

2
3
4

expected output:

4
Dark
  • `comm` is the tool for this job, especially if your input is presorted. See [BashFAQ #36](https://mywiki.wooledge.org/BashFAQ/036) -- `diff` does a bunch of work (costing both time and memory) to figure out the shortest possible edit path, but when all you need is to know which lines are new you don't care about that at all. – Charles Duffy Feb 19 '22 at 14:58
  • (if your files are guaranteed to be small and _not_ sorted, awk can make sense since it lets you skip the sort step, but because the conventional approach needs to store one of the two files in memory, it's not as universally applicable a tool) – Charles Duffy Feb 19 '22 at 15:03
  • When you say "can't seem to get it right using diff or comm" -- _show us how_ you tried to apply the linked answers using comm, and show us exactly what went wrong. If we have to guess what went wrong, how are we supposed to avoid whatever miscommunication or oversight the existing questions' answers have? – Charles Duffy Feb 19 '22 at 15:04
  • (if your content isn't line-oriented, then I suggest not showing sample input/output pairs for which line-oriented processing would result in your stated-correct answer). – Charles Duffy Feb 19 '22 at 15:06
  • The content is line-oriented. But as shown in the example, I want to get only the new content. Running `comm -13 a.txt b.txt` returns `3 4`, which does not fulfill my need to return only `4`. – Dark Feb 19 '22 at 15:09
  • It does not return `3 4`, it only returns `4`. See https://ideone.com/GRPTED. If you have different behavior, show us how to produce that behavior; we need a [mre] so we can see it ourselves, or how can we test proposed fixes? – Charles Duffy Feb 19 '22 at 15:10
  • (if you don't like ideone.com, another online sandbox you can use is repl.it; if you can link to a sandbox that does the wrong thing, that guarantees someone else can see the problem with their own eyes, letting us verify that it isn't just a problem with your input file having hidden characters or such and thus making the lines not really match). – Charles Duffy Feb 19 '22 at 15:11
  • (one thing I wonder about here is if your input files are in DOS format -- that would make all but the last line have a hidden carriage return at the end, making the file A contain only `3` while file B contains `3$'\r'`, so the lines don't actually match each other) – Charles Duffy Feb 19 '22 at 15:15
  • Good catch, it was due to the format! Thanks a lot! – Dark Feb 19 '22 at 15:27
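
To summarize the thread in runnable form, here is a minimal sketch of the two approaches discussed above: `comm -13` on sorted input, and the awk variant for small unsorted files. The function names `new_lines` and `new_lines_awk` are hypothetical (they do not appear in the thread), and stripping carriage returns before comparing is an assumption made to cover the DOS-format case that turned out to be the cause here.

```shell
# Hypothetical helper: print the lines found in file B ($2) but not in file A ($1).
# Output is in sorted order, because comm requires sorted input.
new_lines() {
  tmp_a=$(mktemp) || return 1
  # Delete carriage returns so CRLF lines match LF lines, then sort.
  tr -d '\r' <"$1" | LC_ALL=C sort >"$tmp_a"
  # -13 suppresses column 1 (only in A) and column 3 (in both),
  # leaving column 2: lines only in B. "-" makes comm read B from stdin.
  tr -d '\r' <"$2" | LC_ALL=C sort | LC_ALL=C comm -13 "$tmp_a" -
  status=$?
  rm -f "$tmp_a"
  return "$status"
}

# Hypothetical awk variant: no sort step and B's original order is kept,
# but every distinct line of A is held in memory.
new_lines_awk() {
  awk '
    { sub(/\r$/, "") }                       # drop a trailing carriage return
    FILENAME == ARGV[1] { seen[$0]; next }   # first file: remember each line
    !($0 in seen)                            # second file: print unseen lines
  ' "$1" "$2"
}
```

With the example files from the question, `new_lines a.txt b.txt` prints `4`, whether the files use Unix or DOS line endings. If the inputs are already sorted and known to be in Unix format, plain `comm -13 a.txt b.txt` does the same job without the temporary file.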
