
Possible Duplicate:
merge two files by key if exists in the first file / bash script

I am trying to think of a good way to combine these two selections of text based on the number preceding the colon. (These are examples of text in a format similar to what I will be working on.) This would be done in a bash environment. I've tried to do it with cut along with other commands, but I have not been able to come up with anything that works.

selection 1
1:829ede2828e9
2:893h8ew9nediucn
3:mdheuwe883ud8932

selection 2
1:stack
2:over
3:flow

The output would be something along the lines of:

1:stack:829ede2828e9
2:over:893h8ew9nediucn
3:flow:mdheuwe883ud8932

So it would essentially be combining and matching the files based on the number preceding the colon. This would be used to process around 39,000 lines of text. I am stumped at this point, so I would really appreciate any help I can get, thanks! I also forgot to mention that the numbers won't be contiguous (1, 3, 4, 5, 9, 11, 22, for example), although both files/sets of text will have the same set of numbers.

lacrosse1991
  • What about this: [join](http://stackoverflow.com/a/9635791/957560)? – dearN Oct 26 '12 at 00:07
  • ^ Do that if they're sorted. If they're not, sort them first with `sort`. – FrankieTheKneeMan Oct 26 '12 at 00:10
  • I assume the numbers *aren't* sorted or contiguous? Are they unique? In any case, 39,000 lines isn't too many, so you can do the whole thing in memory. I suggest using awk to split each line into a number => text associative array and combine the output. You could equally well use any scripting language with support for associative arrays (Python, Ruby, PHP, Perl, etc.) for this. – John Carter Oct 26 '12 at 00:12
  • The numbers would be sorted, but they won't be contiguous, so there could be 1:sdds 3:sddsdsdsd 4:ddsds 7:cdds and so on, although both sets of data would have the same numbers. – lacrosse1991 Oct 26 '12 at 00:15
  • @lacrosse1991 It would probably be a good idea to add that case (where there might be missing numbers) to your question, so that people can take it into account when coming up with solutions. – doubleDown Oct 26 '12 at 00:27
  • Would you guys know how I could use join in this particular instance, by any chance? – lacrosse1991 Oct 26 '12 at 00:34
  • Huh, `join` is smarter than I thought; I think it'll do what you want. See Olaf's answer. – John Carter Oct 26 '12 at 00:40
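The associative-array approach from John Carter's comment could look something like the minimal sketch below. It loads the first file into an awk array keyed on the number before the colon, then looks up each key of the second file, so no sorting is needed. It assumes the text after the key contains no further colons, and `merge_with_awk` is a hypothetical helper name, not part of any tool.

```shell
#!/usr/bin/env bash
# merge_with_awk HASHES_FILE WORDS_FILE   (hypothetical helper name)
# Prints key:word:hash for every key of WORDS_FILE that also appears
# in HASHES_FILE, in WORDS_FILE's order. Assumes each line is key:text
# with no further colons in the text.
merge_with_awk() {
  awk -F: -v OFS=: '
    # While reading the first file (NR == FNR), remember text by key.
    NR == FNR { hash[$1] = $2; next }
    # Second file: print key, word, remembered hash if the key was seen.
    ($1 in hash) { print $1, $2, hash[$1] }
  ' "$1" "$2"
}
```

Called as `merge_with_awk selection1.txt selection2.txt` on the sample data, this should print lines in the `1:stack:829ede2828e9` form. Around 39,000 lines fits comfortably in memory.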

1 Answer


You can use join like this:

join -t: selection2.txt selection1.txt
Olaf Dietsche
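One caveat when applying the `join` command above to non-contiguous keys: `join` expects both inputs sorted on the join field in lexical order, not numeric order. When both files list exactly the same keys in the same order, every comparison matches and the plain command works, but if a key is ever missing from one file, numerically ordered keys such as 3, 11, 22 (where "11" sorts before "3" lexically) can make `join` skip matches. A minimal sketch that sorts first is below. It needs bash for the `<(...)` process substitution, and `merge_selections` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# merge_selections WORDS_FILE HASHES_FILE   (hypothetical helper name)
# Prints key:word:hash for keys present in both files, in numeric key order.
merge_selections() {
  # Sort both inputs lexically on field 1 and run join under the same
  # C collation, so sort and join agree on ordering. The trailing
  # numeric sort restores 3, 11, 22 order for the output.
  LC_ALL=C join -t: \
    <(LC_ALL=C sort -t: -k1,1 "$1") \
    <(LC_ALL=C sort -t: -k1,1 "$2") |
    sort -t: -k1,1n
}
```

For example, `merge_selections selection2.txt selection1.txt` should produce the same `1:stack:829ede2828e9` lines as the one-liner, while staying correct if the two key sets ever diverge.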