
Let's say there is a file that contains the following:

Hello
=========
last: paul 
last: susy 
last: king  
last: jorge 
last: henry 
last: ida 

Goodbye
=========
first: paul
first: susy
first: charles
first: lincoln
first: ida

Example output

last: paul
first: paul
last: susy
first: susy
last: ida
first: ida

How can I write a script in Bash or Python that extracts every name appearing in both sections and writes the matches to a new file, regardless of the last: and first: keywords at the start of each line?

secure_20

3 Answers


Suppose your file is called test.txt:

first_names = set()
last_names = set()

# Collect the name (second whitespace-separated field) from each labelled line.
with open('test.txt') as f:
    for line in f:
        if line.startswith('last:'):
            last_names.add(line.split()[1])
        elif line.startswith('first:'):
            first_names.add(line.split()[1])

# Keep only the names that appear under both labels.
output_names = [name for name in first_names if name in last_names]

with open('new.txt', 'w') as f:
    for name in output_names:
        f.write('last: ' + name + '\n')
        f.write('first: ' + name + '\n')

To explain a little, the first part creates two empty sets, first_names and last_names. You could use lists for these, but checking for membership (which is what happens later with if name in last_names) is faster for a set: it's O(1) on average for a set and O(n) for a list, where n is the size of the list.
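
The membership cost can be checked with a rough timing sketch like the one below. The list size, the names in it, and the repeat count are arbitrary assumptions chosen for illustration, and the absolute times will vary by machine.

```python
import timeit

# Hypothetical data: 10,000 made-up names, stored both as a list and as a set.
names_list = [f'name{i}' for i in range(10_000)]
names_set = set(names_list)
target = 'name9999'  # last element, so the list has to be scanned to the end

# Time the same membership test against each container.
list_time = timeit.timeit(lambda: target in names_list, number=200)
set_time = timeit.timeit(lambda: target in names_set, number=200)
print(f'list: {list_time:.4f}s  set: {set_time:.4f}s')
```

For a file as small as the one in the question the difference is negligible; it only starts to matter with many thousands of names.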

A nice feature of Python is that you can iterate directly over the lines of a file object. The line.split()[1] part splits each line on whitespace and takes the second element (Python indexes from 0).
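
As a quick illustration of what line.split()[1] does, here is one line from the question's file, including its trailing spaces and newline:

```python
line = 'last: king  \n'

# split() with no arguments splits on any run of whitespace and
# discards leading/trailing whitespace, including the newline.
fields = line.split()
print(fields)     # ['last:', 'king']
print(fields[1])  # king
```

Because the trailing whitespace is discarded, 'king  ' from a "last:" line and 'king' from a "first:" line compare equal.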

While sets are faster for membership checking, they are unordered, so they won't preserve the order of the names in the file. To construct output_names I use what's called a list comprehension. The last part writes the results to new.txt.
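
If the output order matters, as it appears to in the example output, one possible variation is to keep the "last:" names in a list and use a set only for the lookups. This is a minimal sketch rather than part of the answer above; the function name matching_names and the inline sample are assumptions made for illustration.

```python
def matching_names(lines):
    """Return names seen under both 'last:' and 'first:', in file order."""
    last_names = []      # a list keeps the order of appearance
    first_names = set()  # a set gives fast membership checks
    for line in lines:
        parts = line.split()
        if len(parts) < 2:
            continue  # skip headings, separator lines, and blank lines
        label, name = parts[0], parts[1]
        if label == 'last:':
            last_names.append(name)
        elif label == 'first:':
            first_names.add(name)

    seen = set()
    result = []
    for name in last_names:
        if name in first_names and name not in seen:
            seen.add(name)  # avoid emitting a repeated name twice
            result.append(name)
    return result


# The file contents from the question, inlined so the sketch is self-contained.
sample = """Hello
=========
last: paul
last: susy
last: king
last: jorge
last: henry
last: ida

Goodbye
=========
first: paul
first: susy
first: charles
first: lincoln
first: ida
""".splitlines()

print(matching_names(sample))  # ['paul', 'susy', 'ida']
```

With a real file, the same function can be called as matching_names(open('test.txt')), since a file object yields its lines when iterated.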

Gabriel

Say the file is names.txt.

In Python:

with open('names.txt') as f:
    lines = f.readlines()

# Use separate lists; a chained assignment would bind every name to one list.
last_names = []
first_names = []
for line in lines:
    if line.startswith('last:'):
        # Drop the label, then strip the newline and any trailing spaces.
        last_names.append(line[len('last:'):].strip())
    elif line.startswith('first:'):
        first_names.append(line[len('first:'):].strip())
result = [name for name in last_names if name in first_names]
# do whatever you want with result
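aa333

awk 'FNR==NR {a[FNR""]=$0; next}{print a[FNR""]"\n"$0}' file1 file2

This interleaves the two files line by line, so it assumes the "last:" lines and the "first:" lines have already been saved separately as file1 and file2. Lines are paired by position and the names themselves are not compared, so the result matches the example output only when both files list the same names in the same order.

See "Using AWK to Process Input from Multiple Files".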

Community