
I have read many topics about my problem, but none of them helped.

A closely related topic that still didn't solve it: removing lines between two patterns (not inclusive) with sed

Question: I have a text file and I want to remove the lines between two patterns.

Note 1: between these patterns, I don't want to remove lines that contain a specific string (a key-pattern).

Pattern 1 can be a line number (line 2; this never changes) or a word such as Speed (this word never changes either).

Pattern 2 can be a line number (line X; this one is dynamic, not fixed) or a phrase such as Station MAC (if your solution is word-based, fortunately Station MAC never changes).

If your solution is based on line numbers, I wrote an AWK command to get the line number to use as the second pattern:

awk '/Station MAC/ {print NR}'  david.txt

Note 2: as stated in note 1, sed or any other tool must not remove lines that contain my key-pattern.




Example: keep the lines that contain the words Max or sms.

Here, Max and sms are the key-patterns.

Input: https://pastebin.com/cztQgm9m

BSSID, First time seen, Last time seen, channel, Speed, Power, # beacons, # IV, LAN IP, ID-length,
84:C9:B2:A6:0B:28, 18:51:36, 18:54:40,  7,  54, PA2,        2,   0.  0.  0.  0,   6, Maryam,
00:1E:E3:EB:2F:4E, 18:50:55, 18:54:36,  1,  54, W.  0.  0.  0,   8, Broadcom,
1C:BD:B9:79:91:C3, 18:50:17, 18:54:13, 11,  54, WP     0,   0.  0.  0.  0,   4, Home,
6C:AD:EF:1F:77:1F, 18:52:15, 18:54:17,  5,  54,    TP,SK,    6,        0,   0.  0.  0.  0,  12, MobinNet771F,
10:C6:1F:E9:90:6E, 18:50:36, 18:54:17,  6,  54,     7,        4,   0.  0.  0.  0,   9, ITIS_9162,
B0:48:7A:CF:BA:12, 18:52:09, 18:53:41,  7,  54,  TP,SK,     3,        0,   0.  0.  0.  0,   3, sms,
6C:19:8F:65:42:CB, 18:53:15, 18:53:15,  1,  54, , -62,        1,        0,   0.  0.  0.  0,  11, Rahmanzadeh,
.....
..skipped..
..skipped..
..skipped..
..
...
......
..skipped..
..skipped..
..skipped..
....
28:10:7B:93:BB:2E, 18:53:15, 18:53:15,  1,  -1,      0,        1,   0.  0.  0.  0,   0, ,
70:79:90:41:62:50, 18:50:17, 18:55:00,  4,  54, A, CP TP,SK, -19,      8,      9,   0.  0.  0.  0,  12, WiFi-Max-MTN,
EC:08:6B:6F:DF:C4, 18:52:52, 18:52:52,  6,  54, WP 2a, MP,SK, -66,        1,        0,   0.  0.  0.  0,   8, senator2,
6E:AD:EF:B4:CB:B6, 18:52:14, 18:52:14,  9,  54, A2, MP,PSK, -70,        1,        0,   0.  0.  0.  0,   6, Mohsen,
A8:F7:E0:06:1F:28, 18:52:44, 18:52:44,  9,  54, P,PSK, -70,        0,        0,   0.  0.  0.  0,  12, Borsa_Donne+,

Station MAC, First time seen, Last time seen, Power, # packets, BSSID, Probed
04:C2:3E:FC:1E:BB, 18:53:00, 18:53:00,  -1,        1, 3C:1E:04:8F:12:83,
F0:79:60:9E:13:4E, 18:52:56, 18:52:56,  -1,        1, 10:C6:1F:E9:90:6E,
40:E2:30:D9:E8:4B, 18:50:53, 18:52:25, -60,        2, F4:F2:6D:DA:27:2F,
D0:65:CA:BD:93:EC, 18:52:12, 18:52:12,  -1,        1, B0:55:08:18:FC:0A,
B8:57:D8:46:86:D4, 18:51:58, 18:51:58, -74,        1, F8:D1:11:C5:0F:72,
28:5A:EB:87:CD:BA, 18:50:28, 18:51:20, -54,       12, 00:23:B1:7C:75:48,
E0:C7:67:88:19:0E, 18:51:08, 18:51:08,  -1,        7, 98:42:46:08:58:F4,

Desired output: https://pastebin.com/gSv74mcZ

BSSID, First time seen, Last time seen, channel, Speed, Power, # beacons, # IV, LAN IP, ID-length,
B0:48:7A:CF:BA:12, 18:52:09, 18:53:41,  7,  54,  TP,SK,     3,        0,   0.  0.  0.  0,   3, sms,
70:79:90:41:62:50, 18:50:17, 18:55:00,  4,  54, A, CP TP,SK, -19,      8,      9,   0.  0.  0.  0,  12, WiFi-Max-MTN,

Station MAC, First time seen, Last time seen, Power, # packets, BSSID, Probed
04:C2:3E:FC:1E:BB, 18:53:00, 18:53:00,  -1,        1, 3C:1E:04:8F:12:83,
F0:79:60:9E:13:4E, 18:52:56, 18:52:56,  -1,        1, 10:C6:1F:E9:90:6E,
40:E2:30:D9:E8:4B, 18:50:53, 18:52:25, -60,        2, F4:F2:6D:DA:27:2F,
D0:65:CA:BD:93:EC, 18:52:12, 18:52:12,  -1,        1, B0:55:08:18:FC:0A,
B8:57:D8:46:86:D4, 18:51:58, 18:51:58, -74,        1, F8:D1:11:C5:0F:72,
28:5A:EB:87:CD:BA, 18:50:28, 18:51:20, -54,       12, 00:23:B1:7C:75:48,
E0:C7:67:88:19:0E, 18:51:08, 18:51:08,  -1,        7, 98:42:46:08:58:F4,
ali reza

2 Answers


You can try this sed command.

It is not perfect, but it works with busybox on Win7:

sed '/Speed/,/^$/{!d;/sms\|Max\|Speed\|^$/!d}' infile
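
For readers less familiar with sed address ranges, here is a rough Python sketch of what the command appears intended to do: a range opens at a line matching Speed and closes at the next empty line; inside the range only lines matching sms, Max, Speed or the empty line are kept, and everything outside the range passes through unchanged. The function name sed_like_filter and its parameters are hypothetical and only serve the illustration; it is not a replacement for the sed one-liner.

```python
import re


def sed_like_filter(lines, start=r"Speed", end=r"^$", keep=r"sms|Max|Speed|^$"):
    """Yield lines, dropping those inside a start..end range that do not match keep."""
    in_range = False
    for line in lines:
        text = line.rstrip("\r\n")
        opening = False
        if not in_range and re.search(start, text):
            in_range = True          # the range opens on this line
            opening = True
        if not in_range:
            yield line               # outside the range: pass through
            continue
        if re.search(keep, text):
            yield line               # inside the range: keep only matching lines
        # The end pattern is only tested on lines after the opening one.
        if not opening and re.search(end, text):
            in_range = False


if __name__ == "__main__":
    sample = [
        "BSSID, channel, Speed, Power,\n",
        "84:C9:B2:A6:0B:28, 7, 54, Maryam,\n",
        "B0:48:7A:CF:BA:12, 7, 54, sms,\n",
        "\n",
        "Station MAC, Power\n",
        "04:C2:3E:FC:1E:BB, -1,\n",
    ]
    print("".join(sed_like_filter(sample)), end="")
```

Because the range closes on the empty line, the Station MAC block is never inspected, which is why the empty line itself has to be listed among the kept patterns.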
ctac_

As Python allows for explicit processing in loops, it is easy to build a function that filters a file object. It may be suboptimal, but it is easy to write, read and maintain.

It could be:

def filter(fdin, fdout, pat1, pat2, *keys):
    """
    Remove lines between a line containing pat1 and a line containing pat2,
    but keep any line in between that contains one of the strings in keys.
    fdin:  input file object
    fdout: output file object
    pat1:  marks the beginning of the removed lines (kept in output)
    pat2:  marks the end of the removed lines (also kept in output)
    keys:  any number of key-patterns - if a line contains one, it is not removed
    """
    def keypresent(line):    # internal function to test for key patterns
        return any(k in line for k in keys)
    keep = True              # will be False after pat1 and before pat2
    # finished = False         # will be True after pat2
    for line in fdin:
        if keep or keypresent(line):
            fdout.write(line)
            # if finished: continue
        if pat1 in line:
            keep = False
        elif not keep and pat2 in line:
            if not keypresent(line):   # already written above if it holds a key
                fdout.write(line)
            keep = True
            # finished = True

If there is another line containing pat1 after pat2, the function will start removing lines again. If that is not desirable, just uncomment the 3 lines about finished.

It can be used this way:

with open(inputfilename) as fdin, open(outputfilename, 'w') as fdout:
    filter(fdin, fdout, "Speed", "Station MAC", "Max", "sms")

Note that it does not keep the empty line before "Station MAC", but that would be trivial to fix...
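
One possible way to make that fix, as a minimal sketch: remember the most recently dropped line when it was blank, and write it back just before the pat2 line. The name filter_keep_blank and the pending_blank variable are hypothetical, and the sketch assumes that only a blank line directly preceding the pat2 line should be restored.

```python
import io


def filter_keep_blank(fdin, fdout, pat1, pat2, *keys):
    """Like the filter above, but restore a blank line directly before pat2."""
    keep = True            # False after pat1 and before pat2
    pending_blank = None   # last dropped line if it was blank, else None
    for line in fdin:
        if not keep and pat2 in line:
            if pending_blank is not None:
                fdout.write(pending_blank)   # restore the separator line
            fdout.write(line)
            keep = True
            pending_blank = None
            continue
        if keep or any(k in line for k in keys):
            fdout.write(line)
            pending_blank = None
        else:
            # Line is dropped; remember it only if it is blank.
            pending_blank = line if not line.strip() else None
        if keep and pat1 in line:
            keep = False


if __name__ == "__main__":
    src = io.StringIO(
        "BSSID, channel, Speed, Power,\n"
        "84:C9:B2:A6:0B:28, 7, 54, Maryam,\n"
        "B0:48:7A:CF:BA:12, 7, 54, sms,\n"
        "\n"
        "Station MAC, Power\n"
        "04:C2:3E:FC:1E:BB, -1,\n"
    )
    dst = io.StringIO()
    filter_keep_blank(src, dst, "Speed", "Station MAC", "Max", "sms")
    print(dst.getvalue(), end="")
```

The io.StringIO objects stand in for real files here; with actual files the call looks the same as in the usage shown above.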

Serge Ballesta
  • Thanks for your help. I could write a Python script to solve the problem, but I need a tool based on C, because the file is huge and has many lines. Thanks a lot for your help anyway. I share, you share = we learn. Thanks again. @ctac_'s solution benchmarks very well. – ali reza Mar 09 '18 at 04:26