I think most Python users could answer this question, since it only needs general knowledge, but I can't figure it out.

This is the code:

            if ssid in net and p.addr2 not in clients:
                count +=1
                get_oui(p.addr2)
                net.append(ssid)
                checkmac(p.addr2)
                mps+=1   
                print str(count)+'>',p.addr2+' ('+G+macf+W+') <--Probing--> '+O+ssid+W+' < '+Y+'MPS'+W
                if args.log:
                    wr_log(p.addr2,ssid,macf)

            elif ssid not in net and p.addr2 in clients:
                count +=1
                net.append(ssid)
                get_oui(p.addr2)
                clients.append(p.addr2)
                mpm+=1   
                print str(count)+'>',p.addr2+' ('+G+macf+W+') <--Probing--> '+O+ssid+W+' < '+Y+'MPM'+W
                if args.log:
                    wr_log(p.addr2,ssid,macf)
            elif ssid not in net and p.addr2 not in clients:
                count +=1
                net.append(ssid)
                get_oui(p.addr2)
                checkmac(p.addr2)
                print str(count)+'>',p.addr2+' ('+G+macf+W+') <--Probing--> '+O+ssid+W
                if args.log:
                    wr_log(p.addr2,ssid,macf)

Now here's what you need to know: this script monitors and analyzes wireless packets one by one from the air, and from each packet I extract the SSID, MAC address and manufacturer data. Clients tend to send loads of duplicate packets with the same data, and among them will be unique packets with unique data.

The current SSID is stored in ssid and the current MAC is stored in p.addr2. Previous ssid and p.addr2 values are stored in the lists 'net' and 'clients', respectively.

For most packets my code works, but for one special condition I am lost. Consider these hypothetical values of SSID and MAC address:

SSID  MAC
S1    A
S2    A
S1    B
S2    B

For the first scenario, the third condition holds true. For the second scenario, the second condition holds true. For the third scenario, the first condition holds true. For the fourth scenario, none of the conditions holds true, am I right? When the lists are checked with the "not in" and "in" operators, both the client and the SSID are already there, so the packet is dropped. But this is a valid case: it means that two clients are looking for the same SSID, and I want it to be printed. But if I do this:

            elif ssid in net and p.addr2 in clients:
                get_oui(p.addr2)
                checkmac(p.addr2)
                print str(count)+'>',p.addr2+' ('+G+macf+W+') <--Probing--> '+O+ssid+W

Duplicate packets start printing out, because each client sends multiple packets with the same data, even though this situation can legitimately arise. How do I implement a condition that validates situations where multiple clients in clients[] are looking for multiple SSIDs in net[]?
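
The walk-through above can be reproduced with a minimal sketch. It assumes that checkmac() is what appends the MAC to clients (its body is not shown here), and trace_branches is a hypothetical helper, not part of the script:

```python
def trace_branches(packets):
    """Return, for each (ssid, mac) packet, which branch of the chain fires."""
    net, clients = [], []
    fired = []
    for ssid, mac in packets:
        if ssid in net and mac not in clients:
            branch = "first"
        elif ssid not in net and mac in clients:
            branch = "second"
        elif ssid not in net and mac not in clients:
            branch = "third"
        else:
            branch = None  # both values already known: the packet is dropped
        if branch is not None:
            # Assumption: every branch that fires ends up recording both values.
            if ssid not in net:
                net.append(ssid)
            if mac not in clients:
                clients.append(mac)
        fired.append(branch)
    return fired


packets = [("S1", "A"), ("S2", "A"), ("S1", "B"), ("S2", "B")]
fired = trace_branches(packets)
print(fired)
```

The last entry is None because two independent lists cannot tell "S2 was seen and B was seen" apart from "S2 was seen together with B".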

Following cmidi's suggestion, I tried to use a dictionary and access it this way, but it's still giving me duplicates!

            if count > 0:
                for k,v in obs.items():
                    if k and v != p.addr2 and ssid: 
                        count +=1
                        get_oui(p.addr2)
                        net.append(ssid)
                        checkmac(p.addr2)
                        obs[p.addr2] = ssid
                        mps+=1   
                        print str(count)+'>',p.addr2+' ('+G+macf+W+') <--Probing--> '+O+ssid+W+' < '+Y+'MPS'+W
                        if args.log:
                            wr_log(p.addr2,ssid,macf)
            else:
                count +=1
                get_oui(p.addr2)
                net.append(ssid)
                checkmac(p.addr2)
                obs[p.addr2] = ssid   
                print str(count)+'>',p.addr2+' ('+G+macf+W+') <--Probing--> '+O+ssid+W+' < '+Y+'MPS'+W
                if args.log:
                    wr_log(p.addr2,ssid,macf)

What's going on here?
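
One thing worth checking in the snippet above is how Python reads the test `k and v != p.addr2 and ssid`. It does not compare a pair: it parses as `k and (v != p.addr2) and ssid`, so it is true whenever the stored SSID differs from the incoming MAC, which is practically always. A small sketch, assuming obs maps MAC to SSID as in the snippet; original_test and pair_seen are hypothetical names used only for illustration:

```python
obs = {"A": "S1"}      # MAC -> SSID, as filled in by obs[p.addr2] = ssid
mac, ssid = "A", "S1"  # an exact duplicate of the stored pair


def original_test(k, v, mac, ssid):
    # Same expression as in the loop: truthiness of k, then (v != mac),
    # then truthiness of ssid. No pair comparison takes place.
    return bool(k and v != mac and ssid)


def pair_seen(obs, mac, ssid):
    # Compares the SSID stored for this MAC with the incoming SSID.
    return obs.get(mac) == ssid


fires = [original_test(k, v, mac, ssid) for k, v in obs.items()]
print(fires)                      # the duplicate still passes the test
print(pair_seen(obs, mac, ssid))  # the pair comparison recognises it
```

Because the loop body runs once for every stored entry that passes the test, a single packet can also be printed several times. A dictionary keyed by MAC alone additionally keeps only one SSID per MAC, so an earlier SSID for the same client is overwritten.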

@lmz OK, as requested, this is how everything works, or rather should work:

For each packet I scan, I obtain the SSID, MAC and manufacturer values. I want to save only the SSID and MAC, together, because as a pair they will be unique once the duplicate packets are discarded. For the first packet, I print the values without any check, which gives the first set of values; from the second packet onwards, the checks for duplicates and certain conditions become active. Ideally, to keep the code as short as possible, there should be a direct check against the SSID:MAC pairs captured since the first packet, stored in a list, dict, OrderedDict etc. More than being ordered, the code needs to be able to scan through all previous pairs, removing each pair in the list, dict etc. that does not match the incoming SSID:MAC. If no duplicate is found, the new SSID:MAC pair is added to the list, dict etc.; if a duplicate is found, the loop breaks and we move on to the third packet, and so on.

  • Use hash tables or dictionaries; create keys from a CRC or some hash function of ssid+mac – cmidi May 09 '15 at 00:25
  • @cmidi I have no idea how to implement that. As you may have guessed from my post, I am not really experienced at writing code. Can you please give me an example? I will look it up on Google, but a direct example while we are on the issue would be really helpful. – Siddharth Dubey May 09 '15 at 00:32
  • Sure as soon as i get near my workstation – cmidi May 09 '15 at 01:05
  • @cmidi read my edit above, I tried using a dictionary that way. – Siddharth Dubey May 09 '15 at 01:51
  • OK so basically given a list of possibly duplicate (SSID, MAC) just return a unique list of (SSID, MAC)? I'm not sure what you mean by "scan through all previous pairs removing each pair that in the list, dict that does not match the incoming SSID:MAC". If you did that wouldn't you end up with duplicates? – lmz May 09 '15 at 06:56
  • @lmz By writing that I meant that we remove it from our search scope when we search the remaining pairs. But yes, essentially what you understood from my explanation is correct, given a list of possibly duplicate (SSID, MAC), we just return only a unique list of (SSID, MAC). – Siddharth Dubey May 09 '15 at 07:00
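
One possible reading of cmidi's suggestion, as a hedged sketch rather than the code cmidi had in mind: build a CRC32 key from the MAC and SSID and use it to index a dictionary. pair_key and is_new are hypothetical names. Because a 32-bit CRC can collide, each key holds a small list of the real pairs, which is checked before a packet is declared a duplicate:

```python
import zlib


def pair_key(ssid, mac):
    # The "|" separator keeps the MAC and SSID parts from running together
    # in the combined string.
    raw = (mac + "|" + ssid).encode("utf-8")
    return zlib.crc32(raw) & 0xFFFFFFFF  # mask keeps the value unsigned


def is_new(seen, ssid, mac):
    """Record the pair in seen and return True if it was not there before."""
    bucket = seen.setdefault(pair_key(ssid, mac), [])
    if (ssid, mac) in bucket:
        return False
    bucket.append((ssid, mac))
    return True


packets = [
    ("S1", "A"), ("S2", "A"), ("S1", "B"),
    ("S2", "B"), ("S1", "A"), ("S1", "B"),
]
seen = {}
results = [is_new(seen, ssid, mac) for ssid, mac in packets]
print(results)
```

Python dictionaries already hash their keys internally, so the explicit CRC step is optional; it mainly helps if the key has to be stored or compared outside Python.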

1 Answer

I'm giving an example with the dictionary so you can store associated data as well. If you want order just change the dict to collections.OrderedDict as shown. Since you say it's duplicates in and uniques out I am assuming that the MPM and MPS bits are useless. Let me know if you need them.

Example (this assumes you have string SSID & MAC):

from collections import OrderedDict
packets = [
    ('S1', 'A'), ('S2', 'A'), ('S1', 'B'),
    ('S2', 'B'), ('S1', 'A'), ('S1', 'B')
]



# seen_packets = OrderedDict() # if order is required
seen_packets = dict()

for ssid, mac in packets:
    print "Considering SSID ", ssid, " and MAC ", mac
    ssid_mac = (ssid, mac)
    if ssid_mac in seen_packets:
        print "Seen this before - not adding"
    else:
        data_for_packet = True # your own data here (timestamp?)
        print "Never seen this SSID/MAC combo before"
        seen_packets[ssid_mac] = data_for_packet

print "Unique ssid,mac pairs with data:"
for (ssid, mac), data in seen_packets.iteritems():
    print ssid, mac, data

The key is to understand that the tuple (ssid, mac) is a perfectly valid dictionary key if both ssid and mac are strings, so there is no need to keep multiple lists.
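
If no extra data has to be stored per pair, a set of tuples does the same membership test, and a plain list alongside it keeps the arrival order. A minimal sketch, where unique_pairs is a hypothetical helper name:

```python
def unique_pairs(packets):
    """Return the distinct (ssid, mac) pairs in first-seen order."""
    seen = set()   # fast membership test on the pair as a unit
    ordered = []   # remembers the order in which new pairs arrived
    for ssid, mac in packets:
        pair = (ssid, mac)
        if pair not in seen:
            seen.add(pair)
            ordered.append(pair)
    return ordered


packets = [
    ("S1", "A"), ("S2", "A"), ("S1", "B"),
    ("S2", "B"), ("S1", "A"), ("S1", "B"),
]
unique = unique_pairs(packets)
print(unique)
```

In a live capture the same check would run once per packet inside the packet handler, with seen kept outside the handler so that it persists between packets.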

lmz
  • And do sets store data in order, i.e. in the same order I add them? If so, can't I just store the SSIDs and MACs that I get in 'seen_ssid_macs' and then do: if (ssid,mac) not in seen_ssid_macs: print "Whatever I need to"? How do I store the values coming in the variables ssid and mac into 'packets'? – Siddharth Dubey May 09 '15 at 04:30
  • Sets don't have order. There are dictionaries with order: [collections.OrderedDict](https://docs.python.org/2/library/collections.html#collections.OrderedDict). The point of `if (ssid, mac) in seen_ssid_macs` is that it considers the `(ssid, mac)` as a unit, so it would return `False` if the SSID has been seen before but with a different MAC. I'll try and make an edit with OrderedDict. – lmz May 09 '15 at 04:55
  • I was looking at dicts with the intention of making dict values like ['A':'S1,S2,S3','B':'S2,S3'] and then searching them each time I get a new packet from a device, to see if a certain known MAC exists with a certain known SSID value. If such a MAC:SSID pair is not found, even after searching the MAC's own known values, if any, it prints the pair and adds it to the dict. Do you want me to post a bigger snippet from my script to show how I am getting the ssid and mac values? – Siddharth Dubey May 09 '15 at 05:08
  • Do you need ordering? I don't think showing how the ssid and mac is obtained is necessary but it would be nice if you showed what you actually want to do with the resulting data and why you distinguish between "seen SSID before", "seen MAC before" and "never seen both". – lmz May 09 '15 at 05:12
  • I added some more info in my edit about the entire process as I see it. Is this helpful? – Siddharth Dubey May 09 '15 at 06:51
  • @SiddharthDubey OK. I simplified the code a bit since from your explanation the `MPM` and `MPS` parts are not needed, you only want a unique list output. – lmz May 09 '15 at 07:17
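
The layout described in the comments, one entry per MAC holding every SSID that client has probed for, can be sketched with a dictionary of sets. This is an illustration under assumptions, not code from the script: record_probe and ssids_by_mac are hypothetical names.

```python
from collections import defaultdict


def record_probe(ssids_by_mac, mac, ssid):
    """Add ssid to the set kept for mac; return True if the pair is new."""
    known = ssids_by_mac[mac]  # defaultdict creates an empty set on first use
    if ssid in known:
        return False
    known.add(ssid)
    return True


ssids_by_mac = defaultdict(set)
packets = [
    ("S1", "A"), ("S2", "A"), ("S1", "B"),
    ("S2", "B"), ("S1", "A"), ("S1", "B"),
]
results = [record_probe(ssids_by_mac, mac, ssid) for ssid, mac in packets]
print(results)
print(sorted(ssids_by_mac["A"]))
```

Compared with keying on the (ssid, mac) tuple, this layout answers the same "seen before?" question and also makes it easy to list every SSID a given client has asked for.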