I have a text file of IPs and MAC addresses. The MACs are in the format xxxx.xxxx.xxxx, and I need to change them all to xx:xx:xx:xx:xx:xx. I am already reading the file into a list. Now I am looping through each line of the list, and I need to make multiple modifications: remove the IPs, then change the MAC format. The problem is that I can't figure out how to do this in one pass unless I copy the list to a new list for every modification. How can I loop through the list once and update each element with all my modifications?

count = 0
output3 = []

for line in output:
    # print(line)
    # remove any extra spaces between words in the string
    output[count] = " ".join(line.split())

    # create a new list with just the MAC addresses
    output3.append(output[count].split(" ")[3])

    # create a new list with MACs using a ":"

    count += 1

print(output3)
CypherX
NerdGuy021

2 Answers


Solution

You can use the re (regular expression) module to extract every substring that matches the MAC-id pattern "xxxx:xxxx:xxxx", and then reformat each match into the expected output ("xx-xx-xx-xx-xx-xx"), as shown below.

Note: I have used a dummy data file (see section: Dummy Data below) to make this answer reproducible. It should work with your data as well.

# import re

filepath = "input.txt"
content = read_file(filepath)
mac_ids = extract_mac_ids(content, format=True) # format=False --> "xxxx:xxxx:xxxx"

print(mac_ids)

## OUTPUT:
#
# ['a0-b1-ff-33-ac-d5',
#  '11-b9-33-df-55-f6',
#  'a4-d1-e7-33-ff-55',
#  '66-a1-b2-f3-b9-c5']

Code: Convenience Functions

How does the regex work? See this example.
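As a rough illustration of what the pattern used in extract_mac_ids matches, here is a minimal sketch. The sample string is made up for this example; the pattern is the same one the function compiles, minus the outer capture group.

```python
import re

# Three groups of four "word" characters (letters, digits, underscore),
# separated by colons. Note that \w is broader than hexadecimal digits.
pattern = re.compile(r"\w{4}:\w{4}:\w{4}")

# Hypothetical sample text, not taken from the asker's file
sample = "klasilvavh; 11b9:33df:55f6 and a4d1:e733:ff55"

matches = pattern.findall(sample)
print(matches)  # ['11b9:33df:55f6', 'a4d1:e733:ff55']
```

Text that does not contain three colon-separated groups of exactly four word characters is skipped.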

def read_file(filepath: str):
    """Reads and returns the content of a file."""

    with open(filepath, "r") as f:
        content = f.read() # read in one attempt
    return content

def format_mac_id(mac_id: str):
    """Returns a formatted mac_id.
    INPUT FORMAT: "xxxxxxxxxxxx"
    OUTPUT FORMAT: "xx-xx-xx-xx-xx-xx"
    """
    
    # Split the string into two-character chunks and join them with "-"
    mac_id = "-".join(mac_id[i:i + 2] for i in range(0, len(mac_id), 2))
    return mac_id

def extract_mac_ids(content: str, format: bool=True):
    """Extracts and returns a list of formatted mac_ids.
    INPUT FORMAT: "xxxx:xxxx:xxxx"
    OUTPUT FORMAT: "xx-xx-xx-xx-xx-xx"
    """
    
    import re
    # pattern = "(" + ':'.join([r"\w{4}"]*3) + "|" + ':'.join([r"\w{2}"]*6) + ")"
    # pattern = r"(\w{4}:\w{4}:\w{4}|\w{2}:\w{2}:\w{2}:\w{2}:\w{2}:\w{2})"
    pattern = r"(\w{4}:\w{4}:\w{4})"
    pat = re.compile(pattern)
    mac_ids = pat.findall(content) # returns a list of all mac-ids
    # Replaces the ":" with "" and then formats 
    # each mac-id as: "xx-xx-xx-xx-xx-xx"
    if format:
        mac_ids = [format_mac_id(mac_id.replace(":", "")) for mac_id in mac_ids]
    return mac_ids

Dummy Data

The following code block creates a dummy file with some sample mac-ids.

filepath = "input.txt"

s = """
a0b1:ff33:acd5 ghwvauguvwi ybvakvi
klasilvavh; 11b9:33df:55f6
haliviv
a4d1:e733:ff55
66a1:b2f3:b9c5
"""

# Create dummy data file
with open(filepath, "w") as f:
    f.write(s)
CypherX
  • @zerobit Please try this and let me know if you have any questions. – CypherX Mar 14 '21 at 03:57
  • mac addresses are hex, so you need more than `\d` – Sean Summers Mar 14 '21 at 04:55
  • If I understood him correctly, his mac-ids are stored as strings. If that's not the case, I would need him to share some sample data (which I asked him to provide already). – CypherX Mar 14 '21 at 05:17
  • @SeanSummers Yes, for hex, it should take care of both digits and letters. Need to update the regex pattern for that. – CypherX Mar 14 '21 at 05:18
  • @SeanSummers See this: [Tested solution](https://github.com/sugatoray/stackoverflow/blob/master/src/answers/Q_66620236/Q_66620236.ipynb) – CypherX Mar 14 '21 at 05:50
  • Yes, now that you changed `\d` to `\w` it will work. I don't expect the input to have corrupt hex digits, but you are accepting over 55% invalid hex characters with `\w` – Sean Summers Mar 14 '21 at 05:55
  • Yes, but I expect the OP will take care of providing only valid mac-ids. If he/she needs the validity of the mac-ids, then that is another issue and that should be kept separate from this problem-statement. – CypherX Mar 14 '21 at 06:02
  • I agree with you on problem scope, but we _are_ talking about an IEEE-defined standard data type. I'm not sure `g-z` and `_` would ever even be considered surrogates for valid values in an IEEE Std 802 MAC-48 address. But as a pure math person, I hear you. Units are immaterial to the problem. – Sean Summers Mar 14 '21 at 06:06
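The hex-only concern raised in the comments can be sketched as follows. This is not part of the answer above: it assumes the dotted xxxx.xxxx.xxxx input and colon-separated output described in the question, and the names HEX_MAC and find_hex_macs are hypothetical.

```python
import re
from typing import List

# Assumption: MACs look like xxxx.xxxx.xxxx and contain only hex digits.
HEX_MAC = re.compile(r"\b[0-9A-Fa-f]{4}\.[0-9A-Fa-f]{4}\.[0-9A-Fa-f]{4}\b")


def find_hex_macs(text: str, sep: str = ":") -> List[str]:
    """Return every dotted MAC found in text, rewritten as xx<sep>xx<sep>..."""
    result = []
    for match in HEX_MAC.findall(text):
        digits = match.replace(".", "")  # 12 hex digits
        result.append(sep.join(digits[i:i + 2] for i in range(0, 12, 2)))
    return result


# Hypothetical sample line: an IP, a valid MAC, a non-hex look-alike, a MAC
sample = "10.0.0.1 11a2.33c4.55f6 zzzz.1234.abcd 0011.2233.4455"
print(find_hex_macs(sample))  # ['11:a2:33:c4:55:f6', '00:11:22:33:44:55']
```

Unlike `\w`, the explicit character class rejects `g-z` and `_`, so the look-alike token and the IP address are both ignored.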

It appears you are overthinking the problem, which may be where your frustration is coming from.

First, always consider whether you actually need a count variable in Python. Usually you do not, and the enumerate() function is your friend here.
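As a minimal sketch of that point, the manual counter from the question can be replaced with enumerate(). The sample lines here are made up for illustration.

```python
# Hypothetical sample lines with irregular spacing
lines = ["abc  def ghi 1122.3344.5566", "jklmn op   qrst 11a2.33c4.55f6"]

# enumerate() yields (index, item) pairs, so no separate counter is needed
for index, line in enumerate(lines):
    lines[index] = " ".join(line.split())  # collapse runs of whitespace

print(lines)
# ['abc def ghi 1122.3344.5566', 'jklmn op qrst 11a2.33c4.55f6']
```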

Second, there is no need to process the data multiple times in Python. You can hold intermediate results in variables and apply every modification to a line within a single loop iteration, leveraging Python's expressiveness rather than working around the language.

Below is an implementation example that may help you think through your approach. Good luck solving your harder problems, and I hope Python helps you out with them!

#! /usr/bin/env python3

import re

from typing import Iterable


# non-regex reformat mac to be xx:xx:xx:xx:xx:xx
# NOTE: this assumes a source with '.' separators only
# reformat_mac = lambda _: ':'.join(_ for _ in _.split('.') for _ in (_[:2], _[2:]))

# regex reformat mac to be xx:xx:xx:xx:xx:xx
# NOTE: Only requires at least two hex digits adjacent at a time
reformat_mac = lambda _: ":".join(re.findall(r"(?i)[\da-f]{2}", _))


def generate_output3(output: Iterable[str]) -> Iterable[str]:
    for line in output:
        col1, col2, col3, mac, *cols = line.split()
        mac = reformat_mac(mac)
        yield " ".join((col1, col2, col3, mac, *cols))


if __name__ == "__main__":
    output = [
        "abc  def ghi 1122.3344.5566",
        "jklmn op qrst 11a2.33c4.55f6 uv wx     yz",
        "zyxwu 123 next 11a2.33c4.55f6 uv wx     yz",
    ]
    for line in generate_output3(output):
        print(line)
Sean Summers