
I am reading a file using:

def readFile():
    file = open('Rules.txt', 'r')
    lines = file.readlines()
    for line in lines:
        rulesList.append(line)

rulesList:

['\n', "Rule(F1, HTTPS TCP, ['ip', 'ip'], ['www.google.ca', '8.8.8.8'], 443)\n", '\n', "Rule(F2, HTTPS TCP, ['ip', 'ip'], ['75.2.18.233'], 443)\n", '\n']

My file looks like:

Rule(F1, HTTPS TCP, ['ip', 'ip'], ['www.google.ca', '8.8.8.8'], 443)

Rule(F2, HTTPS TCP, ['ip', 'ip'], ['ip'], 443)

I would like to feed the values to a class I created

class Rule:
    def __init__(self, flowNumber, protocol, port, fromIP=[], toIP=[]):
        self.flowNumber = flowNumber
        self.protocol = protocol
        self.port = port
        self.fromIP = fromIP
        self.toIP = toIP

    def __repr__(self):
        return f'\nRule({self.flowNumber}, {self.protocol}, {self.fromIP}, {self.toIP}, {self.port})'

newRule = Rule(currentFlowNum, currentProtocol, currentPort, currentFromIP, currentToIP)

to get an output such as:

[F1, HTTPS TCP, ['ip', 'ip'], ['www.google.ca', '8.8.8.8'], 443] 

or be able to assign these values to a variable like:

currentFlowNum = F1, currentProtocol = 'HTTPS TCP' , currentPort = 443, currentFromIP = ['ip', 'ip'], currentToIP = ['www.google.ca', '8.8.8.8']

I tried:

for rule in rulesList:
        if rule !='\n':
            tmp = rule.split(',')
            print(tmp)

tmp:

['Rule(F1', ' HTTPS TCP', " ['ip'", " 'ip']", " ['www.google.ca'", " '8.8.8.8']", ' 443)\n']
['Rule(F2', ' HTTPS TCP', " ['ip'", " 'ip']", " ['ip']", ' 443)\n']

Is there a way to avoid splitting on the commas inside [], i.e. I would like the output to look like:

['Rule(F1', ' HTTPS TCP', " ['ip','ip']", " ['www.google.ca', '8.8.8.8']", ' 443)\n']
['Rule(F2', ' HTTPS TCP', " ['ip','ip']", " ['ip']", ' 443)\n']

  • There is, but it requires you to write your own parser. What would be better imo is using a standard format like JSON or YAML – Vulwsztyn Dec 20 '22 at 16:42
  • @Vulwsztyn do you mean converting the array to json before saving into a file? – ritvik seth Dec 20 '22 at 16:43
  • You could probably use named regex groups to parse this, perhaps capturing the lists and doing further processing on them afterwards. See https://stackoverflow.com/q/10059673/2958070 for a simple example of named regex groups. Also note, you don't want your `Rule` class to use `[]` as a default argument in `fromIP` and `toIP`. See https://stackoverflow.com/q/366422/2958070 about that – Ben Dec 20 '22 at 16:44
  • @ritvikseth if converting to json before you save the file is an option then emphatically yes, that is many orders of magnitude better than trying to parse this format – Dean MacGregor Dec 20 '22 at 16:44
  • @Ben thank you for your comment I will check the links – ritvik seth Dec 20 '22 at 16:44
  • @DeanMacGregor yes that is an option for me I will try that thank you – ritvik seth Dec 20 '22 at 16:45
  • @Swifty if the json idea does not work I will try regex tbh I am very bad at regex lol, thank you for your comment – ritvik seth Dec 20 '22 at 16:47
  • I removed my comment because your rules already are separate elements of the list; I'll rethink it later :). But yeah, saving as json or csv (separated by ; instead of ,) would make things simpler. – Swifty Dec 20 '22 at 16:49
  • BTW, there's no need for the `for line in lines:` loop. Just do `rulesList = file.readlines()` – Barmar Dec 20 '22 at 16:50
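
The mutable-default issue raised in the comments can be illustrated with a minimal sketch. It assumes the `Rule` class from the question and swaps the `[]` defaults for `None`, which is a common workaround rather than the only possible one:

```python
class Rule:
    def __init__(self, flowNumber, protocol, port, fromIP=None, toIP=None):
        self.flowNumber = flowNumber
        self.protocol = protocol
        self.port = port
        # Build a fresh list per instance; a literal [] default would be
        # created once and shared by every call that omits the argument.
        self.fromIP = [] if fromIP is None else fromIP
        self.toIP = [] if toIP is None else toIP


a = Rule('F1', 'HTTPS TCP', 443)
b = Rule('F2', 'HTTPS TCP', 443)
a.fromIP.append('8.8.8.8')
print(a.fromIP)  # ['8.8.8.8']
print(b.fromIP)  # []
```

With the original `fromIP=[]` signature, the append on `a` would also show up in `b.fromIP`, because both instances would hold the same list object.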

1 Answer


If you have control over how the data in the file is stored and can replace the single quotes (') with double quotes (") to make the "list" structures valid JSON, you could use a regular expression for this.

A word of caution: unless you are absolutely sure that the format you'll be reading will largely remain the same and is rather inflexible, you're better off storing this data in a more well-established format (as mentioned in the comments) like JSON, YAML, etc. There are so many edge cases that could happen here that rolling your own parser like this is objectively suboptimal.

import re
import json

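def readFile():
    # One rule per line; both bracketed lists must be valid JSON (double-quoted strings).
    pattern = re.compile(r'Rule\((?P<flow_number>[^,]+),\s(?P<protocol>[^,]+),\s(?P<from_ip>\[[^\]]+\]),\s(?P<to_ip>\[[^\]]+\]),\s(?P<port>[^,)]+)\)')
    myRules = []
    with open('Rules.txt', 'r') as file:  # the file is closed even if parsing raises
        for line in file:
            match = pattern.match(line)
            if match:  # blank or malformed lines are skipped
                # Note: the port is passed through as a string, exactly as captured.
                myRules.append(Rule(match.group('flow_number'), match.group('protocol'), match.group('port'),
                                    json.loads(match.group('from_ip')), json.loads(match.group('to_ip'))))

    return myRules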


print(readFile())
# Returns:
# [
#  Rule(F1, HTTPS TCP, ['ip', 'ip'], ['www.google.ca', '8.8.8.8'], 443), 
#  Rule(F2, HTTPS TCP, ['ip', 'ip'], ['ip'], 443)]
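
For comparison, the JSON route recommended above could look roughly like the sketch below. The helper names `save_rules` and `load_rules` and the file name `rules.json` are hypothetical, and the `Rule` class is the one from the question with `None` defaults and without the leading newline in `__repr__`. It assumes you control how the rules are written in the first place:

```python
import json
import os
import tempfile


class Rule:
    def __init__(self, flowNumber, protocol, port, fromIP=None, toIP=None):
        self.flowNumber = flowNumber
        self.protocol = protocol
        self.port = port
        self.fromIP = [] if fromIP is None else fromIP
        self.toIP = [] if toIP is None else toIP

    def __repr__(self):
        return f'Rule({self.flowNumber}, {self.protocol}, {self.fromIP}, {self.toIP}, {self.port})'


def save_rules(rules, path):
    # vars(rule) is the instance's attribute dict; its keys happen to
    # match the parameter names of Rule.__init__.
    with open(path, 'w') as f:
        json.dump([vars(rule) for rule in rules], f, indent=2)


def load_rules(path):
    # Each stored object is unpacked straight back into the constructor.
    with open(path) as f:
        return [Rule(**fields) for fields in json.load(f)]


# Hypothetical file location, placed in a temp directory for the demo.
path = os.path.join(tempfile.mkdtemp(), 'rules.json')
save_rules([Rule('F1', 'HTTPS TCP', 443, ['ip', 'ip'], ['www.google.ca', '8.8.8.8'])], path)
loaded = load_rules(path)
print(loaded)
# [Rule(F1, HTTPS TCP, ['ip', 'ip'], ['www.google.ca', '8.8.8.8'], 443)]
```

No regular expression is needed this way, and the port survives the round trip as an integer rather than a string.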


esqew