
I have an input file that I use with a Python script.

Here is an example of the file:

Name:       Joe
Surname:     Doe
Country:     DE
Gender:    

Can anybody suggest how to parse the file and make sure that all required information is supplied? I am trying to avoid a chain of if/else statements and to implement this in a more efficient way.

Here is what I do now, but I am sure there is a better way.

import re

for line in file_content:
    # One branch per expected key; the key must be followed by a non-empty value.
    if re.match(r'Name:\s+(\w+)', line, re.IGNORECASE):
        file_validation['name'] = True
    elif re.match(r'Surname:\s+(\w+)', line, re.IGNORECASE):
        file_validation['surname'] = True
    ...

Any suggestions?

napuzba
ZDZ

3 Answers

0

Something like this splits a line into its key and its value:

>>> re.match(r'^(.+)\s*:\s*(.*)$', 'Surname:     Doe').groups()                                
('Surname', 'Doe')
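
To extend this to the whole file, one possible sketch applies the same pattern to every line and collects the pairs into a dict. The helper name parse_fields and the REQUIRED set are illustrative assumptions, not part of the answer itself.

```python
import re

# Same pattern as in the answer: key, colon, optional spaces, value (possibly empty).
LINE_RE = re.compile(r'^(.+)\s*:\s*(.*)$')

# Hypothetical set of required keys, taken from the example file in the question.
REQUIRED = {'Name', 'Surname', 'Country', 'Gender'}


def parse_fields(lines):
    """Return a dict of key -> value for every line that matches LINE_RE."""
    fields = {}
    for line in lines:
        match = LINE_RE.match(line.rstrip('\n'))
        if match:
            key, value = match.groups()
            fields[key.strip()] = value.strip()
    return fields


sample = ['Name:       Joe', 'Surname:     Doe', 'Country:     DE', 'Gender:    ']
fields = parse_fields(sample)

# Required keys that are absent or have an empty value.
missing = sorted(k for k in REQUIRED if not fields.get(k))
print(missing)  # ['Gender']
```

Because the value group is `(.*)`, an empty field still produces a key with an empty string, so the emptiness check has to happen after parsing.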
pbacterio
0

I would suggest using csv.reader from the standard csv module because of its simplicity:

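import csv

# Required keys, lower-cased to match the normalised field names below.
fields_to_validate = ["name", "surname", "country", "gender"]

# newline='' is what the csv module documentation recommends for csv.reader input.
with open('data.csv', newline='') as csvfile:
    csv_reader = csv.reader(csvfile, delimiter=':')
    for row in csv_reader:
        # Skip blank lines and lines without a ':' separator.
        if len(row) < 2:
            continue
        field_key = row[0].strip().lower()
        field_value = row[1].strip()
        print("\n{} {}".format(field_key, field_value))
        # A field is valid when it is expected and has a non-empty value.
        if field_key in fields_to_validate and field_value:
            print("{} validated correctly!".format(field_key))
        else:
            print("{} NOT validated correctly.".format(field_key))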

Output

name Joe
name validated correctly!

surname Doe
surname validated correctly!

country DE
country validated correctly!

gender
gender NOT validated correctly.
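
The loop above checks each line that is present, but it would not notice a required field that is missing from the file altogether. Here is a minimal sketch of one way to cover that case, assuming the same colon-separated layout. The function name find_missing_fields and the use of io.StringIO in place of a real file are illustrative choices.

```python
import csv
import io

REQUIRED_FIELDS = {"name", "surname", "country", "gender"}


def find_missing_fields(text_stream, required=REQUIRED_FIELDS):
    """Return the required fields that are absent or empty in the stream."""
    supplied = set()
    for row in csv.reader(text_stream, delimiter=":"):
        # Skip blank lines and lines without a colon.
        if len(row) < 2:
            continue
        key, value = row[0].strip().lower(), row[1].strip()
        if value:
            supplied.add(key)
    return required - supplied


# The country line is absent and the gender value is empty.
sample = io.StringIO("Name:       Joe\nSurname:     Doe\nGender:    \n")
print(sorted(find_missing_fields(sample)))  # ['country', 'gender']
```

An empty result means every required field was supplied with a value.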
Pitto
  • Nice tip, Pitto. I was not even thinking about the csv module! – ZDZ Oct 19 '20 at 13:20
  • If it was useful for you, please consider upvoting it and/or choosing it as the final answer. Thanks for your time! – Pitto Mar 26 '21 at 09:20
0

First, parse the file with a regex and construct a dict from the matches. The regex we'll be using is:

^(\w+):\s+(\w+)$

This matches only lines that contain both a key and a value, so it will not match Gender: because its value is empty.
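
One detail worth checking is that \s also matches newlines. The sketch below compares the anchored pattern (with re.M) against an unanchored variant on input where the empty field is not the last line. The sample text is made up for illustration.

```python
import re

# Made-up sample: the empty field sits in the middle of the file.
content = 'Name:       Joe\nGender:   \nCountry:     DE\n'

# Anchored per line: the empty Gender line is simply skipped.
anchored = dict(re.findall(r'^(\w+):\s+(\w+)$', content, re.M))

# Unanchored: \s+ runs across the newline and takes the next key as the value.
unanchored = dict(re.findall(r'(\w+):\s+(\w+)', content))

print(anchored)    # {'Name': 'Joe', 'Country': 'DE'}
print(unanchored)  # {'Name': 'Joe', 'Gender': 'Country'}
```

With the anchors in place, an empty field cannot silently borrow text from the following line in a file where every line has the key: value shape.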


Now we just have to construct the corresponding dictionary:

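import re

# File contents
content = '''Name:       Joe
Surname:     Doe
Country:     DE
Gender:   
'''
# re.M makes ^ and $ match at every line, so each key/value pair must sit on one line.
data = {k: v for k, v in re.findall(r'^(\w+):\s+(\w+)$', content, re.M)}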

If you now inspect data, it should look like this:

>>> data
{'Name': 'Joe', 'Surname': 'Doe', 'Country': 'DE'}

Now all you have to do is verify that all the required fields exist in data.keys().

Initialize the required fields:

required_fields = {'Name', 'Surname', 'Country', 'Gender'}

If you want to allow extra keys in the input, check whether required_fields is a subset of data.keys(). If only the required keys may appear, compare with == instead.

>>> required_fields.issubset(data.keys())
False
>>> data.keys() == required_fields
False
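
A boolean only says that something is wrong. To report which fields are missing, a set difference can be used. This is a small sketch, and the data dict is copied from the example above.

```python
required_fields = {'Name', 'Surname', 'Country', 'Gender'}
data = {'Name': 'Joe', 'Surname': 'Doe', 'Country': 'DE'}

# Keys that are required but were not parsed from the file.
missing = required_fields - set(data)
# Keys that were parsed but are not in the required set.
unexpected = set(data) - required_fields

print(sorted(missing))     # ['Gender']
print(sorted(unexpected))  # []
```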

Let's try the same thing with valid data:

import re

# File contents
content = '''Name:       Joe
Surname:     Doe
Country:     DE
Gender:     Male'''
required_fields = {'Name', 'Surname', 'Country', 'Gender'}

# re.M makes ^ and $ match at every line, so each key/value pair must sit on one line.
data = {k: v for k, v in re.findall(r'^(\w+):\s+(\w+)$', content, re.M)}
print(data.keys() == required_fields)            # True
print(required_fields.issubset(data.keys()))     # True

Output:

True
True
Chase