python required fields from input file

Question

I have an input file that I use with a Python script.

Here is an example of the file:

Name:       Joe
Surname:     Doe
Country:     DE
Gender:

Can anybody suggest how to parse the file and make sure that all the required information is supplied? I am trying to avoid a long chain of if/else statements and implement this in a more efficient way.

Here is what I do now, but I am sure there is a better way:

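import re

file_validation = {}
for line in file_content:  # file_content: the lines read from the input file
    if re.match(r'Name:\s+(\w+)', line, re.IGNORECASE):
        file_validation['name'] = True
    elif re.match(r'Surname:\s+(\w+)', line, re.IGNORECASE):
        file_validation['surname'] = True
    # ... one more elif branch for every remaining field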

Any suggestions?

asked by ZDZ

What type of file is it? It doesn't really make sense to use regex in your example. You are better off parsing the file as yaml or json, then asserting that the required fields are included. — Lord Elrond, Oct 19 '20 at 13:00
Thanks for the tip, Monica. It's a text file, but for now I have to stick with it. — ZDZ, Oct 19 '20 at 13:03

score 0 · Answer 1 · answered Oct 19 '20 at 13:02


Something like this, which captures the key and the value of a line:

>>> import re
>>> re.match(r'^(.+)\s*:\s*(.*)$', 'Surname:     Doe').groups()
('Surname', 'Doe')
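A minimal sketch of applying that idea to every line, assuming the file content is already in a string (the sample is inlined here instead of being read from disk). It collects each key and value into a dict and then lists the required fields that are missing or empty. The names REQUIRED, fields and missing_fields are only illustrative, and the key group is made non-greedy ((.+?) instead of (.+)) so that a colon inside a value stays in the value.

```python
import re

# Sample input inlined for the example; normally this comes from the input file
content = """Name:       Joe
Surname:     Doe
Country:     DE
Gender:
"""

REQUIRED = {"Name", "Surname", "Country", "Gender"}  # hypothetical set of required keys

fields = {}
for line in content.splitlines():
    # non-greedy key, so "Time: 10:30" would give ('Time', '10:30')
    m = re.match(r'^(.+?)\s*:\s*(.*)$', line)
    if m:
        key, value = m.groups()
        fields[key] = value.strip()

# A required field counts as missing if it is absent or its value is empty
missing_fields = sorted(k for k in REQUIRED if not fields.get(k))
print(missing_fields)  # ['Gender']
```

No if/elif chain is needed: adding a field only means adding its name to the set.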

pbacterio

score 0 · Answer 2 · answered Oct 19 '20 at 13:06

I would suggest using csv.reader because it is simple:

import csv

fields_to_validate = ["name", "surname", "country", "gender"]

with open('data.csv') as csvfile:
    # Split every line on ':' into [key, value]
    csv_reader = csv.reader(csvfile, delimiter=':')
    for row in csv_reader:
        if not row:  # skip blank lines instead of failing on row[0]
            continue
        field_key = row[0].strip().lower()
        field_value = row[1].strip() if len(row) > 1 else ""
        print("\n{} {}".format(field_key, field_value))
        # A field is valid if it is expected and has a non-empty value
        if field_key in fields_to_validate and field_value:
            print("{} validated correctly!".format(field_key))
        else:
            print("{} NOT validated correctly.".format(field_key))

Output

name Joe
name validated correctly!

surname Doe
surname validated correctly!

country DE
country validated correctly!

gender
gender NOT validated correctly.
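The loop above only reports on lines that are present, so a field whose line is missing entirely goes unnoticed. A minimal sketch that keeps the csv.reader approach but first collects the values into a dict and then compares them with the required set; io.StringIO stands in for open('data.csv') so the snippet runs on its own, and the sample deliberately has no Country line. REQUIRED, values, missing and empty are hypothetical names.

```python
import csv
import io

REQUIRED = {"name", "surname", "country", "gender"}  # hypothetical required keys

# Stand-in for the real file; the Country line is missing on purpose
sample = "Name:       Joe\nSurname:     Doe\nGender:\n"

values = {}
for row in csv.reader(io.StringIO(sample), delimiter=':'):
    if not row:  # skip blank lines
        continue
    key = row[0].strip().lower()
    # rejoin the remaining parts in case the value itself contains ':'
    values[key] = ':'.join(row[1:]).strip()

missing = sorted(REQUIRED.difference(values))  # line not present at all
empty = sorted(k for k in REQUIRED.intersection(values) if not values[k])  # present but blank
print("missing:", missing)
print("empty:", empty)
```

Splitting the checks this way lets the script tell the user whether a field was forgotten or just left blank.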

edited Oct 19 '20 at 13:16

answered Oct 19 '20 at 13:06

Pitto

Nice tip, Pitto. I was not even thinking about the csv module! – ZDZ Oct 19 '20 at 13:20
If it was useful, please consider upvoting it and/or accepting it as the answer. Thanks for your time! – Pitto Mar 26 '21 at 09:20

score 0 · Answer 3 · answered Oct 19 '20 at 13:11

First, parse the file with a regex and build a dict from it. The regex we'll use is:

^(\w+):[ \t]+(\w+)$

This only matches lines that have both a key and a value, so it will not match Gender: because its value is empty.
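One subtlety worth checking: \s also matches a newline. If the pattern is used without the ^...$ anchors and an empty field is followed by another line, \s+ can run on into that line and pair the empty key with the next key's name. The sketch below compares \s+ with [ \t]+ (spaces and tabs only) on a made-up input where the empty Gender line is not the last one.

```python
import re

# Made-up input: the empty Gender line is followed by another field
content = "Gender:\nCountry:     DE\n"

# \s+ matches the newline after "Gender:", so the next key becomes its value
loose = dict(re.findall(r'(\w+):\s+(\w+)', content))
# [ \t]+ stays on the same line, and re.M makes ^ and $ work per line
strict = dict(re.findall(r'^(\w+):[ \t]+(\w+)$', content, re.M))

print(loose)   # {'Gender': 'Country'}  (wrong)
print(strict)  # {'Country': 'DE'}
```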


Now we just have to build the corresponding dictionary:

import re

# File contents
content = '''Name:       Joe
Surname:     Doe
Country:     DE
Gender:   
'''
# [ \t]+ rather than \s+, so an empty value cannot run on into the next line
data = dict(re.findall(r'^(\w+):[ \t]+(\w+)$', content, re.M))

If you now inspect data, it should look like this:

>>> data
{'Name': 'Joe', 'Surname': 'Doe', 'Country': 'DE'}

Now all you have to do is verify that all the required fields exist in data.keys().

Initialize the required fields:

required_fields = {'Name', 'Surname', 'Country', 'Gender'}

Check whether required_fields is a subset of data.keys() if you want to allow extra keys in the input, or use == if only the required keys may appear in data.keys().

>>> required_fields.issubset(data.keys())
False
>>> data.keys() == required_fields
False
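A False result says that something is wrong but not what. A small sketch that reports which fields are missing (and which are unexpected) with set differences; data is rebuilt inline with the value the regex step produces for the first sample, so the snippet runs on its own, and missing and extra are hypothetical names.

```python
# data as produced by the regex step for the first sample file
data = {'Name': 'Joe', 'Surname': 'Doe', 'Country': 'DE'}
required_fields = {'Name', 'Surname', 'Country', 'Gender'}

missing = sorted(required_fields.difference(data))     # required but not found
extra = sorted(set(data).difference(required_fields))  # found but not expected

if missing:
    print("Missing required fields:", ", ".join(missing))  # Missing required fields: Gender
print("Unexpected fields:", extra)                         # Unexpected fields: []
```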

Let's try the same thing with valid data:

import re

# File contents
content = '''Name:       Joe
Surname:     Doe
Country:     DE
Gender:     Male'''
required_fields = {'Name', 'Surname', 'Country', 'Gender'}

data = dict(re.findall(r'^(\w+):[ \t]+(\w+)$', content, re.M))
print(data.keys() == required_fields)           # True
print(required_fields.issubset(data.keys()))    # True

Output:

True
True
