I have a program (file1.py) containing several functions, and I want to test those functions from test1.py. When I import the first function, the terminal complains that I must supply the arguments that are required when running file1.py. I don't understand why this happens: as far as I know, test1.py imports only the first function, not all of file1.py.

file1.py (up to the first function and the calls that use it)

import os
import argparse
import pandas as pd
import numpy as np

# Enter the path/file names

parser = argparse.ArgumentParser()
parser.add_argument('--vcf1', type=str, required=True)
parser.add_argument('--vcf2', type=str, required=True)
args = parser.parse_args()
NAME_FILE_1 = args.vcf1
NAME_FILE_2 = args.vcf2


def load_sample (Name_file):
    '''
    Take the header of the body of the CSV file
    '''
    with open(Name_file, 'r') as f:
        for line in f:
            if line.startswith('#') and len(line)>2 and line[1] != '#':
                columns = line[1:-1].split('\t')
                data = pd.read_csv(Name_file, comment='#', delimiter='\t', names=columns)
                break
    return data

# The data of the VCF is here
dataA = load_sample (NAME_FILE_1)
dataB = load_sample (NAME_FILE_2)

And my test1.py

import os

import pandas as pd
import numpy as np

from VCF_matcher.app.run import load_sample


NAME_FILE_1 = "./test_sample.vcf"

# FIRST TEST

def test_load_sample():
    '''Verify all rows of the body of the vcf file is taken'''
    data_to_test = load_sample (NAME_FILE_1)
    assert len(data_to_test) == 10425

The output:

======================================================== ERRORS ========================================================
_________________________________________ ERROR collecting test_vcf_matcher.py _________________________________________
test_vcf_matcher.py:13: in <module>
    from VCF_matcher.app.run import load_sample
../app/run.py:26: in <module>
    args = parser.parse_args()
../../../opt/anaconda3/lib/python3.8/argparse.py:1768: in parse_args
    args, argv = self.parse_known_args(args, namespace)
../../../opt/anaconda3/lib/python3.8/argparse.py:1800: in parse_known_args
    namespace, args = self._parse_known_args(args, namespace)
../../../opt/anaconda3/lib/python3.8/argparse.py:2034: in _parse_known_args
    self.error(_('the following arguments are required: %s') %
../../../opt/anaconda3/lib/python3.8/argparse.py:2521: in error
    self.exit(2, _('%(prog)s: error: %(message)s\n') % args)
../../../opt/anaconda3/lib/python3.8/argparse.py:2508: in exit
    _sys.exit(status)
E   SystemExit: 2
--------------------------------------------------- Captured stderr ----------------------------------------------------
usage: pytest [-h] --vcf1 VCF1 --vcf2 VCF2
pytest: error: the following arguments are required: --vcf1, --vcf2
=============================================== short test summary info ================================================
ERROR test_vcf_matcher.py - SystemExit: 2
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! Interrupted: 1 error during collection !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!

  • 1
    You have defined 'required' arguments for your `file1.py` so if you want to use the file, you need to specify those. The function "lives" inside the file so you cannot use "only" a function. – po.pe Sep 14 '21 at 12:16
  • 2
    You should use the `if __name__ == '__main__':` construct, so the argument parsing is not executed when your intention is to import just a single function. See https://stackoverflow.com/questions/419163/what-does-if-name-main-do – Code Painters Sep 14 '21 at 12:17

1 Answer

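Importing a single function still executes the whole module from top to bottom, so the module-level `parser.parse_args()` call runs during the import and exits because pytest was not started with `--vcf1` and `--vcf2`. To avoid running the "main" part every time the file is imported from another Python file, structure file1.py as follows: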

import os
import argparse
import pandas as pd
import numpy as np


def load_sample(name_file):
    '''
    Load the body of a VCF file, taking the column names from its header line
    '''
    columns = None
    with open(name_file, 'r') as f:
        for line in f:
            # The header line starts with a single '#'; '##' lines are metadata
            if line.startswith('#') and len(line) > 2 and line[1] != '#':
                columns = line[1:-1].split('\t')
                break
    if columns is None:
        # Without this check a file with no header line would hit an unbound variable
        raise ValueError(f'no header line found in {name_file}')
    return pd.read_csv(name_file, comment='#', delimiter='\t', names=columns)


if __name__ == "__main__":
    # Runs only when file1.py is executed directly, not when it is imported
    parser = argparse.ArgumentParser()
    parser.add_argument('--vcf1', type=str, required=True)
    parser.add_argument('--vcf2', type=str, required=True)
    args = parser.parse_args()
    NAME_FILE_1 = args.vcf1
    NAME_FILE_2 = args.vcf2

    dataA = load_sample(NAME_FILE_1)
    dataB = load_sample(NAME_FILE_2)
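
A possible refinement, offered only as a sketch: move the command-line handling into functions that accept an optional argument list, so the parsing itself can be exercised from a test without touching `sys.argv`. The names `parse_args`, `main` and `argv` are illustrative and not part of the original file1.py, and the file names in the demo call are placeholders.

```python
import argparse


def parse_args(argv=None):
    """Parse the two VCF paths; argv=None makes argparse read sys.argv[1:]."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--vcf1", type=str, required=True)
    parser.add_argument("--vcf2", type=str, required=True)
    return parser.parse_args(argv)


def main(argv=None):
    """Hypothetical entry point: returns the two paths it was given."""
    args = parse_args(argv)
    # In file1.py this is where load_sample(args.vcf1) and
    # load_sample(args.vcf2) would be called.
    return args.vcf1, args.vcf2


# A test (or an interactive session) can pass the arguments explicitly:
print(main(["--vcf1", "a.vcf", "--vcf2", "b.vcf"]))
```

With this layout the guarded block at the bottom of file1.py would shrink to a single `main()` call, and nothing is parsed at import time.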

For a fuller explanation of the `if __name__ == "__main__":` guard, see the Stack Overflow question linked in the comments above.
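
The effect of the guard can also be checked directly. The sketch below writes a throwaway module (the name `demo_module.py` and the `GUARD_RAN` flag are made up for this illustration), loads it once the way `import` would and once the way `python demo_module.py` would, and reports whether the guarded block ran in each case.

```python
import importlib.util
import runpy
import tempfile
from pathlib import Path

# Source of a tiny hypothetical module with a guarded "main" part.
SOURCE = '''\
GUARD_RAN = False


def double(x):
    return 2 * x


if __name__ == "__main__":
    GUARD_RAN = True
'''


def guard_ran_when_imported_and_run():
    """Return (flag after import, flag after running as a script)."""
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / "demo_module.py"
        path.write_text(SOURCE)

        # Imported: __name__ is "demo_module", so the guard is skipped.
        spec = importlib.util.spec_from_file_location("demo_module", path)
        imported = importlib.util.module_from_spec(spec)
        spec.loader.exec_module(imported)

        # Run as a script: __name__ is "__main__", so the guard executes.
        script_globals = runpy.run_path(str(path), run_name="__main__")

    return imported.GUARD_RAN, script_globals["GUARD_RAN"]


imported_flag, script_flag = guard_ran_when_imported_and_run()
print(imported_flag, script_flag)  # False True
```

In both cases the function `double` is defined; only the guarded block differs, which is why the test file can import `load_sample` without triggering the argument parser.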

bugra