I am using Python to extract data from an HTML document with the help of the BeautifulSoup4 package, and I then need the extracted data in my C# program.

In Python I am using the following function to get all the text from all the tags in the HTML file:

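import bs4

extractedText = ""

def getText(detail):
    # Recursively concatenate the text of every node in `detail`.
    words = ""
    for node in detail:  # `node` avoids shadowing the built-in `object`
        if type(node) is bs4.element.Tag:
            # A tag: descend into its children.
            words += getText(node.contents)
        elif type(node) is bs4.element.NavigableString:
            # A plain text node: keep it as-is.
            words += str(node)
    return words

# `details` is built earlier in the script (not shown here).
for detail in details:
    extractedText += getText(detail)

print(extractedText)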

which uses recursion to strip away the tags. When it is run in Python directly, everything works perfectly. However, if I launch the same script from C# using this function:

        static string getTextPy()
        {
            var procInf = new ProcessStartInfo();
            procInf.FileName = @"C:\Users\User\AppData\Local\Programs\Python\Python310\python.exe";
            var script = @"C:\Users\User\Desktop\test2.py";
            var variable = @"C:\Users\User\Desktop\document.html";

            procInf.Arguments=$"\"{script}\" \"{variable}\"";

            procInf.UseShellExecute = false;
            procInf.CreateNoWindow = true;
            procInf.RedirectStandardOutput= true;
            procInf.RedirectStandardError = true;

            var errors = "";
            var results = "";
            using (var process = Process.Start(procInf))
            {
                errors = process.StandardError.ReadToEnd();
                results = process.StandardOutput.ReadToEnd();
            }

            if (errors!="")
            {
                return errors;
            }
            return results;
        }

        Console.WriteLine(getTextPy());

the whole program halts at the

    errors = process.StandardError.ReadToEnd();
    results = process.StandardOutput.ReadToEnd();

part.

I have tried removing the recursive part from the Python code, and then everything works perfectly. With the recursion in place, I waited for almost 5 minutes for it to return something, and it never did. I don't think the problem lies in the recursion itself, as this test code in Python:

def hundred(num):
    num+=1
    if num!=100:
        num=hundred(num)
    return num
print(hundred(0))

works perfectly well when launched through C#.
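
One difference between the two scripts that may matter, offered as an assumption rather than a confirmed diagnosis: the test prints only a few characters, while the real script prints the text of the whole document. When both stdout and stderr are redirected, a parent that reads one stream to the end before touching the other can block if the child fills the other pipe's buffer in the meantime. The sketch below shows the same parent/child setup in Python, with both pipes drained together by `communicate()`. `CHILD_CODE` and `run_child` are hypothetical names used only for illustration, and the one-million-character output is an arbitrary stand-in for a large document.

```python
import subprocess
import sys

# Hypothetical child script: writes far more to stdout than a typical
# OS pipe buffer holds, standing in for a script that prints a large document.
CHILD_CODE = "import sys; sys.stdout.write('x' * 1_000_000)"


def run_child(code):
    """Run `code` in a new Python process and return (stdout, stderr)."""
    proc = subprocess.Popen(
        [sys.executable, "-c", code],
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        text=True,
    )
    # communicate() reads stdout and stderr concurrently until the child
    # exits, so the child never stalls on a full pipe while the parent
    # waits on the other stream.
    out, err = proc.communicate()
    return out, err


if __name__ == "__main__":
    out, err = run_child(CHILD_CODE)
    print(len(out), repr(err))
```

If this is what is happening, the short test would pass because its output fits in the pipe buffer, which would match what I observed. I have not verified this against my actual script.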

Sorry if this is me missing something stupid; I'm new to Python and prefer C#. If someone could explain what may be causing the halt, I would be very grateful. I can't seem to find any sound explanation of the problem online.
