
What I'm trying to do:

  1. An Android application (ADMIN) that takes a job title from the user and fetches all the jobs related to it using Scrapy (Python); the results are saved to the database through an API.

  2. An Android application (CLIENT) that fetches all of that data from the database through the API.

Problem: I'm stuck on how to connect my Python script with Ktor.

Some Code for more clarity:

import spiders.Linkedin as linkedinSpider

linkedinSpider.main(numberOfPages=1, keywords="ios developer", location="US")

This piece of code works fine: it fetches the jobs, stores them in the database, and returns True if they are saved successfully, otherwise False. I just need to call this function and all the work is done for me.

Similarly,

This is how I receive the input from the user (the ADMIN app) through the API using Ktor. This also works fine.

            // Fetches New Jobs.
            get("/fetch") {
                val numberOfPages = call.request.queryParameters["pages"]?.toIntOrNull() ?: 1
                val keyword = call.request.queryParameters["keyword"]
                val location = call.request.queryParameters["location"]

                if (numberOfPages < 1) {
                    call.respond(
                        status = HttpStatusCode.BadRequest,
                        message = Message(message = "Incorrect Page Number.")
                    )
                } else {
                    call.respond(
                        status = HttpStatusCode.OK,
                        message = Message(message = "Process Completed.")
                    )
                }
            }

This is how I want the code to work logically:

            // Fetch New Jobs.
            get("/fetch") {
                val numberOfPages = call.request.queryParameters["pages"]?.toIntOrNull() ?: 1
                val keyword = call.request.queryParameters["keyword"]
                val location = call.request.queryParameters["location"]

                if (numberOfPages < 1) {
                    call.respond(
                        status = HttpStatusCode.BadRequest,
                        message = Message(message = "Incorrect Page Number.")
                    )
                } else {

                    // TODO: Call Python Function Here and Check the Return Type. Something Like This.
                    if(linkedinSpider.main(numberOfPages=numberOfPages, keywords=keyword, location=location)) {
                        call.respond(
                            status = HttpStatusCode.OK,
                            message = Message(message = "Process Completed.")
                        )
                    } else {
                        call.respond(
                            status = HttpStatusCode.InternalServerError,
                            message = Message(message = "Something Went Wrong.")
                        )
                    }
                }
            }

Also, these are two separate, individual projects. Do I need to merge them or something? When I tried, IntelliJ (which I'm using as the IDE for Ktor development) said no Python interpreter was found. I also tried to configure a Python interpreter, but that seems to be of no use, as it is not able to resolve the variables or the Python files.

Edit 1

This is what I tried:

fun main() {
    val processBuilder = ProcessBuilder(
        "python3", "main.py"
    )

    val exitCode = processBuilder.start().waitFor()

    if (exitCode == 0) {
        println("Process Completed.")
    } else {
        println("Something Went Wrong.")
    }
}

The exitCode I am getting is 2. But when I run the code below, it works with exitCode 0.

ProcessBuilder("python3", "--version")

Edit 2

After this, the code is working, but it does not terminate and I get no output.

This is the Kotlin file I made to simulate the problem:

package com.bhardwaj.routes

import java.io.File


fun main() {
    val processBuilder = ProcessBuilder(
        "python3", "main.py", "1", "android", "india"
    )
    processBuilder.directory(File("src/main/kotlin/com/bhardwaj/"))
    val process = processBuilder.start()
    val exitCode = process.waitFor()

    if (exitCode == 0) {
        val output = String(process.inputStream.readBytes())
        print("Process Completed -> $output")
    } else {
        val output = String(process.errorStream.readBytes())
        print("Something went wrong -> $output")
    }
}

If I run the same command in the terminal, it works:

python3 src/main/kotlin/com/bhardwaj/main.py 1 "android" "india"

When I run the same command in the terminal, I get the full output (screenshot omitted).

Edit 3

When I stopped the default Scrapy logging, the code worked. It seems the output pipe that ProcessBuilder gives the child process has a limited buffer, and because I only read the streams after waitFor(), the buffer filled up and the spider blocked. Here is what I did:

# Inside the spider class (requires `import logging`).
def __init__(self, number_of_pages=1, keywords="", location="", **kwargs):
    super().__init__(**kwargs)
    logging.getLogger('scrapy').setLevel(logging.WARNING)

This worked flawlessly in development mode, but there is something more I want to discuss. When I deployed the same code to Heroku and made a GET request, I got 503 Service Unavailable about 30-40 seconds later, yet after another 20-30 seconds the data showed up in the database. It is very confusing why this is happening.

For the flow of the program and clarity: when I make a GET request to Ktor, it asks Scrapy (a Python program) to scrape the data and store it in a JSON file. Once that is done, the Python program makes a request to another endpoint to store all the data in the database, and only after all of this does Ktor respond to the user with the appropriate status code using call.respond.

Is the issue due to the extra request the Python program makes to the endpoint, or is it related to Heroku only handling one process at a time? There is no such issue in development mode, i.e. with the localhost URL.
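
For reference, this is roughly how the /fetch route is wired together now. This is a sketch, not my exact project code: it assumes the same main.py path and argument order as in Edit 2, drains the process output so the pipe buffer cannot fill, and runs the spider on Dispatchers.IO so the request coroutine is not blocked.

// Sketch: /fetch launches the Python spider through ProcessBuilder.
// Assumed imports: java.io.File, kotlinx.coroutines.Dispatchers, kotlinx.coroutines.withContext,
// the usual Ktor routing/response imports, and my own Message class.
get("/fetch") {
    val numberOfPages = call.request.queryParameters["pages"]?.toIntOrNull() ?: 1
    val keyword = call.request.queryParameters["keyword"] ?: ""
    val location = call.request.queryParameters["location"] ?: ""

    if (numberOfPages < 1) {
        call.respond(
            status = HttpStatusCode.BadRequest,
            message = Message(message = "Incorrect Page Number.")
        )
        return@get
    }

    // Run the spider off the request coroutine so the server stays responsive.
    val exitCode = withContext(Dispatchers.IO) {
        val process = ProcessBuilder(
            "python3", "main.py", numberOfPages.toString(), keyword, location
        )
            .directory(File("src/main/kotlin/com/bhardwaj/"))
            .redirectErrorStream(true) // merge stderr into stdout
            .start()

        // Drain the output while the spider runs so the pipe buffer never fills up.
        process.inputStream.bufferedReader().forEachLine { println(it) }
        process.waitFor()
    }

    if (exitCode == 0) {
        call.respond(
            status = HttpStatusCode.OK,
            message = Message(message = "Process Completed.")
        )
    } else {
        call.respond(
            status = HttpStatusCode.InternalServerError,
            message = Message(message = "Something Went Wrong.")
        )
    }
}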

  • You can find information about how to call Python scripts from Java (this will work for Kotlin too) here: https://www.baeldung.com/java-working-with-python but I recommend finding a way to fetch the data using some Java or Kotlin library. – Aleksei Tirman Feb 20 '22 at 22:15
  • Hey @AlekseiTirman, I tried using ProcessBuilder but there seems to be some problem. I have updated the question. Where am I wrong? – Adi Feb 21 '22 at 09:30
  • Most likely the problem is that the file `main.py` cannot be found in the current working directory; see https://stackoverflow.com/a/6758687/13963150. Try to specify an absolute path to your script. – Aleksei Tirman Feb 21 '22 at 10:46
  • @AlekseiTirman Yes, after specifying the path `processBuilder.directory(File("src/main/kotlin/com/bhardwaj/"))` it is working, but there is another problem: when the code executes it runs forever and gets stuck. I get no output, which should be a .json file. It shows 4-5 elements and then gets stuck, but if I run it through the terminal it works. I will share the outputs in the question above. Please have a look. – Adi Feb 21 '22 at 12:39
  • What does the file `main.py` contain? Does it work if you try to only put `print` statements in your python script? – Aleksei Tirman Feb 21 '22 at 13:21
  • @AlekseiTirman Yes, it works fine if I only put `print` statements. The `main.py` calls a **spider** that scrapes data and stores it in a **JSON file**, and after successfully scraping all the pages it calls an **API** to store all the data in the database. – Adi Feb 21 '22 at 13:31
  • Probably it hangs because you don't consume the error and input streams: https://stackoverflow.com/a/36766930/13963150. – Aleksei Tirman Feb 21 '22 at 13:53
  • @AlekseiTirman Sorry, I'm not sure what that answer means, but if it is that I need to consume them, I have used both of them in the `exitCode` condition before printing them. But as far as I have experienced, it keeps all the output until the process completes, which is why I don't get any output. – Adi Feb 21 '22 at 14:30
  • Hey @AlekseiTirman, we have found the solution to the problem. There is something more I need help with; I have made **edits** to the question. Please check out **Edit 3** above. – Adi Feb 22 '22 at 10:28

1 Answer


When I stopped the default Scrapy logging, the code worked. It seems the output pipe that ProcessBuilder gives the child process has a limited buffer, and it filled up because the streams were never consumed while the process was running. Here is what I did:

def __init__(self, **kwargs):
    super().__init__(**kwargs)
    logging.getLogger('scrapy').setLevel(logging.WARNING)
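
An alternative that keeps the Scrapy logs is to consume the child process's output before calling waitFor(), so the pipe buffer can never fill up and block the spider. A minimal sketch, based on the test file from Edit 2 (same assumed script path and arguments):

import java.io.File

fun main() {
    val process = ProcessBuilder("python3", "main.py", "1", "android", "india")
        .directory(File("src/main/kotlin/com/bhardwaj/"))
        .redirectErrorStream(true) // merge stderr into stdout so one stream covers everything
        .start()

    // Consume the output while the process runs, instead of after waitFor().
    process.inputStream.bufferedReader().forEachLine { println(it) }

    val exitCode = process.waitFor()
    println(if (exitCode == 0) "Process Completed." else "Something Went Wrong.")
}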