
This is probably going to be a long question so apologies in advance.

Here's my situation, I'm hoping you guys can put me on the right track:

Scenario: I have 2 main files, let's call them app-prod.py and app-dev.py. Both files use Selenium to scrape websites: one targets the main site (work) and the other targets the dev site (local). app-prod.py needs some login code (username, password and auth code), and it has to tell Selenium to click on certain buttons that only exist on the production site.

app-dev.py doesn't need the authentication or the instructions to navigate because the sites are not the same. The dev site is just some html tables that mimic the prod site but without all the login stuff.

Example Of Production Code:

# app-prod.py

"""open browser to base url, login and get ready
logic to login and go to the right page goes here"""



base_url = config[env]["base_url"]

browser = Firefox()

browser.get(base_url)

print("Sleeping for 5 to let page load")
time.sleep(5)

# LOGIN SCRIPT

auth = input("Enter Auth Code: ")
username = "LETMEIN"
password = "P@55W@rD"

browser.find_element_by_xpath('//*[@id="username"]').send_keys(username)
browser.find_element_by_xpath('//*[@id="password_input"]').send_keys(password)
browser.find_element_by_xpath('//*[@id="secondary_password_input"]').send_keys(auth)
browser.find_element_by_xpath('//input[@name="Login"]').click()

print("Sleeping for 3 to let page load")
time.sleep(3)

# Click OK button on alert
browser.find_element_by_xpath('//input[@value="Continue"]').click()

class Table:
    def __init__(self, vls, name, intro):
**********

Example Of Dev Code:

# app-dev.py

"""open browser to base url, login and get ready
logic to login and go to the right page goes here"""


base_url = config[env]["base_url"]

browser = Firefox()

browser.get(base_url)


class Table:
    def __init__(self, vls, name, intro):
**********

My problem is keeping track of 2 different files. If I work on the dev code, I then have to add every change to the prod file while making sure not to mess up the login code portion. I'm currently using VS Code to compare the two files and then copy/paste the differences from dev to prod, which is working but kinda tedious.

I tried making the login section of code modular and importing it but it complains that "browser" is undefined (because the webdriver is being initialized in the app-*.py file).

I thought that when you imported a module, Python would take everything in "import_me.py" and drop it into "app-dev.py", so that any code in the module could use any of the imports in the main app file, but it doesn't seem to work that way. It's as if Python runs the module in its own little section as a standalone file, without it interacting with the main file.

So, my next thought was to see if there's a way to template app-dev.py so I could say:

here is some dev code blah blah blah. 
env='DEV'
if env == 'DEV':
    {{ include login code here that is imported from import_me.py}}
else:
    pass

But there doesn't seem to be such a thing. I can't keep hitting the production site every time I want to test something and there doesn't seem to be a clean way to handle two different files apart from comparing and manually making the changes.

I'm using configparser to change the URLs and file paths but I don't think it can handle blocks of code, at least I haven't seen it in the docs anywhere.

How do you guys get around problems like this? I haven't been programming very long so it's possible I missed a package that will be the fix for this but for right now, I'm out of ideas.

Any help you can give me is greatly appreciated and hopefully will make me a better programmer!

If you made it this far, thank you! Have a virtual cookie :)

TL;DR: Need a way to include a block of code from file 1, into a specific area of file 2 with file 1 sharing all of the imports of file 2 as if it was hard coded there at runtime.

**** EDIT FOR MORE INFO ****

I'll see about posting code tomorrow as I don't have it in front of me right now. Here's what I'm doing: set up Selenium to open a webpage, log in (in the prod code) and then get ready to do the scraping.

Grab an Excel file and get the ID number and name, then loop over each person listed. For each person, send the ID number and name to a class I set up that will scrape the tables from the site.

The class has 5 sections, 1 section for each table. Each table has a different URL which looks like {base_url}/{path_to_table}/{ID} (simplified from memory). It then tells the webdriver to go to that URL, scrape it and send that data to docx-tpl. It then opens a docx template file I have and generates a new file based on the data it was fed.

That is the basis of the loop/class. The problem is that the URLs are different between prod and dev, the login is different (nonexistent in dev) and I'm sure there are a few other differences that I'm forgetting.

I'll see if I can get some code posted online tomorrow that will help you guys see how many mistakes I'm making :D

Neil
  • So the only difference between the two scripts is the login part, and the rest of the scripts and code modules work the same in both environments? – sahasrara62 Jun 20 '20 at 22:12
  • No, that was just 1 example. That is a chunk of it but there's a few other blocks that would be different, too. I just figure that if I can do it for the login block, I could use the same idea for the rest of it. I can see this program growing in the future and I don't want to have to deal with thousands of lines of code using VS Code's compare lol – Neil Jun 20 '20 at 22:17
  • This all depends on your application design. Just a suggestion: keep the core application components in one file and the env-specific code in a separate file per environment. Drawing a schema or flow diagram of your application may also help a lot. – sahasrara62 Jun 20 '20 at 22:22
  • That's a point, maybe I should re-write so that I can do that. How would you handle situations where you can't though? For example, main app needs to use the webdriver and so does a function that you have as part of a different imported module? – Neil Jun 20 '20 at 22:29
  • I fail to see why you would need to ever compare code, if you write the code in a modular way. But it's hard to give specific advice without seeing your code. You mention your code is modular, but I think you probably want to 1) think through about what exactly is common to prod/dev (i.e. the actual scraping of the data) as opposed to what's not (downloading the data). And 2) use class inheritance to capture the common parts while allowing you to look at differences. I am happy to look at your code to see how I would improve it. – joelhoro Jun 20 '20 at 22:32
  • [threading](https://stackoverflow.com/questions/33741921/can-i-run-multiple-instances-at-oncesimultaneously-with-selenium-webdriver). Agreeing with @joelhoro, I would suggest you make a schema or diagram of your application, check what each section of code is doing, and partition the application accordingly into sections such as download, link checking, error handling, etc. before going further. It's also worth reading books on design patterns and web applications to be more productive. – sahasrara62 Jun 20 '20 at 22:43
  • I've been programming for roughly 6 weeks so when I say modular, I mean "as modular as I can make my current code". Maybe I should have added more details to the question. I will see if I can share the code tomorrow to better outline what I'm doing but because it's work related, I can't just post the whole thing unfortunately. Maybe I can modify it a bit though. No doubt there are improvements, I only just learned about classes and OOP so things are half and half at this point lol – Neil Jun 20 '20 at 22:45
  • Fair enough. I've seen people with years of experience coding (but not real developers, more like data scientists) who will fail at writing things in a modular way, so most likely the definition is very relative. Sometimes it's out of laziness, but often it's just because people are unaware of very simple design tricks (such as using dictionaries instead of if/elif statements, using global variables for app-wide settings instead of local variables, or, worse, hardcoded values, using lambdas as arguments to functions (i.e. higher order functions) and so forth). Feel free to share what you want. – joelhoro Jun 20 '20 at 22:50
  • And about the 'I just can't post the whole thing' - the easiest and very common way for people to do so is to have the code on github and give public access, or selective private access. This helps people make changes to your code, which may be faster than having them explain what you should do. – joelhoro Jun 20 '20 at 22:52
  • I was thinking of Git but what I meant was I can't post it as is because of the URLs and login info that is used. I guess I can post the dev and prod files but remove the URLs from the ini file. You'll see the differences between the two but the production one just won't run. I'll post that up tomorrow and leave a link on here. Thanks for your help so far though :) (I suppose it would be 'sensitive information'?) – Neil Jun 20 '20 at 23:01
  • This is a bit of a meta discussion, but I think that sensitive information should not be in git - instead it should just be saved in environment variables on the machines you're deploying to. In my case that would mean they're visible if I give access to my Heroku config (which I would usually not do), but not if I give access to my GitHub repo (which I often do). – joelhoro Jun 21 '20 at 19:18
  • There shouldn't be any sensitive info in git, I removed all of the usernames and passwords etc. The only problem is it won't actually run but it shouldn't really need to as it's an example for you guys to see so my question makes sense lol – Neil Jun 21 '20 at 21:51

1 Answer

I think that you're misunderstanding the way that Python modules work. The way I would approach this would be to implement the script with something like the following structure:

base-dir/
  setup.py <-- with entrypoints and dependencies defined
  README.md
  other_files.txt
  main_module/
    __init__.py <-- this marks it as a package
    scrape.py <-- shared code
    dev_specific.py
    prod_specific.py

This way, if you set up a virtual environment and use an entrypoint script, you'll be in the context that you specify and can import and refer to the code as normal. You can approach how to separate the env-specific code in multiple ways; the simplest is to use a conditional gate, something like this:

if args.env == "production":
    prod_specific.perform_login()
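
One way the `args.env` value in that gate could be produced is with the standard library's argparse. This is only a sketch: the `--env` flag, its two choices, and the helper names `parse_args` and `needs_login` are assumptions made for illustration, not something taken from the question's code.

```python
import argparse


def parse_args(argv=None):
    """Parse the command line; --env defaults to 'dev' so prod is opt-in."""
    parser = argparse.ArgumentParser(description="Scrape tables from the site")
    parser.add_argument(
        "--env",
        choices=["dev", "production"],  # hypothetical environment names
        default="dev",
        help="which site to run against",
    )
    return parser.parse_args(argv)


def needs_login(args):
    """Only the production site sits behind a login page."""
    return args.env == "production"
```

With something like this in the shared entrypoint, running the script with `--env production` would take the login branch, and running it with no flag would skip it, so a single file can serve both sites.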

The code defined in a different module will not have access to code or variables in the calling module, though; you'll have to pass any relevant information as parameters to the function. Don't think about it as included code being "pulled into" another module; think about it as a reference to that code being made available to the code in that module.
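
As a rough sketch of what passing the information as parameters could look like, the login code below receives the browser from its caller instead of expecting a global named `browser`. The XPath strings come from the question; the function signatures, the `credentials` dictionary, and the `open_site` helper are assumptions for illustration. Recent Selenium releases no longer ship the `find_element_by_xpath` helpers the question uses, so the sketch calls `find_element("xpath", ...)` instead.

```python
# Hypothetical contents of prod_specific.py plus a shared helper.

def perform_login(browser, username, password, auth_code):
    """Fill in and submit the login form using the caller's browser."""
    browser.find_element("xpath", '//*[@id="username"]').send_keys(username)
    browser.find_element("xpath", '//*[@id="password_input"]').send_keys(password)
    browser.find_element(
        "xpath", '//*[@id="secondary_password_input"]'
    ).send_keys(auth_code)
    browser.find_element("xpath", '//input[@name="Login"]').click()


def open_site(browser, base_url, env, credentials=None):
    """Open base_url and log in only when running against production."""
    browser.get(base_url)
    if env == "production":
        # credentials is assumed to hold username, password and auth_code
        perform_login(browser, **credentials)
    return browser
```

The main script would create the webdriver once (for example `browser = Firefox()`) and hand it to `open_site`. Because nothing here relies on module-level state, the same functions can also be exercised with a stand-in browser object, without touching the production site.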

asthasr
  • Thanks for your answer. I read that link you added but to be honest, I got lost in the docs. Is the setup.py generated from somewhere or do you code it yourself? I updated the question with a github repo so maybe that will give you an idea of where I'm at :) – Neil Jun 21 '20 at 20:00
  • Typically you write setup.py yourself. Here's a well-documented [example](https://github.com/pypa/sampleproject/blob/master/setup.py). – asthasr Jun 21 '20 at 21:08
  • Thanks for the help, asthasr. Now I know what to start learning, I can go find tutorials. – Neil Jun 23 '20 at 05:59