I'm a newbie coder and I need some help with xxx.find usage...

Summary

I am doing a web automation project with Selenium. The main goals of this Python project are:

Step 1. Log in to my school's LMS website automatically with a provided username and password

Step 2. Get the page_source of a given URL where my school hosts all the assignment PDF files

Step 3. Search the page_source string for matching PDF viewer links

Step 4. Navigate to each matching PDF viewer URL and get the page_source again

Step 5. Search for var DEFAULT_URL = '/icom/files/b729266c557f5f7108894ade1668d55a.pdf' or similar links, which is where my school's server hosts the PDF file

Step 6. Get the value /icom/files/xxxxxx.pdf and combine it with a prefix like www.icom.org.cn/ so it becomes www.icom.org.cn/icom/files/xxxx.pdf

Step 7. Then run a wget command or something similar to download the PDF file.

The problem now

I got the login automation working, but now I am stuck at the page_source matching in Step 3.

After getting onto the page where my school hosts all the PDF assignments,

I ran sauce = driver.page_source and I am trying to match it with

word = ("/icom/faculty/viewer/?id")
print(sauce.find(word))

But the output of find is 5122.

I would like str.find to match and list all the matching results, and to output the values as lines of text rather than numbers...

So... how can I do that? I searched the internet for this, but what I found wasn't very informative to me. Sorry, I'm just a beginner in programming and I don't speak English too well. Thank you in advance.

By the way, another problem came to mind.

If I am able to match the value in Step 3, how can I code it to include the extra value inside the PDF viewer link?

Because the full link will be like this:

/icom/faculty/viewer/?id=1260&type=2 For English assignment

/icom/faculty/viewer/?id=1254&type=2 For Mathematics assignment

/icom/faculty/viewer/?id=775&type=2 For assignment covers etc

The extra value id=xxxx will always be different. So... how can I code it to match and list all matching results together with the extra value?

Any help would be appreciated! Thank you very much!

Exact example of the page_source

  </tr>
          <tr>
                <td valign="top" width="100"><a href="javascript:void(0)" onclick="return popup2('/icom/student/main/report?varCrs_ID=FP102-001&varSemester_code=1/20&prog_id=FIM')">FP102-001</a></td>
        <td valign="top" width="150">MUSIC PERFORMANCE LAB </td>
        <td valign="top" width="100" align="center">IBS<br>(<a href="mailto:abcder@icom.edu.my">fafafan@icom.edu.my</a>)        </td>
        <td valign="top" width="100" align="center">3 </td>
                <td valign="top" width="30" align="center">THU<br>      </td>
        <td valign="top" width="100" align="center">1:00PM<br>      </td>
        <td valign="top" width="100" align="center">3:30PM<br>      </td>
        <td valign="top" width="100" align="center">E2<br></td>
                <td valign="top" align="left">I <a href="#" onclick="return popup('/icom/faculty/viewer/?id=1126&type=1')">FP102-FIM CS.pdf</a><br /></td>
                  <td valign="top" align="left">I <a href="#" onclick="return popup('/icom/faculty/viewer/?id=470&type=2')">FP102-Assignment 1 Brief.pdf</a><br />II <a href="#" onclick="return popup('/icom/faculty/viewer/?id=471&type=2')">FP102-Assignment 2 Brief.pdf</a><br />III <a href="#" onclick="return popup('/icom/faculty/viewer/?id=788&type=2')">FP102-Peer Assessment Sheet.pdf</a><br /></td>  
      </tr>
          <tr>
                <td valign="top" width="100"><a href="javascript:void(0)" onclick="return popup2('/icom/student/main/report?varCrs_ID=GE030-001&varSemester_code=1/20&prog_id=FIM')">GE030-001</a></td>
        <td valign="top" width="150">ELECTRONICS & COMPUTER SYSTEMS </td>
        <td valign="top" width="100" align="center">ZHZ<br>(<a href="mailto:fafafer@gmail.com">abcdef@gmail.com</a>)        </td>
        <td valign="top" width="100" align="center">3 </td>
                <td valign="top" width="30" align="center">MON<br>WED<br>       </td>
        <td valign="top" width="100" align="center">7:30PM<br>      </td>
        <td valign="top" width="100" align="center">8:45PM<br>      </td>
        <td valign="top" width="100" align="center">C2<br></td>
                <td valign="top" align="left">I <a href="#" onclick="return popup('/icom/faculty/viewer/?id=1203&type=1')">GE030-FIM CS.pdf</a><br /></td>
                  <td valign="top" align="left">I <a href="#" onclick="return popup('/icom/faculty/viewer/?id=474&type=2')">GE030-Assignment Brief.pdf</a><br />II <a href="#" onclick="return popup('/icom/faculty/viewer/?id=684&type=2')">GE030-Oral Presentation Brief.pdf</a><br /></td>  
      </tr>
          <tr>
                <td valign="top" width="100"><a href="javascript:void(0)" onclick="return popup2('/icom/student/main/report?varCrs_ID=HM010-001&varSemester_code=1/20&prog_id=FIM')">HM010-001</a></td>
        <td valign="top" width="150">SURVEY OF POP MUSIC </td>
        <td valign="top" width="100" align="center">IBS<br>(<a href="mailto:abcdef@icom.edu.my">abcder@icom.edu.my</a>)     </td>
        <td valign="top" width="100" align="center">2 </td>
                <td valign="top" width="30" align="center">FRI<br>      </td>
        <td valign="top" width="100" align="center">11:00AM<br>     </td>
        <td valign="top" width="100" align="center">12:40PM<br>     </td>
        <td valign="top" width="100" align="center">RH<br></td>
                <td valign="top" align="left">I <a href="#" onclick="return popup('/icom/faculty/viewer/?id=1220&type=1')">HM010-FIM CS.pdf</a><br /></td>
                  <td valign="top" align="left"></td>  
      </tr>
          <tr>
                <td valign="top" width="100"><a href="javascript:void(0)" onclick="return popup2('/icom/student/main/report?varCrs_ID=MT010-001&varSemester_code=1/20&prog_id=FIM')">MT010-001</a></td>
        <td valign="top" width="150">MUSIC TECHNOLOGY & MIDI SYSTEMS </td>
        <td valign="top" width="100" align="center">SKS<br>(<a href="mailto:faer@icom.edu.my">sdferfs@icom.edu.my</a>)      </td>
        <td valign="top" width="100" align="center">2 </td>
                <td valign="top" width="30" align="center">TUE<br>THU<br>       </td>
        <td valign="top" width="100" align="center">12:00PM<br>     </td>
        <td valign="top" width="100" align="center">12:50PM<br>     </td>
        <td valign="top" width="100" align="center">C3<br></td>
                <td valign="top" align="left">I <a href="#" onclick="return popup('/icom/faculty/viewer/?id=1131&type=1')">MT010-FIM CS.pdf</a><br /></td>
                  <td valign="top" align="left">I <a href="#" onclick="return popup('/icom/faculty/viewer/?id=776&type=2')">MT010-Project Brief.pdf</a><br />II <a href="#" onclick="return popup('/icom/faculty/viewer/?id=775&type=2')">MT010-Assignment Brief.pdf</a><br /></td>  
      </tr>
          <tr>
                <td valign="top" width="100"><a href="javascript:void(0)" onclick="return popup2('/icom/student/main/report?varCrs_ID=PF011-001&varSemester_code=1/20&prog_id=FIM')">PF011-001</a></td>
        <td valign="top" width="150">PERFORMANCE SEMINAR </td>
        <td valign="top" width="100" align="center">IBS<br>(<a href="mailto:fafafa@icom.edu.my">fafafaf@icom.edu.my</a>)        </td>
        <td valign="top" width="100" align="center">0 </td>
                <td valign="top" width="30" align="center"><br>     </td>
        <td valign="top" width="100" align="center">BY APPT<br>     </td>
        <td valign="top" width="100" align="center">BY APPT<br>     </td>
        <td valign="top" width="100" align="center">BY APPT<br></td>
                <td valign="top" align="left">I <a href="#" onclick="return popup('/icom/faculty/viewer/?id=1208&type=1')">PF011-PF031-FIM CS.pdf</a><br /></td>
                  <td valign="top" align="left"></td>  
      </tr>
          <tr>
                <td valign="top" width="100"><a href="javascript:void(0)" onclick="return popup2('/icom/student/main/report?varCrs_ID=PI041-001&varSemester_code=1/20&prog_id=FIM')">PI041-001</a></td>
        <td valign="top" width="150">VOICE </td>
        <td valign="top" width="100" align="center">IBS<br>(<a href="mailto:fafafa@icom.edu.my">fafan@icom.edu.my</a>)      </td>
        <td valign="top" width="100" align="center">1 </td>
                <td valign="top" width="30" align="center"><br>     </td>
        <td valign="top" width="100" align="center">BY APPT<br>     </td>
        <td valign="top" width="100" align="center">BY APPT<br>     </td>
        <td valign="top" width="100" align="center">BY APPT<br></td>
                <td valign="top" align="left">I <a href="#" onclick="return popup('/icom/faculty/viewer/?id=1143&type=1')">PI041-FIM CS.pdf</a><br /></td>
                  <td valign="top" align="left">I <a href="#" onclick="return popup('/icom/faculty/viewer/?id=795&type=2')">PI0XX mark sheet.pdf</a><br /></td>  
      </tr>
          <tr>
                <td valign="top" width="100"><a href="javascript:void(0)" onclick="return popup2('/icom/student/main/report?varCrs_ID=PL001-001&varSemester_code=1/20&prog_id=FIM')">PL001-001</a></td>
        <td valign="top" width="150">English Placement </td>
        <td valign="top" width="100" align="center">VNS<br>(<a href="mailto:xxx@gmail.com">abcder@gmail.com</a>)        </td>
        <td valign="top" width="100" align="center">0 </td>
                <td valign="top" width="30" align="center"><br>     </td>
        <td valign="top" width="100" align="center">-<br>       </td>
        <td valign="top" width="100" align="center">-<br>       </td>
        <td valign="top" width="100" align="center">-<br></td>
                <td valign="top" align="left"></td>

2 Answers


If I understood your question correctly, you need some logic for string matching.

This is one possible approach.

You check whether one string is contained in another and proceed with your logic; otherwise you can handle the non-matching case in the else branch.

link = "/icom/faculty/viewer/?id=1260&type=2"
word = "/icom/faculty/viewer/?id"

if word in link:
   print('word is in link')
else:
   print('word is not in link')

If you are referring to string concatenation, I recommend this thread:

Which is the preferred way to concatenate a string in Python?

Additionally, if you want to grab several matching strings, you can write a simple for-loop.
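As a rough sketch of that for-loop idea: str.find returns the index of the first match (which is why the question saw 5122), so one way to list every link is to keep calling find from where the previous match ended and slice the text out. The helper name find_all_links is made up for this example, and it assumes each link ends at a single quote, as in the popup('...') calls of the sample page_source.

```python
def find_all_links(page_source, prefix):
    """Return every substring that starts with `prefix` and runs up to the next single quote."""
    links = []
    start = page_source.find(prefix)
    while start != -1:
        # Assumption: links sit inside popup('...'), so the next single quote ends the link.
        end = page_source.find("'", start)
        if end == -1:
            end = len(page_source)
        links.append(page_source[start:end])
        # Continue searching after the link that was just collected.
        start = page_source.find(prefix, end)
    return links


# Shortened stand-in for driver.page_source, taken from the sample in the question.
sauce = """<td>I <a href="#" onclick="return popup('/icom/faculty/viewer/?id=470&type=2')">FP102-Assignment 1 Brief.pdf</a><br />II <a href="#" onclick="return popup('/icom/faculty/viewer/?id=471&type=2')">FP102-Assignment 2 Brief.pdf</a><br /></td>"""

for link in find_all_links(sauce, "/icom/faculty/viewer/?id"):
    print(link)
```

With the two-link sample above this prints one viewer link per line instead of an index.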

JSRB

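You could try regex pattern matching in Python:

import re
# \? matches a literal question mark; [^']+ stops at the closing single quote
path = re.compile(r"/icom/faculty/viewer/\?id=[^']+")
lst = path.findall(sauce)
print(lst)

lst will be a list of every string that starts with "/icom/faculty/viewer/?id" and runs up to the closing single quote of the popup('...') call. Stopping at the quote matters because a trailing .* would keep matching until the end of the line, which pulls in the rest of the HTML when one line holds several links. So... you should get a list of the matching links, each with its unique id=xxx value.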

You can iterate over this list and split it further into smaller lists for each kind of assignment you want.
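A minimal sketch of that splitting step, under a few assumptions: the function name group_links_by_type is invented here, the http:// scheme in BASE_URL is a guess (the question only gives the host www.icom.org.cn), and the optional amp; part is there in case the browser hands back the ampersand escaped as &amp; in page_source. It captures the id and type values with regex groups, builds an absolute URL with urljoin, and collects the URLs per type value.

```python
import re
from collections import defaultdict
from urllib.parse import urljoin

# Assumption: the scheme is a guess; only the host appears in the question.
BASE_URL = "http://www.icom.org.cn/"

# Capture the id and type numbers; tolerate "&" being escaped as "&amp;".
VIEWER_LINK = re.compile(r"/icom/faculty/viewer/\?id=(\d+)&(?:amp;)?type=(\d+)")


def group_links_by_type(page_source):
    """Map each type value (as a string) to the list of absolute viewer URLs found."""
    grouped = defaultdict(list)
    for match in VIEWER_LINK.finditer(page_source):
        link_id, link_type = match.groups()
        path = f"/icom/faculty/viewer/?id={link_id}&type={link_type}"
        grouped[link_type].append(urljoin(BASE_URL, path))
    return dict(grouped)


# Shortened stand-in for driver.page_source, taken from the sample in the question.
sample = """<td>I <a href="#" onclick="return popup('/icom/faculty/viewer/?id=1126&type=1')">FP102-FIM CS.pdf</a></td>
<td>I <a href="#" onclick="return popup('/icom/faculty/viewer/?id=470&type=2')">FP102-Assignment 1 Brief.pdf</a><br />II <a href="#" onclick="return popup('/icom/faculty/viewer/?id=471&type=2')">FP102-Assignment 2 Brief.pdf</a></td>"""

for link_type, urls in group_links_by_type(sample).items():
    print(f"type={link_type}")
    for url in urls:
        print("  " + url)
```

Each resulting URL could then be passed to driver.get(...) for Step 4 of the question.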

Cristian Gabor