I'm trying to make a tool that saves only URLs ending with something like this: page_id?id=1'

http://mechanikrolniczy.cba.pl/viewtopic.php?p=16176'
http://all-tubes-jenna-haze.mzs-dgd.ru/index.php?route=product&product_id=9108429'
https://websetnet.net/page/40/?q=%2Fbs%2Fpage%2F40%2F&loginid=117cee5a78'

In the first URL you can see .php?p=16176', and in the second URL product_id=9108429'.

The parameter can be different in every URL.

So I want to use a regex to find only URLs that contain ?something=numberORwords', and I have to make sure the match ends with '.

I have been trying to do this for the past 2 hours without success. I even came up with something like this:

^&/]\Wpage_id\W[=0-9]+|\W?item_id\W[=0-9]+|\W?p\W[=0-9]+\Wview\W[=0-9]+\Wno\W[=0-9]+|\Wimage_id\W[=0-9]+|\Wv\W[=0-9]+|\Wsequence\W[=0-9]+|\Wid\W[=0-9]+|\Wstart\W[=0-9]+[']

My code:

    string pattern = @"([?][\w]+[=][\w]+)[']";
    foreach (string s in urls)
    {
        Match m = Regex.Match(s, pattern);
        if (m.Success)
        {
            Valid.Add(s);
            Console.WriteLine(s);
        }
    }

Edit: what I'm trying to do is check whether a website's URL has a ' after the parameter, because then it may be vulnerable to SQL injection.

Yuri

1 Answer

This pattern covers all the cases that exist in your examples above.

([?][\w]+[=][\w]+)?([&][\w]+[=][\w]+)*[']

We are looking for the following conditions:

  • ?someWord=numbersOrletters (first capture group)
  • &someWord=numbersOrletters (second capture group)
  • ending with a ' character (final clause)
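
As a rough illustration of how those three pieces behave in .NET, here is a minimal sketch that runs the pattern over two sample URLs and prints the matched text. The class name QueryPatternDemo, the helper FindMatch, and the example.com URL are made up for this sketch; only the pattern itself comes from the answer.

```csharp
using System;
using System.Text.RegularExpressions;

public static class QueryPatternDemo
{
    // The pattern from the answer, unchanged.
    public const string Pattern = @"([?][\w]+[=][\w]+)?([&][\w]+[=][\w]+)*[']";

    // Returns the matched substring, or an empty string when nothing matches.
    public static string FindMatch(string url)
    {
        Match m = Regex.Match(url, Pattern);
        return m.Success ? m.Value : "";
    }

    public static void Main()
    {
        string[] urls =
        {
            "http://mechanikrolniczy.cba.pl/viewtopic.php?p=16176'",
            // Hypothetical URL with two parameters, for illustration only.
            "http://example.com/index.php?route=product&product_id=9108429'"
        };

        foreach (string url in urls)
        {
            // Prints ?p=16176' and then ?route=product&product_id=9108429'
            Console.WriteLine(FindMatch(url));
        }
    }
}
```

One thing worth noticing: because both groups are optional, a lone ' with no parameter in front of it also counts as a match, so you may want to check that at least one group actually captured something.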

If your strings are trimmed and cleaned, you could add a $ to the end of the pattern to guarantee that the ' comes at the very end of the string.
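
To make the anchoring idea concrete, here is a small sketch, assuming the URLs may carry stray whitespace that needs trimming first. AnchoredPatternDemo and EndsWithQuote are hypothetical names, and the example.com URLs are placeholders.

```csharp
using System;
using System.Text.RegularExpressions;

public static class AnchoredPatternDemo
{
    // The answer's pattern with $ appended, so the ' must come at the end of the string.
    private static readonly Regex Anchored =
        new Regex(@"([?][\w]+[=][\w]+)?([&][\w]+[=][\w]+)*[']$");

    // Trim first: whitespace after the ' would otherwise defeat the $ anchor.
    public static bool EndsWithQuote(string url)
    {
        return Anchored.IsMatch(url.Trim());
    }

    public static void Main()
    {
        Console.WriteLine(EndsWithQuote("http://example.com/a.php?id=1'"));      // True
        Console.WriteLine(EndsWithQuote("http://example.com/a.php?id=1'  "));    // True (trimmed)
        Console.WriteLine(EndsWithQuote("http://example.com/a.php?id=1'&x=2"));  // False
    }
}
```

Without the $, the last URL would still match, since the ' appears in the middle of the query string.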

I use regexr.com to build and test these things; it's a really helpful tool.

edit: This pattern also captures the substring route=product in the second URL you posted. If you'd like to avoid this, you can change the pattern to look for digits (\d) rather than word characters (\w) in the value position. \w matches letters, digits, and underscores, whereas \d matches digits only.
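
A sketch of that digit-only variant might look like the following. The exact pattern shown is my guess at what the change would look like (only the value part switches to \d), and DigitPatternDemo, FindMatch, and the example.com URL are hypothetical.

```csharp
using System;
using System.Text.RegularExpressions;

public static class DigitPatternDemo
{
    // Same shape as the original pattern, but parameter values must be digits only.
    public const string DigitPattern = @"([?][\w]+[=][\d]+)?([&][\w]+[=][\d]+)*[']";

    // Returns the matched substring, or an empty string when nothing matches.
    public static string FindMatch(string url)
    {
        Match m = Regex.Match(url, DigitPattern);
        return m.Success ? m.Value : "";
    }

    public static void Main()
    {
        // route=product is skipped because "product" is not numeric.
        // Prints &product_id=9108429'
        Console.WriteLine(FindMatch("http://example.com/index.php?route=product&product_id=9108429'"));

        // Prints ?p=16176'
        Console.WriteLine(FindMatch("http://mechanikrolniczy.cba.pl/viewtopic.php?p=16176'"));
    }
}
```

The trade-off is that alphanumeric values, such as loginid=117cee5a78 in the third URL, are no longer captured by either group.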

jdavison
  • This is a solid answer and should be selected as the correct answer. – AlexanderGriffin Sep 13 '18 at 19:53
  • Oh, I used regex101, and for some reason I can't get any URLs now. I will edit the question and show my code. – Yuri Sep 13 '18 at 19:53
  • It seems you didn't include the whole expression in your code. Also, the last URL you added doesn't end with a `'` character. If you'd like to make that condition optional, add a ? to the end of the pattern. – jdavison Sep 13 '18 at 19:59
  • I messed up the whole question. What I tried to do is check whether the website has a ' after the parameter, since then it may be vulnerable to SQL injection, but I think I need to send a web request or something rather than check URLs from text files. – Yuri Sep 13 '18 at 20:01
  • This is not the right website to be posting on for advice on how to target servers for SQL injection. – jdavison Sep 13 '18 at 20:07