My pandas DataFrame looks like this:
id rule impact tags description examples
0 1 \(\)\s*\{.*?;\s*\}\s*; 9 [rce, bash] Shellshock (CVE-2014-6271) [env x='() { :;}; echo vulnerable' bash -c "ec...
1 2 \(\)\s*\{.*?\(.*?\).*?=>.*?\\' 9 [rce, bash] Shellshock (CVE-2014-7169) [env X='() { (a)=>\' bash -c "echo date"; cat ...
2 3 \{\{.*?\}\} 4 [rce, id] Flask curly syntax [{{foo.bar}}]
3 4 \bfind_in_set\b.*?\(.+?,.+?\) 6 [sqli, mysql] Common MySQL function "find_in_set" [SELECT FIND_IN_SET('b','a,b,c,d')]
4 5 ["'].*?> 3 [xss] HTML breaking [">]
I want to extract the regular expressions (the rule column) for each unique tag. The tags are:
attack_tags = {'sqlite', 'css', 'spam', 'mongo', 'sqli', 'dos', 'mssql', 'xss', 'mysql', 'php', 'tsql', 'pgsql', 'lfi', 'win', 'id', 'rfi', 'xxe', 'unix', 'bash', 'rce', 'perl', 'ldap'}
I tried the following code but it did not work:
for category in attack_tags:
    rules = list(df.query('{} in df[\'tags\']'.format(category)))  # select rule from dataframe where current_category (category) is in tags
    print(rules)  # This should be a list that contains all the rules where the attack category is in df['tags'] column.
I am getting a KeyError on the current category, for instance KeyError: 'mongo' or KeyError: 'php'.
Any recommendations?
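
For reference, here is a minimal sketch of the result I am after. It assumes the tags column holds Python lists of strings and that pandas 0.25 or newer is available (for DataFrame.explode). The frame below is rebuilt by hand from the five rows shown above, with the description and examples columns left out, and rules_by_tag is just a name chosen for illustration. The error is probably raised because the formatted query string contains the bare word (e.g. mongo in df['tags']), which query tries to resolve as a column or variable name rather than as a string value.

```python
import pandas as pd

# Sample frame rebuilt from the rows shown in the question
# (description/examples columns omitted; assumes tags are lists of strings).
df = pd.DataFrame({
    "id": [1, 2, 3, 4, 5],
    "rule": [
        r"\(\)\s*\{.*?;\s*\}\s*;",
        r"\(\)\s*\{.*?\(.*?\).*?=>.*?\\'",
        r"\{\{.*?\}\}",
        r"\bfind_in_set\b.*?\(.+?,.+?\)",
        r"""["'].*?>""",
    ],
    "impact": [9, 9, 4, 6, 3],
    "tags": [["rce", "bash"], ["rce", "bash"], ["rce", "id"],
             ["sqli", "mysql"], ["xss"]],
})

attack_tags = {'sqlite', 'css', 'spam', 'mongo', 'sqli', 'dos', 'mssql',
               'xss', 'mysql', 'php', 'tsql', 'pgsql', 'lfi', 'win', 'id',
               'rfi', 'xxe', 'unix', 'bash', 'rce', 'perl', 'ldap'}

# explode() yields one row per (rule, tag) pair; grouping by tag then
# collects the rules that carry that tag.
rules_by_tag = (
    df.explode("tags")
      .groupby("tags")["rule"]
      .apply(list)
      .to_dict()
)

for category in sorted(attack_tags):
    # Tags with no matching rule (e.g. 'mongo' in this sample) give an empty list.
    rules = rules_by_tag.get(category, [])
    print(category, rules)
```

An alternative that stays closer to my loop would be a boolean mask per tag, e.g. df[df['tags'].apply(lambda tags: category in tags)]['rule'], but that rescans the whole column for every tag.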