1

I am new to CodeQL and have started learning about dataflow queries for C/C++ programs. Following is a excerpt of a C program that I want to analyse:

int main(int argc, char * argv[])
{
    unsigned short size, x, y;
    int r1, r2;
    x = atoi(argv[1]);// one dim of the data
    y = atoi(argv[2]);//other dim of the data
    size = x*y; //total size of the data
    r1 = MyVuln(size*sizeof(char));

    r2 = MyVuln(x*sizeof(char));
...
// some code
...

return 0
}

In the above example, I want to capture if MyVuln function is called with size as argument. The size is defined as a result of AssignExpr such that its Rvalue is a result of multiplication. Following is the COdeQL queries that I wrote:

/*
@kind path-problem
*/

import cpp
import semmle.code.cpp.dataflow.new.DataFlow
//import DataFlow::PathGraph

from Function myvuln, FunctionCall fc, AssignExpr ab 
where
myvuln.hasGlobalName("MyVuln")
and fc.getTarget() = myvuln
and ab.getLValue().getType().getUnspecifiedType() instanceof IntegralType
and ab.getRValue() instanceof MulExpr
and exists (DataFlow::Node src, DataFlow::Node sink| 
src.asExpr() = ab.getLValue()
and sink.asExpr() = fc.getArgument(0)
and DataFlow::localFlow(src, sink)
)
select fc, "MyVuln with Arithmetic arg at " + fc.getLocation().toString()

The query returns no result (I am using CodeQl with VS Code). I also checked if a smaller partial query can detect expression corresponding to size definition and it is working. I also checked if the query finds calls to MyVuln and it is working. Only when I start writing dataflow path query, I am getting no result. This type of query seems pretty straight forward, but I am not getting any clue where I have gone wrong or what is that I am missing in this query. A help is highly appreciates. thanks

Sanjay
  • 95
  • 2
  • 14
  • Remove `and` conditions one at a time until the query returns a result. Then you'll know which one is causing the problem. – Robert Harvey Apr 01 '23 at 16:59
  • 1
    1) There might not be any flow from `getLValue()` (at least for [Java queries there isn't](https://github.com/github/codeql/issues/5652)); 2) `MyVuln` is not directly called with `size`, it is called with `size * ...`, so that is only [tainted but not data flow](https://codeql.github.com/docs/writing-codeql-queries/about-data-flow-analysis/#normal-data-flow-vs-taint-tracking); 3) Why require that there is an `AssignExpr`? Why not check for data flow (respectively taint flow) with the multiplication as source? – Marcono1234 Apr 02 '23 at 15:32
  • thanks @RobertHarvey. I tried with a simpler query that is given in codeql documentation (https://codeql.github.com/docs/codeql-language-guides/analyzing-data-flow-in-cpp-new/). It also didn't work. I am going to try more in step by step mode as you suggested. – Sanjay Apr 02 '23 at 16:27
  • thanks @Marcono1234. I will try with your suggestions and get back to you. `AssignExpr` is just to get to the `size` variable so that I can start dataflow from there. also as per this link (https://codeql.github.com/docs/writing-codeql-queries/about-data-flow-analysis/), dataflow should be there from `size` to `size*sizeof(char)` expression, no? – Sanjay Apr 02 '23 at 16:30
  • 1
    There is data flow only to the RValue for `size` in the expression `size * sizeof(char)`, but I don't think there is data flow to the whole `size * sizeof(char)` expression (i.e. the result of the multiplication), that most likely is only taint flow. – Marcono1234 Apr 02 '23 at 16:39
  • Hi @Sanjay Can you confirm whether you are using [google cloud Dataflow](https://cloud.google.com/dataflow/docs) or not, since your question includes the google-cloud-Dataflow tag? – kiran mathew Apr 03 '23 at 08:11
  • 1
    Hi @kiranmathew, no, I am not using it. I remember adding only "dataflow" tag. I will delete that tag. – Sanjay Apr 03 '23 at 16:05
  • Hi @Marcono1234, so based on your comment, I did two changes: used taintflow, instead of dataflow and used `RValue`, instead of `LValue`. with these changes, I could get results. thank you very much. However, I also did more. I could extract the `size` variable as subexpression from the argument and then did a **dataflow** and it worked too, but with `RValue` as `source` (not the LValue!). I am still bit puzzled about this behavior and will try to investigate a bit more on this. Nevertheless, thanks for the pointers you provided. – Sanjay Apr 03 '23 at 18:39
  • @Sanjay, no problem. Might be good if you could create an answer for your question here and describe your solution, so that hopefully it will help others as well. – Marcono1234 Apr 04 '23 at 21:21

1 Answers1

0

So, based on the suggestions from @Marcono1234, following is the query that worked for my problem mentioned in the question above.

/*
@kind path-problem
*/

import cpp
import semmle.code.cpp.dataflow.new.DataFlow
import semmle.code.cpp.dataflow.new.TaintTracking
//import DataFlow::PathGraph

from Function myvuln, FunctionCall fc, AssignExpr ab, Expr p, DataFlow::Node src, DataFlow::Node sink
where
// getting the call that I am interested in as sink
myvuln.hasGlobalName("MyVuln")
and fc.getTarget() = myvuln
// getting the "interesting" parameter that will flow into the parameter of MyVuln
and ab.getLValue().getType().getUnspecifiedType() instanceof IntegralType
and ab.getRValue() instanceof MulExpr
and src.asExpr() = ab.getRValue() // this was problematic as in my earlier query, I was extracting LValue. But it turns out that I need to select the expression that will compute the value that will flow into the parameter of MyVuln. thus the RValue expression
and sink.asExpr() = fc.getArgument(0)
and TaintTracking::localTaint(src, sink)

select fc, sink.toString(), "MyVul with Arithmetic operation at " + fc.getLocation().toString()

As I am learning CodeQL, I also wanted to understand various ways of doing the stuff. So, I explored the same problem as dataflow by extracting the size var from the expression size*sizeof(char) by using `getAChild*(). this will require changes in the above queries at two place as follows:

and sink.asExpr() = fc.getArgument(0).getAChild*()
//and TaintTracking::localTaint(src, sink)
and DataFlow::localFlow(src, sink)
Sanjay
  • 95
  • 2
  • 14