Forgive me for posting this as an answer; I'm doing so only because the format is more appropriate than comments, even though this may not be the answer you're looking for at this point.
Unless I have misunderstood your question, I think the problem you describe here is ill-defined. My understanding of it is that you have a dataset of X and Y, which happen to follow a linear relationship, and you want to either 'infer' that X causes Y in the absence of any other information, or simply have a way to describe that this is the case via a predicate. The problem is that a correlated dataset can never give you that information by itself.
If you want to establish causality from a dataset, you first need to describe what type of causality you're after and how it could be asserted and investigated. A dataset on its own can never tell you anything about causality if you don't know the ordering of events, or how alternatives behave.
I'm sure there are many models of causality out there, but I have only come across two used meaningfully in practice: the chronological model and the counterfactual model.
In the chronological model, if you are able to establish 'when' an event happens, then you can infer causality via a very simple "and X comes before Y" rule. E.g. if "X = travel" is deemed to take place before "Y = fuel-measurement", then you can establish causality using predicate logic, by showing that:
- Whenever travel precedes fuel-measurement, the relationship is always necessarily linear
- When fuel-measurement precedes travel, the relationship is not necessarily linear (because if it were, you would be back to only being able to establish correlation rather than causality)
- The closed-world assumption applies (i.e. there is nothing else that contributes to fuel consumption in the absence of travel)
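As a rough illustration, the first two conditions above could be checked against a dataset along the following lines. This is only a sketch under my own assumptions: the `Event` record, the `time_index` field, the coefficients `a` and `b`, and the tolerance `tol` are hypothetical names that do not come from the question, and the third (closed-world) condition cannot be checked from the data at all, so it is simply taken for granted.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    kind: str           # hypothetical labels, e.g. "travel" or "fuel_measurement"
    time_index: int     # position of the event in the chronological ordering
    measurement: float  # km for travel, litres for fuel_measurement


def is_linear(x, y, a, b, tol=1e-9):
    """True if y matches the assumed model y = a + b * x within tol."""
    return abs(y - (a + b * x)) <= tol


def consecutive_pairs(events, first_kind, second_kind):
    """Yield (first, second) where `second` immediately follows `first` in time."""
    ordered = sorted(events, key=lambda e: e.time_index)
    for prev, nxt in zip(ordered, ordered[1:]):
        if prev.kind == first_kind and nxt.kind == second_kind:
            yield prev, nxt


def chronologically_causal(events, cause, effect, a, b):
    """Check the two data-driven conditions (closed world is assumed, not checked)."""
    forward = list(consecutive_pairs(events, cause, effect))
    backward = list(consecutive_pairs(events, effect, cause))
    # Condition 1: whenever cause precedes effect, the relationship is linear.
    always_linear_forward = bool(forward) and all(
        is_linear(c.measurement, e.measurement, a, b) for c, e in forward
    )
    # Condition 2: when effect precedes cause, linearity fails at least once.
    not_always_linear_backward = any(
        not is_linear(c.measurement, e.measurement, a, b) for e, c in backward
    )
    return always_linear_forward and not_always_linear_backward


events = [
    Event("travel", 1, 50.0),
    Event("fuel_measurement", 2, 5.0),
    Event("travel", 3, 100.0),
    Event("fuel_measurement", 4, 10.0),
]
result = chronologically_causal(events, "travel", "fuel_measurement", a=0.0, b=0.1)
```

Requiring at least one forward pair is a design choice on my part; without it, an empty dataset would vacuously "prove" causality.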
In the counterfactual model, you don't have any information about the chronology of the events, but what you do have is information on alternative events. Therefore causality of "X causes Y" is established by its counterfactual, i.e. if you can show that "Had X not happened, Y would not have happened either" (or equivalently ¬X implies ¬Y).
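Over a dataset, the rule "¬X implies ¬Y" might be approximated by checking that Y never shows up in any observation where X is absent. The sketch below assumes each observation is a hypothetical pair of booleans `(x_happened, y_happened)`; that representation is mine, not something given in the question.

```python
def counterfactual_holds(observations):
    """Return True if Y never occurs in an observation where X did not occur.

    `observations` is an iterable of (x_happened, y_happened) booleans.
    At least one observation without X is required; otherwise the data
    says nothing about the counterfactual and we return False.
    """
    y_without_x = [y for x, y in observations if not x]
    return bool(y_without_x) and not any(y_without_x)


supports = counterfactual_holds([(True, True), (False, False), (True, True)])
refutes = counterfactual_holds([(True, True), (False, True)])
```

Here `supports` covers a dataset consistent with the counterfactual, while `refutes` contains a case where Y happened without X.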
A complicating factor in the counterfactual model is that it allows for the concept of 'responsibility', i.e. if both X and ¬X can result in Y, then they are both said to be potential causes for Y. However, in the context of a dataset you can probably get around this by saying "if for ALL events X, the outcome is Y, whereas it is not necessarily true that for ALL events ¬X the outcome is Y, then we can infer that X causes Y". So, in your specific example, you could set up a world such that:
- Fuel consumption can only occur from either a 'travel' event or an alternative, mutually exclusive non-travel event, e.g. say, 'siphoning'
- Both the travel 'event' and the siphoning 'event' result in a physical measurement, e.g. distance traveled (which, in our trivial example, would probably just be zero for the siphoning event)
- In your dataset you have information on 'both' what event occurred (e.g. travel or siphoning) and the fuel consumption and distance traveled for that instance
You can then establish that the 'travel' event 'causes' fuel consumption, following a linear model with respect to the distance traveled, by showing that:
- Whenever you have a 'travel' event, the distance traveled does indeed correspond to fuel consumption according to your linear model
- Whenever you have a 'siphoning' event, the distance traveled does not 'necessarily' correspond to fuel consumption according to that model.
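The two checks above could be sketched roughly as follows. The `Record` fields, the event labels, and the model coefficients `a` and `b` are all assumptions of mine for the sake of illustration, and the sketch takes it for granted that 'travel' and 'siphoning' are the only two, mutually exclusive, ways fuel can be consumed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    event: str          # "travel" or "siphoning" (assumed mutually exclusive)
    distance_km: float  # distance traveled during the event
    fuel_used_l: float  # fuel consumed during the event


def fits_model(record, a, b, tol=1e-9):
    """True if fuel use matches the assumed model fuel = a + b * distance."""
    return abs(record.fuel_used_l - (a + b * record.distance_km)) <= tol


def travel_causes_consumption(records, a, b):
    travel = [r for r in records if r.event == "travel"]
    other = [r for r in records if r.event != "travel"]
    # Every travel record must follow the linear model ...
    all_travel_fit = bool(travel) and all(fits_model(r, a, b) for r in travel)
    # ... while at least one non-travel record must break it.
    some_other_misfit = any(not fits_model(r, a, b) for r in other)
    return all_travel_fit and some_other_misfit


records = [
    Record("travel", 50.0, 5.0),
    Record("travel", 120.0, 12.0),
    Record("siphoning", 0.0, 8.0),
]
verdict = travel_causes_consumption(records, a=0.0, b=0.1)
```

Note that without any siphoning records the function returns `False`: with no counterfactual observations, the dataset only shows correlation.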
Update to address the comment: the question is not one of inferring causality, but how to represent causality under the assumption that causality has already been established in practice. In this case, the above points still apply, since you need to define more clearly which type of causality you are referring to before you can represent it.
For example, if we are talking about events that occur in strict chronological order, chronological causality might look something like this (in prolog-like pseudocode):
%%%%%%%%%%%%%%%%%%
%%% facts database
%%%%%%%%%%%%%%%%%%
% eventtype/1: defines type of event
eventtype('travel').
eventtype('fuel_measurement'). % ... etc
% eventtime/2: defines timepoints by index and a record of actual time
eventtime(1, "12:02am").
eventtime(2, "12:03am"). % ... etc
% event/1: a list of ['event type', 'time', 'related measurement']
event( [eventtype('travel'), eventtime(1, _), 50] ). % measurement in km
event( [eventtype('fuel_measurement'), eventtime(2, _), 5] ). % measurement in L ... etc
%%%%%%%%%%%%%
%%% relations
%%%%%%%%%%%%%
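immediately_precedes( event(X), event(Y) ) :-
    get_eventtime_index(X, Xind),
    get_eventtime_index(Y, Yind),
    plus_one(Xind, Yind). % assumes all above helper predicates are suitably defined elsewhere
is_linearly_related( event(X), event(Y) ) :-
    get_measurement(X, Xmeas),
    get_measurement(Y, Ymeas),
    linear_model(A, B), % hypothetical fact holding the model coefficients, also assumed to be defined elsewhere
    Ymeas =:= A + B * Xmeas. % arithmetic comparison; '=' would only unify terms without evaluating them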
iscausal( eventtype(Xtype), eventtype(Ytype) ) :-
    % for all pairs of events of the given types where X immediately precedes Y ...
    forall(
        ( event(X), event(Y),
          X = [eventtype(Xtype), _Xtime, _Xmeas],
          Y = [eventtype(Ytype), _Ytime, _Ymeas],
          immediately_precedes( event(X), event(Y) ) ),
        % ... it applies that:
        is_linearly_related( event(X), event(Y) ) ).