
I am quite new to causal inference and want to try some methods for treatment effect estimation. For this purpose, I created the following data generating process in Python:

import numpy as np

n = 10000
X3 = np.random.randint(1,4, n)
X2 = np.random.randint(1,11, n)

X1 = 5 * X2 + 3 * X3 + np.random.randint(-1,3, n)
X4 = 10 * X2 + np.random.randint(-2,5, n)

# Treatment probability: depends on X1 only
propensity = np.where(X1 > 30, 0.8, 0.2)
T = np.random.binomial(1, propensity)

# Treatment effect: -10 when X2 > 5, otherwise 0
tau = np.where(X2 > 5, -10, 0)

# Outcome (T is applied once here, so tau itself does not include T)
Y = 50 * X2 - 5 * np.sqrt(X1) + T * tau + np.random.randint(10, 21, n)

For the generated data, I created the following graph:

[Image: DAG]

My question is: since X2 (according to the data generation) does not influence the assignment of T but does influence the treatment effect itself, is an edge from X2 to T (X2->T) required?

Thank you very much!

ehudk
terra_cau

2 Answers


The sad truth is that, in general, DAGs can only tell you which variables affect which other variables, not how they do so: they don't describe the functional form.
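To make this concrete, here is a minimal sketch with two made-up structural equations for Y. The function names `y_additive` and `y_interaction` and their coefficients are hypothetical and only loosely modelled on the question. Both have the same parents (T and X2), so they produce the same DAG, even though only one of them contains an interaction.

```python
# Hypothetical toy example: two structural equations for Y with the same parents.

def y_additive(t, x2):
    # No interaction: switching T on lowers Y by 10 for every value of X2.
    return 50 * x2 - 10 * t


def y_interaction(t, x2):
    # Interaction: switching T on lowers Y by 10 only when X2 > 5.
    return 50 * x2 - 10 * t * (x2 > 5)


# Both equations put the same edges into Y, so the graphs are identical.
parents_additive = {"Y": {"T", "X2"}}
parents_interaction = {"Y": {"T", "X2"}}
same_dag = parents_additive == parents_interaction


def effect(f, x2):
    """Effect of switching T from 0 to 1 at a fixed value of X2."""
    return f(1, x2) - f(0, x2)


effects_additive = [effect(y_additive, x2) for x2 in (3, 8)]
effects_interaction = [effect(y_interaction, x2) for x2 in (3, 8)]
print(same_dag, effects_additive, effects_interaction)
```

The graph is the same in both cases; only the effect at X2 = 3 versus X2 = 8 reveals the difference, and that information lives in the functional form rather than in the edges.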

There are several structural forms in which effect modification can arise, but the one in your simulated data generating process matches the one you already drew in your DAG (X2 has an edge into Y but no direct edge into T).
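As a rough check, the effect modification can be read off the question's own data generating process by computing both potential outcomes for every unit. This is only a sketch: the seed, the helper name `outcome`, and the use of `np.random.default_rng` instead of the legacy `np.random` functions are my own choices, not part of the question.

```python
import numpy as np

rng = np.random.default_rng(0)  # arbitrary seed, only for reproducibility
n = 10_000

# Covariates as in the question (upper bounds of integers() are exclusive).
X3 = rng.integers(1, 4, n)
X2 = rng.integers(1, 11, n)
X1 = 5 * X2 + 3 * X3 + rng.integers(-1, 3, n)
noise = rng.integers(10, 21, n)


def outcome(t):
    """Outcome from the question's equation with T set to t for all units."""
    tau = np.where(X2 > 5, -10, 0)
    return 50 * X2 - 5 * np.sqrt(X1) + t * tau + noise


# Individual effects: both potential outcomes share covariates and noise.
ite = outcome(1) - outcome(0)

cate_low = ite[X2 <= 5].mean()   # average effect among units with X2 <= 5
cate_high = ite[X2 > 5].mean()   # average effect among units with X2 > 5
print(cate_low, cate_high)
```

The effect is zero for X2 <= 5 and about -10 for X2 > 5, so X2 modifies the effect of T on Y. This needs only the edges X2 -> Y and T -> Y; an edge X2 -> T plays no role in it.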

If it is important to you to convey the interaction explicitly, there are proposed graphical extensions of DAGs for this, but the resulting diagrams are no longer valid DAGs.

ehudk

Just because you do not see a direct edge between X2 and T does not mean that X2 has no influence on the assignment of T. Since you set the rules of the data generation, you know for a fact that X2 affects T through X1 (X1 is computed from X2, and the propensity depends on X1), which is clear in the DAG.
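A small sketch of this point, under some assumptions: the edge list below is my reading of the data generating code in the question (not of the posted image), and `has_directed_path` is a hypothetical helper. It checks that there is no direct edge X2 -> T but that a directed path exists, and then confirms empirically that the treatment rate changes with X2.

```python
import numpy as np

# Edges read off the question's data generating code (parent -> children).
edges = {
    "X2": ["X1", "X4", "Y"],
    "X3": ["X1"],
    "X1": ["T", "Y"],
    "T": ["Y"],
}


def has_directed_path(graph, source, target):
    """Depth-first search for a directed path from source to target."""
    stack, seen = [source], set()
    while stack:
        node = stack.pop()
        for child in graph.get(node, []):
            if child == target:
                return True
            if child not in seen:
                seen.add(child)
                stack.append(child)
    return False


direct_edge = "T" in edges["X2"]
indirect_path = has_directed_path(edges, "X2", "T")

# Empirical check with the question's process (seed chosen arbitrarily).
rng = np.random.default_rng(0)
n = 10_000
X3 = rng.integers(1, 4, n)
X2 = rng.integers(1, 11, n)
X1 = 5 * X2 + 3 * X3 + rng.integers(-1, 3, n)
T = rng.binomial(1, np.where(X1 > 30, 0.8, 0.2))

rate_low = T[X2 <= 3].mean()   # X1 <= 26 here, so the propensity is 0.2
rate_high = T[X2 >= 6].mean()  # X1 >= 32 here, so the propensity is 0.8
print(direct_edge, indirect_path, rate_low, rate_high)
```

The treatment rate is close to 0.2 for small X2 and close to 0.8 for large X2, even though X2 never appears in the propensity formula. The dependence runs entirely along X2 -> X1 -> T, so the DAG represents it without a direct X2 -> T edge.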