CausalTables.jl
A package for storing and simulating data for causal inference in Julia.
Installation
CausalTables.jl can be installed using the Julia package manager. From the Julia REPL, type ]
to enter the Pkg REPL mode and run
Pkg> add CausalTables
Quick Start
CausalTables.jl has three main functionalities:
- Generating simulation data using a
StructuralCausalModel
. - Computing "ground truth" conditional distributions, moments, counterfactuals, and counterfactual functionals from a
StructuralCausalModel
and aCausalTable
. These include, for instance, counterfactual means and average treatment effects. - Wrapping an existing Table as a
CausalTable
object for use by external packages.
The examples below illustrate each of these three functionalities.
Simulating Data from a DataGeneratingProcess
To set up a statistical simulation using CausalTables.jl, we first define a StructuralCausalModel
(SCM). This consists of two parts: a DataGeneratingProcess
(DGP) that controls how the data is generated, and a list of variables to define the basic structure of the underlying causal diagram.
A DataGeneratingProcess can be constructed using the @dgp
macro, which takes a sequence of conditional distributions of the form [variable name] ~ Distribution(args...)
and returns a DataGeneratingProcess
object. Then, one can construct an StructuralCausalModel by passing the DGP to its construct, along with labels of the treatment and response variables. Note that using Distributions
is almost always required before defining a DGP, since the package Distributions.jl is used to define the conditional distribution of random components at each step.
using CausalTables
using Random
using Distributions
dgp = @dgp(
W ~ DiscreteUniform(1, 5),
X ~ (@. Normal(W, 1)),
Y ~ (@. Normal(X + 0.2 * W, 1))
)
scm = StructuralCausalModel(
dgp;
treatment = :X,
response = :Y,
confounders = [:W]
)
One we've defined our list of distribution functions, we can generate data from the DGP using the rand
function:
Random.seed!(1);
data = rand(scm, 5)
We can also apply various causal interventions to the data using the intervene
function. The example below computes a new version of the CausalTable with each unit's treatment shifted by 1 – this is analogous to the effect estimated by a classical linear regression analysis.
data_intervened = intervene(data, additive_mtp(1))
For a more detailed guide of how to generate data please refer to Generating Data.
Computing Ground Truth Functionals
Once we've defined a DGP, we can approximate ground truth statistical functionals along with their efficiency bounds (variance of the counterfactual outcome) for a specified SCM using built-in functions. In general, these include
- Counterfactual Means (
cfmean
) - Counterfactual Differences (
cfdiff
) - Average Treatment Effect (
ate
), including among the Treated (att
) and Untreated (atu
) - Average Policy Effect (
ape
), also known as the causal effect of a Modified Treatment Policy.
For the complete API of available ground truth causal estimands, see Estimands
cfmean(scm, additive_mtp(1))
# output
(μ = 4.599337273915866, eff_bound = 4.881412474779794)
For problems that involving functionals not available through CausalTables.jl or that require more fine-grained knowledge of the true conditional distributions for a given dataset, this package also implements the condensity
function. This function computes the true conditional distributions of any variable in a CausalTable (given a corresponding DGP). The function returns a vector of Distribution objects from the package Distributions.jl
X_distribution = condensity(scm, data, :X)
# output
5-element Vector{Normal{Float64}}:
Distributions.Normal{Float64}(μ=1.0, σ=1.0)
Distributions.Normal{Float64}(μ=2.0, σ=1.0)
Distributions.Normal{Float64}(μ=4.0, σ=1.0)
Distributions.Normal{Float64}(μ=4.0, σ=1.0)
Distributions.Normal{Float64}(μ=5.0, σ=1.0)
For convenience, there also exists conmean
and convar
functions that extracts the true conditional mean and variance of a specific variable the CausalTable. One can apply this to an "intervened" version of data to obtain the conditional mean of the outcome under intervention.
Y_var = convar(scm, data_intervened, :Y)
Y_mean = conmean(scm, data_intervened, :Y)
# output
5-element Vector{Float64}:
2.467564418628885
5.149933692528245
4.973979208080702
4.757247582108903
6.670866154143596
For a more detailed guide of how to compute ground truth conditional distributions please refer to Computing Ground Truth Conditional Distributions.
Wrapping an existing Table as a CausalTable
If you have a table of data that you would like to use with CausalTables.jl without defining a corresponding DataGeneratingProcess (i.e. to use with another package) you can wrap it as a CausalTable
using the corresponding constructor.
tbl = (W = rand(1:5, 10), X = randn(10), Y = randn(10))
ctbl = CausalTable(tbl; treatment = :X, response = :Y, confounders = [:W])
For a more detailed guide of how to wrap an existing table as a CausalTable please refer to Turning Your Data Into a CausalTable.
Contributing
Have questions? Spot a bug or issue in the documentation? Want to request a new feature or add one yourself? Please do not hesitate to open an issue or pull request on the CausalTables.jl GitHub repository. We welcome all contributions and feedback!