Turning Your Data Into a CausalTable
In Julia, most datasets are stored in a Table: a data structure with a Tables.jl-compatible interface. One of the main purposes of CausalTables.jl is to wrap a Table of data so that it can be passed as input to another causal inference package. Given a Table, we can turn it into a CausalTable by specifying the treatment, response, and control (confounder) variables.
Tables with Causally Independent Units
The code below shows how to wrap the Boston Housing dataset as a CausalTable to answer causal questions of the form "How would changing nitric oxide air pollution (NOX) within Boston-area towns affect median home value (MEDV)?" Any dataset in a Tables.jl-compliant format can be wrapped as a CausalTable. In this example, we turn a DataFrame from DataFrames.jl into a CausalTable object.
using CausalTables
using MLDatasets: BostonHousing
using DataFrames
# get data in a Tables.jl-compliant format
tbl = BostonHousing().dataframe
# Wrapping the dataset in a CausalTable
ctbl = CausalTable(tbl; treatment = :NOX, response = :MEDV, confounders = [:CRIM, :ZN, :INDUS, :CHAS, :B, :DIS, :LSTAT])
After wrapping a dataset in a CausalTable object, the Tables.jl interface can be called on the CausalTable as well. Below, we demonstrate a few of these functions, as well as additional utility functions for causal inference tasks made available by CausalTables.jl.
using Tables
# Examples of using the Tables.jl interface
Tables.getcolumn(ctbl, :NOX) # extract specific column
Tables.subset(ctbl, 1:5) # extract specific rows
Tables.columnnames(ctbl) # obtain all column names
# Additional utility functions for CausalTables
treatment(ctbl) # get CausalTable of treatment variables
response(ctbl) # get CausalTable of response variables
confounders(ctbl) # get CausalTable of confounders
responseparents(ctbl) # get CausalTable of treatment and confounders
data(ctbl) # get underlying wrapped dataset
# replace one or more attributes of the CausalTable
CausalTables.replace(ctbl; response = :CRIM, confounders = [:MEDV, :ZN, :INDUS, :CHAS, :B, :DIS, :LSTAT])
Tables with Network-Dependent Units
The previous example assumes that each unit (row in the Table, in this case tbl) is "causally independent" of every other unit; that is, the treatment of one unit does not affect the response of any other unit. This is a component of the "stable unit treatment value assumption" (SUTVA) often used in causal inference. In some cases, however, units may not be causally independent: one unit's variables may depend on some summary function of its neighbors.
In this case, one must instead perform causal inference on the summary functions of each unit's neighbors (Aronow and Samii, 2017). To do this, each CausalTable has two relevant arguments that can be used to account for SUTVA violations. The arrays argument is a NamedTuple that can store adjacency matrices and other miscellaneous parameters that denote the causal relationships between variables. The summaries argument is a NamedTuple of NetworkSummary objects that summarize the network relationships between units by referencing variables in the underlying data, the arrays argument of CausalTable, or both.
The code below provides an example of how such a CausalTable might be constructed to consider a summary function treatment in the case of causally dependent units, using the Karate Club dataset. In this example, treatment is defined as the number of friends a club member has, denoted by the summary function parameter summaries = (friends = Friends(:F),). Hence, this answers the causal question "How would changing a subject's number of friends (friends) affect which club they are likely to join (labels_clubs)?"
We store the network relationships between units as an adjacency matrix F by assigning it to the arrays parameter. This allows the Friends(:F) summary function to access it when calling summarize(ctbl). More detail on the types of NetworkSummary that can be used in a dependent-data CausalTable can be found in Network Summaries.
using CausalTables
using MLDatasets
using Graphs
# Get a Table of Karate Club data from MLDatasets
data = KarateClub()
tbl = data.graphs[1].node_data
# Convert the karate club data into a Graphs.jl graph object
g = SimpleGraphFromIterator([Edge(x...) for x in zip(data.graphs[1].edge_index...)])
# Store the "friends" as an adjacency matrix in a NamedTuple
# Note that the input to arrays must be a NamedTuple, even if there is only one summary variable,
# so the trailing comma is necessary.
m = (F = Graphs.adjacency_matrix(g),)
# Construct a CausalTable with the adjacency matrix stored in `arrays` and a summary variable recording the number of friends
ctbl = CausalTable(tbl; treatment = :friends, response = :labels_clubs, arrays = m, summaries = (friends = Friends(:F),))
One can then call the function summarize(ctbl) to compute the values of the summary function on the causal table.
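To make the idea of a "number of friends" summary concrete, here is a minimal sketch in plain Julia that does not use CausalTables.jl at all. It assumes only what the text above states, namely that the treatment is the number of friends each unit has; the matrix F and the variable friend_counts are illustrative names, and this is not a description of how Friends is implemented inside the package.

```julia
# Toy symmetric adjacency matrix for 4 units: friendships 1-2, 1-3 and 3-4.
F = [0 1 1 0;
     1 0 0 0;
     1 0 0 1;
     0 0 1 0]

# The number of friends of unit i is the sum of row i of the adjacency matrix.
friend_counts = vec(sum(F; dims = 2))
```

Each entry of friend_counts is a per-unit summary of that unit's neighbors, which is the kind of quantity that then plays the role of the treatment.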
Based on these summaries, it is also possible to extract two matrices from the CausalTable object: the adjacency_matrix and the dependency_matrix. The adjacency_matrix denotes which units are causally dependent upon one another: an entry of 1 in cell $(i,j)$ indicates that some variable in unit i exhibits a causal relationship to some variable in unit j. The dependency_matrix denotes which units are statistically dependent upon one another: an entry of 1 in cell $(i,j)$ indicates that the data of unit i is correlated with the data in unit j. Two units are correlated if they either are causally dependent (neighbors in the adjacency matrix) or share a common neighbor in the adjacency matrix.
CausalTables.adjacency_matrix(ctbl) # get adjacency matrix
CausalTables.dependency_matrix(ctbl) # get dependency matrix
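The rule relating the two matrices can be sketched in plain Julia on a toy network. This is an illustration of the stated rule (neighbors, or sharing a common neighbor), not the package's implementation; the function name toy_dependency is hypothetical, and marking every unit as dependent on itself along the diagonal is an assumption made here for the sketch.

```julia
# Toy undirected network on 4 units: edges 1-2 and 2-3; unit 4 is isolated.
A = Bool[0 1 0 0;
         1 0 1 0;
         0 1 0 0;
         0 0 0 0]

# Units i and j are treated as dependent if they are neighbors (A[i, j])
# or if they share at least one common neighbor ((A * A)[i, j] > 0).
function toy_dependency(A::AbstractMatrix{Bool})
    paths2 = Int.(A) * Int.(A)      # number of length-2 paths between units
    D = A .| (paths2 .> 0)
    for i in axes(D, 1)
        D[i, i] = true              # assumption: each unit depends on itself
    end
    return D
end

D = toy_dependency(A)
```

In this toy network, units 1 and 3 are not neighbors, but both are neighbors of unit 2, so they are marked as dependent; unit 4 is dependent only on itself.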
API
CausalTables.adjacency_matrix — Method
adjacency_matrix(o::CausalTable)
Generate the adjacency matrix induced by the summaries and arrays attributes of a CausalTable object. This matrix denotes which units are causally dependent upon one another: an entry of 1 in cell (i,j) indicates that some variable in unit i exhibits a causal relationship to some variable in unit j.
Arguments
- o::CausalTable: The CausalTable object for which the adjacency matrix is to be generated.
Returns
A boolean matrix representing the adjacency relationships in the CausalTable.
CausalTables.confounders — Method
confounders(o::CausalTable)
Selects and returns the confounders from a CausalTable object.
Arguments
- o::CausalTable: The CausalTable object from which to select confounders.
Returns
A new CausalTable containing only the confounders.
CausalTables.confoundersmatrix — Method
confoundersmatrix(o::CausalTable)
Outputs the confounders from the given CausalTable object as a matrix.
Arguments
- o::CausalTable: The CausalTable object from which to select the confounders.
Returns
A matrix containing only the confounders.
CausalTables.data — Method
data(o::CausalTable)
Retrieve the data stored in a CausalTable object.
Arguments
- o::CausalTable: The CausalTable from which to retrieve the data.
Returns
The data stored in the CausalTable object.
CausalTables.dependency_matrix — Method
dependency_matrix(o::CausalTable)
Generate the dependency matrix induced by the summaries and arrays attributes of a CausalTable object. This matrix stores which units are statistically dependent upon one another: an entry of 1 in cell (i,j) indicates that the data of unit i is correlated with the data in unit j. Two units are correlated if they either are causally dependent (neighbors in the adjacency matrix) or share a common cause (share a neighbor in the adjacency matrix).
Arguments
- o::CausalTable: The CausalTable object for which the dependency matrix is to be generated.
Returns
A boolean matrix representing the relationships in the CausalTable.
CausalTables.getscm — Method
getscm(o::CausalTable)
Get the structural causal model (SCM) of a CausalTable object.
This function merges the column table of the CausalTable object with its arrays.
Arguments
- o::CausalTable: The CausalTable object.
Returns
- A merged table containing the column table and arrays of the CausalTable object.
CausalTables.parents — Method
parents(o::CausalTable, symbol)
Selects the variables that precede symbol causally from the CausalTable o. For instance, if symbol is in o.response, this function will return a CausalTable containing the symbols in o.treatment and o.confounders.
Warning: If symbol is in o.confounders, then this function will return a CausalTable containing an empty data attribute.
Arguments
- o::CausalTable: The CausalTable object from which to extract the parent variables of symbol.
- symbol: The variable for which to extract the parent variables.
Returns
A new CausalTable containing only the parents of symbol.
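The causal ordering implied by this entry (confounders, then treatment, then response) can be sketched in plain Julia. The names roles and toy_parents are hypothetical stand-ins for illustration only: the sketch works on a NamedTuple of variable names rather than a CausalTable, and it returns names rather than a new table.

```julia
# Hypothetical stand-in for the roles stored in a CausalTable (not the package's type).
roles = (treatment = [:NOX], response = [:MEDV], confounders = [:CRIM, :ZN])

# Return the names of the variables that causally precede `symbol`
# under the ordering confounders -> treatment -> response.
function toy_parents(roles, symbol::Symbol)
    if symbol in roles.response
        return vcat(roles.treatment, roles.confounders)
    elseif symbol in roles.treatment
        return copy(roles.confounders)
    else
        return Symbol[]   # confounders (and unknown names) have no parents here
    end
end

response_parents = toy_parents(roles, :MEDV)
treatment_parents = toy_parents(roles, :NOX)
confounder_parents = toy_parents(roles, :CRIM)
```

The empty result for a confounder mirrors the warning above: nothing precedes the confounders in this ordering.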
CausalTables.reject — Method
reject(o::CausalTable, symbols)
Removes the columns specified by symbols from the CausalTable object o.
Arguments
- o::CausalTable: The CausalTable object from which symbols will be rejected.
- symbols: A collection of symbols to be rejected from the CausalTable.
Returns
A new CausalTable object with the specified symbols removed from its data.
CausalTables.replace — Method
replace(o::CausalTable; kwargs...)
Replace the fields of a CausalTable object with the provided keyword arguments.
Arguments
- o::CausalTable: The CausalTable object to be replaced.
- kwargs...: Keyword arguments specifying the new values for the fields.
Returns
A new CausalTable object with the specified fields replaced.
CausalTables.response — Method
response(o::CausalTable)
Selects the response column(s) from the given CausalTable object.
Arguments
- o::CausalTable: The CausalTable object from which to select the response column(s).
Returns
A new CausalTable containing only the response column(s).
CausalTables.responsematrix — Method
responsematrix(o::CausalTable)
Outputs the response column(s) from the given CausalTable object as a matrix.
Arguments
- o::CausalTable: The CausalTable object from which to select the response column(s).
Returns
A matrix containing only the response column(s).
CausalTables.responseparents — Method
responseparents(o::CausalTable)
Selects all variables besides those in o.response from the given CausalTable object.
Arguments
- o::CausalTable: The CausalTable object from which to extract the parent variables of the response.
Returns
A new CausalTable containing only the confounders and treatment.
CausalTables.select — Method
select(o::CausalTable, symbols)
Selects specified columns from a CausalTable object.
Arguments
- o::CausalTable: The CausalTable object from which columns are to be selected.
- symbols: A list of symbols representing the columns to be selected.
Returns
- A new CausalTable object with only the selected columns.
CausalTables.treatment — Method
treatment(o::CausalTable)
Selects the treatment column(s) from the given CausalTable object.
Arguments
- o::CausalTable: The CausalTable object from which to select the treatment column(s).
Returns
A new CausalTable containing only the treatment column(s).
CausalTables.treatmentmatrix — Method
treatmentmatrix(o::CausalTable)
Outputs the treatment column(s) from the given CausalTable object as a matrix.
Arguments
- o::CausalTable: The CausalTable object from which to select the treatment column(s).
Returns
A matrix containing only the treatment column(s).
CausalTables.treatmentparents — Method
treatmentparents(o::CausalTable)
Selects all variables besides those in o.treatment and o.response from the given CausalTable object.
Arguments
- o::CausalTable: The CausalTable object from which to extract the parent variables of the treatment.
Returns
A new CausalTable containing only the confounders.