Turning Your Data Into a CausalTable
In Julia, tabular datasets are usually stored in a Table: any data structure that implements the Tables.jl interface. One of the main purposes of CausalTables.jl is to wrap a Table of data so that it can be passed as input to some other causal inference package. Given a Table of data, we can turn it into a CausalTable by specifying its treatment and response variables and, optionally, the causes of each.
Constructing the CausalTable
The code below shows how to wrap the Boston Housing dataset as a CausalTable to answer causal questions of the form "How would changing nitrogen oxide air pollution (NOX) within Boston-area towns affect median home value (MEDV)?" Any dataset in a Tables.jl-compliant format can be wrapped as a CausalTable. In this example, we turn a DataFrame from DataFrames.jl into a CausalTable object.
using CausalTables
using MLDatasets: BostonHousing
using DataFrames
# get data in a Tables.jl-compliant format
tbl = BostonHousing().dataframe
# Wrapping the dataset in a CausalTable
ctbl = CausalTable(tbl; treatment = :NOX, response = :MEDV)
When only treatment and response are specified, all other variables are assumed to be confounders. However, one can also explicitly specify the causes of both treatment and response by passing them to the CausalTable constructor through the causes keyword, as a NamedTuple of vectors. In the example below, the causes of the treatment NOX are specified as [:CRIM, :INDUS] only, and the causes of the response MEDV as [:CRIM, :INDUS, :NOX].
ctbl = CausalTable(tbl; treatment = :NOX, response = :MEDV,
causes = (NOX = [:CRIM, :INDUS], MEDV = [:CRIM, :INDUS, :NOX]))
CausalTable
┌─────────┬─────────┬─────────┬───────┬─────────┬─────────┬─────────┬───────────
│ CRIM │ ZN │ INDUS │ CHAS │ NOX │ RM │ AGE │ DIS ⋯
│ Float64 │ Float64 │ Float64 │ Int64 │ Float64 │ Float64 │ Float64 │ Float64 ⋯
├─────────┼─────────┼─────────┼───────┼─────────┼─────────┼─────────┼───────────
│ 0.00632 │ 18.0 │ 2.31 │ 0 │ 0.538 │ 6.575 │ 65.2 │ 4.09 ⋯
│ 0.02731 │ 0.0 │ 7.07 │ 0 │ 0.469 │ 6.421 │ 78.9 │ 4.9671 ⋯
│ 0.02729 │ 0.0 │ 7.07 │ 0 │ 0.469 │ 7.185 │ 61.1 │ 4.9671 ⋯
│ 0.03237 │ 0.0 │ 2.18 │ 0 │ 0.458 │ 6.998 │ 45.8 │ 6.0622 ⋯
│ 0.06905 │ 0.0 │ 2.18 │ 0 │ 0.458 │ 7.147 │ 54.2 │ 6.0622 ⋯
│ 0.02985 │ 0.0 │ 2.18 │ 0 │ 0.458 │ 6.43 │ 58.7 │ 6.0622 ⋯
│ 0.08829 │ 12.5 │ 7.87 │ 0 │ 0.524 │ 6.012 │ 66.6 │ 5.5605 ⋯
│ 0.14455 │ 12.5 │ 7.87 │ 0 │ 0.524 │ 6.172 │ 96.1 │ 5.9505 ⋯
│ ⋮ │ ⋮ │ ⋮ │ ⋮ │ ⋮ │ ⋮ │ ⋮ │ ⋮ ⋱
│ 0.23912 │ 0.0 │ 9.69 │ 0 │ 0.585 │ 6.019 │ 65.3 │ 2.4091 ⋯
│ 0.17783 │ 0.0 │ 9.69 │ 0 │ 0.585 │ 5.569 │ 73.5 │ 2.3999 ⋯
│ 0.22438 │ 0.0 │ 9.69 │ 0 │ 0.585 │ 6.027 │ 79.7 │ 2.4982 ⋯
│ 0.06263 │ 0.0 │ 11.93 │ 0 │ 0.573 │ 6.593 │ 69.1 │ 2.4786 ⋯
│ 0.04527 │ 0.0 │ 11.93 │ 0 │ 0.573 │ 6.12 │ 76.7 │ 2.2875 ⋯
│ 0.06076 │ 0.0 │ 11.93 │ 0 │ 0.573 │ 6.976 │ 91.0 │ 2.1675 ⋯
│ 0.10959 │ 0.0 │ 11.93 │ 0 │ 0.573 │ 6.794 │ 89.3 │ 2.3889 ⋯
│ 0.04741 │ 0.0 │ 11.93 │ 0 │ 0.573 │ 6.03 │ 80.8 │ 2.505 ⋯
└─────────┴─────────┴─────────┴───────┴─────────┴─────────┴─────────┴───────────
6 columns and 490 rows omitted
Summaries: NamedTuple()
Arrays: NamedTuple()
Note that a full representation of the causes of each variable (often referred to as a "directed acyclic graph") is not required, though it can be specified. Only the causes of the treatment and response are necessary as input; CausalTables.jl can automatically compute other types of variables one might be interested in, such as confounders or mediators.
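To make these role definitions concrete, the following minimal sketch derives confounders, mediators and instruments from a causes NamedTuple using plain set operations. It is not the CausalTables.jl implementation: the helper names causesof, confounders_of, mediators_of and instruments_of are hypothetical, and the definitions follow the descriptions in the API section below (confounders are common causes of x and y, mediators are caused by x and cause y, and instruments are here restricted to listed causes of x that are not listed as causes of y).

```julia
# Hypothetical helpers, not part of CausalTables.jl: derive variable roles from a
# partial edgelist given as a NamedTuple mapping each variable to a vector of its causes.

# Causes listed for `v`; variables absent from `causes` are assumed to have none.
causesof(causes::NamedTuple, v::Symbol) = get(causes, v, Symbol[])

# Confounders: variables listed as causes of both x and y.
confounders_of(causes, x, y) = intersect(causesof(causes, x), causesof(causes, y))

# Mediators: causes of y that are themselves caused by x.
mediators_of(causes, x, y) = [v for v in causesof(causes, y) if x in causesof(causes, v)]

# Instruments (sketch): causes of x that are not listed as causes of y.
instruments_of(causes, x, y) = setdiff(causesof(causes, x), causesof(causes, y))

# The partial edgelist from the Boston Housing example above
boston = (NOX = [:CRIM, :INDUS], MEDV = [:CRIM, :INDUS, :NOX])
println(confounders_of(boston, :NOX, :MEDV))   # [:CRIM, :INDUS]
println(mediators_of(boston, :NOX, :MEDV))     # Symbol[]
println(instruments_of(boston, :NOX, :MEDV))   # Symbol[]

# A made-up graph in which RM lies on a path from NOX to MEDV
toy = (NOX = [:CRIM, :ZN], RM = [:NOX], MEDV = [:CRIM, :NOX, :RM])
println(mediators_of(toy, :NOX, :MEDV))        # [:RM]
println(instruments_of(toy, :NOX, :MEDV))      # [:ZN]
```

In the Boston example, CRIM and INDUS are the only confounders and no mediators or instruments exist, because every listed cause of NOX is also listed as a cause of MEDV.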
When provided, the partial edgelist represented by causes assumes that if variable A is not listed as a cause of B, then no "causal path" exists between A and B; that is, the two variables are treated as uncorrelated. This differs slightly from the common definition of a directed acyclic graph edge in causal inference, where A can be considered a cause of B even if it only acts through another variable C. In that case, list both A and C as causes of B in causes when constructing the CausalTable.
After wrapping a dataset in a CausalTable object, the Tables.jl interface can be called on the CausalTable as well. Below, we demonstrate a few of these functions, as well as additional utility functions for causal inference tasks provided by CausalTables.jl.
using Tables
# Examples of using the Tables.jl interface
Tables.getcolumn(ctbl, :NOX) # extract specific column
Tables.subset(ctbl, 1:5) # extract specific rows
Tables.columnnames(ctbl) # obtain all column names
(:CRIM, :ZN, :INDUS, :CHAS, :NOX, :RM, :AGE, :DIS, :RAD, :TAX, :PTRATIO, :B, :LSTAT, :MEDV)
In addition, CausalTables.jl provides several utility functions that extract the different types of variables relevant to causal inference from a CausalTable object.
# Additional utility functions for CausalTables
treatment(ctbl) # get CausalTable of treatment variables
response(ctbl) # get CausalTable of response variables
treatmentparents(ctbl) # get CausalTable of the parents (causes) of the treatment
responseparents(ctbl) # get CausalTable of the parents of the response (e.g. treatment and confounders)
parents(ctbl, :NOX) # get CausalTable of parents of a particular variable
confounders(ctbl) # get CausalTable of confounders
mediators(ctbl) # get CausalTable of mediators
instruments(ctbl) # get CausalTable of instruments
data(ctbl) # get underlying wrapped dataset of a CausalTable
Although the CausalTable object is immutable, one can replace the values of its attributes using the replace function, which returns a new CausalTable. The code below demonstrates how to replace the response variable of the CausalTable object ctbl with :CRIM and its causes with nothing. Setting causes = nothing is a quick shortcut to specify that all unlabeled variables are confounders of the treatment-response relationship.
# Replace one or more attributes of the CausalTable.
# Setting `causes = nothing` is a quick shortcut to specify
# that all unlabeled variables are confounders of the treatment-response relationship
CausalTables.replace(ctbl; response = :CRIM, causes = nothing)
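The replace behaviour described above follows a common Julia idiom for immutable structs: rather than mutating the object, build a new one in which the named fields are swapped out and all others are copied over. The sketch below illustrates that idiom on a hypothetical ToyTable type with a hypothetical replace_fields function; it is not the actual CausalTable definition, which stores more fields than shown here.

```julia
# Hypothetical stand-in for CausalTable; the real type has more fields.
struct ToyTable
    treatment::Symbol
    response::Union{Symbol, Nothing}
    causes::Union{NamedTuple, Nothing}
end

# Return a new ToyTable, taking each field from `kwargs` when supplied
# and from the original object `o` otherwise. `o` itself is never modified.
function replace_fields(o::ToyTable; kwargs...)
    vals = (haskey(kwargs, f) ? kwargs[f] : getfield(o, f) for f in fieldnames(ToyTable))
    return ToyTable(vals...)
end

t = ToyTable(:NOX, :MEDV, (NOX = [:CRIM, :INDUS], MEDV = [:CRIM, :INDUS, :NOX]))
t2 = replace_fields(t; response = :CRIM, causes = nothing)

println(t2.response)   # CRIM
println(t2.causes)     # nothing
println(t.response)    # MEDV (the original is unchanged)
```

Because a fresh object is returned, any code still holding the original table keeps seeing its old treatment, response and causes.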
Tables with Network-Dependent Units
The previous example assumes that each unit (row in the Table, in this case tbl) is "causally independent" of every other unit; that is, the treatment of one unit does not affect the response of any other unit. This is a component of the "stable unit treatment value assumption" (SUTVA) often used in causal inference. In some cases, however, we might work with data in which units are not causally independent, but rather one unit's variables depend on some summary function of its neighbors.
In this case, one must instead perform causal inference on the summary functions of each unit's neighbors (Aronow and Samii, 2017). To do this, each CausalTable has two relevant arguments that can be used to account for SUTVA violations. The arrays argument is a NamedTuple that can store adjacency matrices and other miscellaneous parameters that denote the causal relationships between units. The summaries argument is a NamedTuple of NetworkSummary objects that summarize the network relationships between units by referencing variables in the underlying data, in the arrays argument of the CausalTable, or in both.
The code below provides an example of how such a CausalTable might be constructed to consider a summary function treatment in the case of causally dependent units, using the Karate Club dataset. In this example, treatment is defined as the number of friends a club member has, denoted by the summary function parameter summaries = (friends = Friends(:F),). Hence, this answers the causal question "How would changing a subject's number of friends (friends) affect which club they are likely to join (labels_clubs)?"
We store the network relationships between units as an adjacency matrix F by assigning it to the arrays parameter. This allows the Friends(:F) summary function to access it when calling summarize(ctbl). More detail on the types of NetworkSummary that can be used in a dependent-data CausalTable can be found in Network Summaries.
using CausalTables
using MLDatasets
using Graphs
# Get a Table of Karate Club data from MLDatasets
# (named `karate` rather than `data`, which would clash with the `data` function exported by CausalTables)
karate = KarateClub()
tbl = karate.graphs[1].node_data
# Convert the karate club data into a Graphs.jl graph object
g = SimpleGraphFromIterator([Edge(x...) for x in zip(karate.graphs[1].edge_index...)])
# Store the "friends" network as an adjacency matrix in a NamedTuple.
# Note that the input to arrays must be a NamedTuple, even if it contains only one array,
# so the trailing comma is necessary.
m = (F = Graphs.adjacency_matrix(g),)
# Construct a CausalTable with the adjacency matrix stored in `arrays` and a summary variable recording the number of friends
ctbl = CausalTable(tbl; treatment = :friends, response = :labels_clubs, arrays = m, summaries = (friends = Friends(:F),))
One can then call the function summarize(ctbl) to compute the values of the summary function on the causal table.
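As a rough check of what the friends summary should contain, the sketch below counts each unit's neighbors as the row sums of an adjacency matrix. This assumes that Friends(:F) counts a unit's neighbors in the matrix stored under :F; consult the Network Summaries page for the exact definition. The four-unit network is made up for illustration.

```julia
# Made-up undirected network of four units with edges 1-2, 1-3 and 3-4,
# stored as a symmetric adjacency matrix (1 = the two units are friends).
F = [0 1 1 0;
     1 0 0 0;
     1 0 0 1;
     0 0 1 0]

# Assumed definition of the "number of friends" summary: row sums of F.
friends = vec(sum(F; dims = 2))
println(friends)   # [2, 1, 2, 1]
```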
Based on these summaries, it is also possible to extract two matrices from the CausalTable object: the adjacency_matrix and the dependency_matrix. The adjacency_matrix denotes which units are causally dependent upon one another: an entry of 1 in cell $(i,j)$ indicates that some variable in unit i exhibits a causal relationship to some variable in unit j. The dependency_matrix denotes which units are statistically dependent upon one another: an entry of 1 in cell $(i,j)$ indicates that the data of unit i is correlated with the data in unit j. Two units are correlated if they are either causally dependent (neighbors in the adjacency matrix) or share a common neighbor in the adjacency matrix.
CausalTables.adjacency_matrix(ctbl) # get adjacency matrix
CausalTables.dependency_matrix(ctbl) # get dependency matrix
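The relationship between the two matrices can be sketched with ordinary matrix algebra: for a symmetric 0/1 adjacency matrix A, the entry (A*A)[i,j] counts the neighbors that units i and j share, so two units are marked as dependent when they are adjacent or have a common neighbor. This illustrates the definition given above rather than reproducing the package's code, and it additionally assumes that every unit is dependent on itself (the identity term).

```julia
using LinearAlgebra  # for the identity `I`

# Made-up four-unit network with edges 1-2, 1-3 and 3-4.
A = [0 1 1 0;
     1 0 0 0;
     1 0 0 1;
     0 0 1 0]

# (A * A)[i, j] > 0 exactly when units i and j share at least one neighbor.
# Adding A marks direct neighbors, and I marks each unit as dependent on itself.
D = (I + A + A * A) .> 0

# Units 2 and 3 share neighbor 1; units 2 and 4 are neither adjacent nor share a neighbor.
println(D[2, 3], " ", D[2, 4])   # true false
```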
API
Base.replace
replace(o::CausalTable; kwargs...)
Replace the fields of a CausalTable object with the provided keyword arguments.
Arguments
- o::CausalTable: The CausalTable object whose fields are to be replaced.
- kwargs...: Keyword arguments specifying the new values for the fields.
Returns
A new CausalTable object with the specified fields replaced.
CausalTables.adjacency_matrix
adjacency_matrix(o::CausalTable)
Generate the adjacency matrix induced by the summaries and arrays attributes of a CausalTable object. This matrix denotes which units are causally dependent upon one another: an entry of 1 in cell (i,j) indicates that some variable in unit i exhibits a causal relationship to some variable in unit j.
Arguments
- o::CausalTable: The CausalTable object for which the adjacency matrix is to be generated.
Returns
A boolean matrix representing the adjacency relationships in the CausalTable.
CausalTables.confoundernames
confoundernames(o::CausalTable, x::Symbol, y::Symbol)
Outputs the names of the confounders of the causal relationship between x and y from the given CausalTable object.
Arguments
- o::CausalTable: The CausalTable object from which to extract the confounder names.
- x::Symbol, y::Symbol: The two variables whose confounders should be selected.
Returns
A Vector of Symbols containing the names of the confounders between x and y.
CausalTables.confoundernames
confoundernames(o::CausalTable)
Outputs the confounder names of each treatment-response pair from the given CausalTable object.
Arguments
- o::CausalTable: The CausalTable object from which to extract the confounder names of each treatment-response pair.
Returns
A matrix of Vectors containing the confounder names of each treatment-response pair.
CausalTables.confounders
confounders(o::CausalTable, x::Symbol, y::Symbol)
Selects the common causes of a specific pair of variables (x,y) from the given CausalTable object.
Arguments
- o::CausalTable: The CausalTable object from which to extract the confounders.
- x::Symbol, y::Symbol: The two variables whose confounders should be selected.
Returns
A new CausalTable containing only the confounders of both x and y.
CausalTables.confounders
confounders(o::CausalTable; collapse_parents = true)
Selects the confounders of each treatment-response pair from the given CausalTable object.
Arguments
- o::CausalTable: The CausalTable object from which to extract the confounding variables of each treatment-response pair.
- collapse_parents::Bool: Optional parameter; whether to collapse the output to a single CausalTable object if there is only one treatment-response pair or all pairs share the same set of confounders. Defaults to true.
Returns
A new CausalTable containing only the confounders (if there is a single treatment-response pair, or all pairs share the same set of confounders); otherwise, a Matrix of CausalTable objects containing the confounders of each treatment-response pair, where rows represent responses and columns represent treatments.
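The collapse_parents rule can be illustrated with a small sketch over a matrix of confounder lists, where rows are responses and columns are treatments. The helper collapse_if_shared is hypothetical and works on plain vectors of Symbols rather than on CausalTable objects; it compares lists as sets, so ordering does not matter.

```julia
# Hypothetical helper mirroring the documented collapse rule; not CausalTables.jl code.
# `sets` holds one vector of confounder names per (response, treatment) pair.
function collapse_if_shared(sets::AbstractMatrix; collapse_parents::Bool = true)
    # Collapse when requested and when every pair has the same set of names
    # (a single pair trivially satisfies this).
    if collapse_parents && all(s -> Set(s) == Set(first(sets)), sets)
        return first(sets)
    end
    return sets
end

# One response, two treatments sharing the same confounders: collapses to one vector.
shared = reshape([[:CRIM, :INDUS], [:CRIM, :INDUS]], 1, 2)
println(collapse_if_shared(shared))         # [:CRIM, :INDUS]

# Different confounder sets per treatment: the 1x2 matrix is returned unchanged.
differ = reshape([[:CRIM, :INDUS], [:CRIM]], 1, 2)
println(size(collapse_if_shared(differ)))   # (1, 2)
```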
CausalTables.confoundersmatrix
confoundersmatrix(o::CausalTable; collapse_parents = true)
Outputs the confounders of each treatment-response pair from the given CausalTable object as a matrix (or a matrix of matrices, if multiple treatment-response pairs are present).
Arguments
- o::CausalTable: The CausalTable object from which to extract the confounders of each treatment-response pair.
- collapse_parents::Bool: Optional parameter; whether to collapse the output to a single Matrix object if there is only one treatment-response pair or all pairs share the same set of confounders. Defaults to true.
Returns
A matrix containing only the confounders.
CausalTables.data
data(o::CausalTable)
Retrieve the data stored in a CausalTable object.
Arguments
- o::CausalTable: The CausalTable from which to retrieve the data.
Returns
The data stored in the CausalTable object.
CausalTables.dependency_matrix
dependency_matrix(o::CausalTable)
Generate the dependency matrix induced by the summaries and arrays attributes of a CausalTable object. This matrix stores which units are statistically dependent upon one another: an entry of 1 in cell (i,j) indicates that the data of unit i is correlated with the data in unit j. Two units are correlated if they are either causally dependent (neighbors in the adjacency matrix) or share a common cause (share a neighbor in the adjacency matrix).
Arguments
- o::CausalTable: The CausalTable object for which the dependency matrix is to be generated.
Returns
A boolean matrix representing the dependency relationships in the CausalTable.
CausalTables.instrumentnames
instrumentnames(o::CausalTable, x::Symbol, y::Symbol)
Outputs the names of the instruments of the causal relationship between x and y from the given CausalTable object; that is, variables that are associated with x but do not cause y.
Arguments
- o::CausalTable: The CausalTable object from which to extract the instrument names.
- x::Symbol, y::Symbol: The two variables whose instruments should be selected.
Returns
A Vector of Symbols containing the names of the instruments between x and y.
CausalTables.instrumentnames
instrumentnames(o::CausalTable)
Outputs the instrument names of each treatment-response pair from the given CausalTable object.
Arguments
- o::CausalTable: The CausalTable object from which to extract the instrument names of each treatment-response pair.
Returns
A matrix of Vectors containing the instrument names of each treatment-response pair.
CausalTables.instruments
instruments(o::CausalTable, x::Symbol, y::Symbol)
Selects the instruments for a specific pair of variables (x,y) from the given CausalTable object; that is, variables that are associated with x but do not cause y.
Arguments
- o::CausalTable: The CausalTable object from which to extract the instruments.
- x::Symbol, y::Symbol: The two variables whose instruments should be selected.
Returns
A new CausalTable containing only the instruments of the causal relationship between x and y.
CausalTables.instruments
instruments(o::CausalTable; collapse_parents = true)
Selects the instruments of each treatment-response pair from the given CausalTable object; that is, variables that are associated with the treatment but do not cause the response.
Arguments
- o::CausalTable: The CausalTable object from which to extract the instrumental variables of each treatment-response pair.
- collapse_parents::Bool: Optional parameter; whether to collapse the output to a single CausalTable object if there is only one treatment-response pair or all pairs share the same set of instruments. Defaults to true.
Returns
A new CausalTable containing only the instruments (if there is a single treatment-response pair, or all pairs share the same set of instruments); otherwise, a Matrix of CausalTable objects containing the instruments of each treatment-response pair, where rows represent responses and columns represent treatments.
CausalTables.instrumentsmatrix
instrumentsmatrix(o::CausalTable; collapse_parents = true)
Outputs the instruments of each treatment-response pair from the given CausalTable object as a matrix (or a matrix of matrices, if multiple treatment-response pairs are present).
Arguments
- o::CausalTable: The CausalTable object from which to extract the instruments of each treatment-response pair.
- collapse_parents::Bool: Optional parameter; whether to collapse the output to a single Matrix object if there is only one treatment-response pair or all pairs share the same set of instruments. Defaults to true.
Returns
A matrix containing only the instruments.
CausalTables.mediatornames
mediatornames(o::CausalTable, x::Symbol, y::Symbol)
Outputs the names of the mediators of the causal relationship between x and y from the given CausalTable object.
Arguments
- o::CausalTable: The CausalTable object from which to extract the mediator names.
- x::Symbol, y::Symbol: The two variables whose mediators should be selected.
Returns
A Vector of Symbols containing the names of the mediators between x and y.
CausalTables.mediatornames
mediatornames(o::CausalTable)
Outputs the mediator names of each treatment-response pair from the given CausalTable object.
Arguments
- o::CausalTable: The CausalTable object from which to extract the mediator names of each treatment-response pair.
Returns
A matrix of Vectors containing the mediator names of each treatment-response pair.
CausalTables.mediators
mediators(o::CausalTable, x::Symbol, y::Symbol)
Selects the mediators for a specific pair of variables (x,y) from the given CausalTable object; that is, the variables that are caused by x and cause y.
Arguments
- o::CausalTable: The CausalTable object from which to extract the mediators.
- x::Symbol, y::Symbol: The two variables whose mediators should be selected.
Returns
A new CausalTable containing only the mediators between x and y.
CausalTables.mediators
mediators(o::CausalTable; collapse_parents = true)
Selects the mediators of each treatment-response pair from the given CausalTable object.
Arguments
- o::CausalTable: The CausalTable object from which to extract the mediator variables of each treatment-response pair.
- collapse_parents::Bool: Optional parameter; whether to collapse the output to a single CausalTable object if there is only one treatment-response pair or all pairs share the same set of mediators. Defaults to true.
Returns
A new CausalTable containing only the mediators (if there is a single treatment-response pair, or all pairs share the same set of mediators); otherwise, a Matrix of CausalTable objects containing the mediators of each treatment-response pair, where rows represent responses and columns represent treatments.
CausalTables.mediatorsmatrix
mediatorsmatrix(o::CausalTable; collapse_parents = true)
Outputs the mediators of each treatment-response pair from the given CausalTable object as a matrix (or a matrix of matrices, if multiple treatment-response pairs are present).
Arguments
- o::CausalTable: The CausalTable object from which to extract the mediators of each treatment-response pair.
- collapse_parents::Bool: Optional parameter; whether to collapse the output to a single Matrix object if there is only one treatment-response pair or all pairs share the same set of mediators. Defaults to true.
Returns
A matrix containing only the mediators.
CausalTables.parents
parents(o::CausalTable, symbol)
Selects the variables that causally precede symbol in the CausalTable o, based on the causes attribute. Note that if symbol is not contained within o.causes, this function will output an empty CausalTable.
Arguments
- o::CausalTable: The CausalTable object from which to extract the parent variables of symbol.
- symbol: The variable for which to extract the parent variables.
Returns
A new CausalTable containing only the parents of symbol.
CausalTables.reject
reject(o::CausalTable, symbols)
Removes the columns specified by symbols from the CausalTable object o.
Arguments
- o::CausalTable: The CausalTable object from which symbols will be rejected.
- symbols: A collection of symbols to be rejected from the CausalTable.
Returns
A new CausalTable object with the specified symbols removed from its data.
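Conceptually, reject is the complement of select. The sketch below illustrates both operations on a plain column table stored as a NamedTuple of vectors, using a few values from the Boston Housing output shown earlier. The helpers select_cols and reject_cols are hypothetical, not the package's functions, which operate on CausalTable objects.

```julia
# Hypothetical helpers on a NamedTuple-of-vectors column table (a valid Tables.jl table).
# Keep only the listed columns, in the order given.
select_cols(cols::NamedTuple, symbols) = NamedTuple{Tuple(symbols)}(cols)

# Drop the listed columns by selecting every remaining one.
reject_cols(cols::NamedTuple, symbols) = select_cols(cols, setdiff(keys(cols), symbols))

cols = (CRIM = [0.00632, 0.02731], ZN = [18.0, 0.0], NOX = [0.538, 0.469])

println(keys(reject_cols(cols, [:ZN])))    # (:CRIM, :NOX)
println(keys(select_cols(cols, [:NOX])))   # (:NOX,)
```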
CausalTables.response
response(o::CausalTable)
Selects the response column(s) from the given CausalTable object.
Arguments
- o::CausalTable: The CausalTable object from which to select the response column(s).
Returns
A new CausalTable containing only the response column(s).
CausalTables.responsematrix
responsematrix(o::CausalTable)
Outputs the response column(s) from the given CausalTable object as a matrix.
Arguments
- o::CausalTable: The CausalTable object from which to select the response column(s).
Returns
A matrix containing only the response column(s).
CausalTables.responseparents
responseparents(o::CausalTable)
Selects the parents of each response variable from the given CausalTable object.
Arguments
- o::CausalTable: The CausalTable object from which to extract the parent variables of each response.
- collapse_parents::Bool: Optional parameter; whether to collapse the output to a single CausalTable object if there is only one response or all responses have the same parents. Defaults to true.
Returns
A new CausalTable containing only the causes of the responses (if there is a single response, or all responses share the same set of causes); otherwise, a Vector of CausalTable objects containing the causes of each response.
CausalTables.select
select(o::CausalTable, symbols)
Selects specified columns from a CausalTable object.
Arguments
- o::CausalTable: The CausalTable object from which columns are to be selected.
- symbols: A list of symbols representing the columns to be selected.
Returns
A new CausalTable object with only the selected columns.
CausalTables.treatment
treatment(o::CausalTable)
Selects the treatment column(s) from the given CausalTable object.
Arguments
- o::CausalTable: The CausalTable object from which to select the treatment column(s).
Returns
A new CausalTable containing only the treatment column(s).
CausalTables.treatmentmatrix
treatmentmatrix(o::CausalTable)
Outputs the treatment column(s) from the given CausalTable object as a matrix.
Arguments
- o::CausalTable: The CausalTable object from which to select the treatment column(s).
Returns
A matrix containing only the treatment column(s).
CausalTables.treatmentparents
treatmentparents(o::CausalTable)
Selects the parents of each treatment variable from the given CausalTable object.
Arguments
- o::CausalTable: The CausalTable object from which to extract the parent variables of each treatment.
- collapse_parents::Bool: Optional parameter; whether to collapse the output to a single CausalTable object if there is only one treatment or all treatments have the same parents. Defaults to true.
Returns
A new CausalTable containing only the causes of the treatment (if there is a single treatment, or all treatments share the same set of causes); otherwise, a Vector of CausalTable objects containing the causes of each treatment.