Turning Your Data Into a CausalTable

In Julia, datasets are commonly stored as Tables: data structures that implement the Tables.jl interface. One of the main purposes of CausalTables.jl is to wrap such a Table so that it can be passed as input to other causal inference packages. Given a Table, we can turn it into a CausalTable by specifying its treatment and response variables and, optionally, the causal relationships among its variables.

Constructing the CausalTable

The code below wraps the Boston Housing dataset as a CausalTable in order to answer causal questions of the form "How would changing nitrogen oxide air pollution (NOX) within Boston-area towns affect median home value (MEDV)?" Any dataset in a Tables.jl-compliant format can be wrapped as a CausalTable; in this example, we wrap a DataFrame from DataFrames.jl.

using CausalTables
using MLDatasets: BostonHousing
using DataFrames

# get data in a Tables.jl-compliant format
tbl = BostonHousing().dataframe

# Wrapping the dataset in a CausalTable
ctbl = CausalTable(tbl; treatment = :NOX, response = :MEDV)

When only the treatment and response are specified, all other variables are assumed to be confounders. However, one can also explicitly specify the causes of the treatment and response by passing a NamedTuple of lists as the causes argument of the CausalTable constructor. In the example below, the causes of the treatment NOX are specified as [:CRIM, :INDUS], and the causes of the response MEDV as [:CRIM, :INDUS, :NOX].

ctbl = CausalTable(tbl; treatment = :NOX, response = :MEDV,
                        causes = (NOX = [:CRIM, :INDUS], MEDV = [:CRIM, :INDUS, :NOX]))
CausalTable
┌─────────┬─────────┬─────────┬───────┬─────────┬─────────┬─────────┬───────────
│    CRIM │      ZN │   INDUS │  CHAS │     NOX │      RM │     AGE │     DIS  ⋯
│ Float64 │ Float64 │ Float64 │ Int64 │ Float64 │ Float64 │ Float64 │ Float64  ⋯
├─────────┼─────────┼─────────┼───────┼─────────┼─────────┼─────────┼───────────
│ 0.00632 │    18.0 │    2.31 │     0 │   0.538 │   6.575 │    65.2 │    4.09  ⋯
│ 0.02731 │     0.0 │    7.07 │     0 │   0.469 │   6.421 │    78.9 │  4.9671  ⋯
│ 0.02729 │     0.0 │    7.07 │     0 │   0.469 │   7.185 │    61.1 │  4.9671  ⋯
│ 0.03237 │     0.0 │    2.18 │     0 │   0.458 │   6.998 │    45.8 │  6.0622  ⋯
│ 0.06905 │     0.0 │    2.18 │     0 │   0.458 │   7.147 │    54.2 │  6.0622  ⋯
│ 0.02985 │     0.0 │    2.18 │     0 │   0.458 │    6.43 │    58.7 │  6.0622  ⋯
│ 0.08829 │    12.5 │    7.87 │     0 │   0.524 │   6.012 │    66.6 │  5.5605  ⋯
│ 0.14455 │    12.5 │    7.87 │     0 │   0.524 │   6.172 │    96.1 │  5.9505  ⋯
│    ⋮    │    ⋮    │    ⋮    │   ⋮   │    ⋮    │    ⋮    │    ⋮    │    ⋮     ⋱
│ 0.23912 │     0.0 │    9.69 │     0 │   0.585 │   6.019 │    65.3 │  2.4091  ⋯
│ 0.17783 │     0.0 │    9.69 │     0 │   0.585 │   5.569 │    73.5 │  2.3999  ⋯
│ 0.22438 │     0.0 │    9.69 │     0 │   0.585 │   6.027 │    79.7 │  2.4982  ⋯
│ 0.06263 │     0.0 │   11.93 │     0 │   0.573 │   6.593 │    69.1 │  2.4786  ⋯
│ 0.04527 │     0.0 │   11.93 │     0 │   0.573 │    6.12 │    76.7 │  2.2875  ⋯
│ 0.06076 │     0.0 │   11.93 │     0 │   0.573 │   6.976 │    91.0 │  2.1675  ⋯
│ 0.10959 │     0.0 │   11.93 │     0 │   0.573 │   6.794 │    89.3 │  2.3889  ⋯
│ 0.04741 │     0.0 │   11.93 │     0 │   0.573 │    6.03 │    80.8 │   2.505  ⋯
└─────────┴─────────┴─────────┴───────┴─────────┴─────────┴─────────┴───────────
                                                  6 columns and 490 rows omitted
Summaries: NamedTuple()
Arrays: NamedTuple()
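To make the default rule concrete, the plain-Julia sketch below derives confounder names from a list of column names and from a causes NamedTuple. The helpers default_confounders and common_causes are hypothetical illustrations, not CausalTables.jl functions. They assume that, when causes is omitted, every non-treatment, non-response column is a confounder, and that, when causes is given, the confounders are the variables listed as causes of both the treatment and the response.

```julia
# Hypothetical helpers illustrating how confounders can be derived;
# these are NOT the CausalTables.jl implementation.

# Without `causes`: every column that is neither the treatment nor the
# response is assumed to be a confounder.
default_confounders(cols, treatment, response) = setdiff(cols, [treatment, response])

# With `causes`: confounders are the variables listed as causes of BOTH
# the treatment and the response (their common causes).
common_causes(causes, treatment, response) = intersect(causes[treatment], causes[response])

cols = [:CRIM, :ZN, :INDUS, :NOX, :MEDV]
causes = (NOX = [:CRIM, :INDUS], MEDV = [:CRIM, :INDUS, :NOX])

default_confounders(cols, :NOX, :MEDV)   # → [:CRIM, :ZN, :INDUS]
common_causes(causes, :NOX, :MEDV)       # → [:CRIM, :INDUS]
```

With the causes used above, NOX is not returned as a confounder because it is the treatment and is not listed among its own causes.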

Note that the causes of every variable can be specified (a full specification of this kind is often represented as a "directed acyclic graph", or DAG), but this is not required. Only the causes of the treatment and response are needed as input; CausalTables.jl can automatically compute other types of variables of interest, such as confounders or mediators.

Warning

When provided, the partial edgelist represented by causes assumes that if variable A is not listed as a cause of B, then no "causal path" exists between A and B; that is, the two variables are assumed to be uncorrelated. This differs slightly from the usual reading of a directed acyclic graph in causal inference, where A is considered a cause of B even if it acts on B only through another variable C (A → C → B), without a direct edge from A to B. In this case, list both A and C as causes of B in causes when constructing the CausalTable.
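If your causal knowledge is written as a standard DAG that lists only direct parents, the fuller lists that the Warning asks for can be obtained by following parent links transitively. The helper all_causes below is a hypothetical sketch, not a CausalTables.jl function; it assumes the DAG is given as a Dict mapping each variable to its direct parents.

```julia
# Hypothetical helper (not part of CausalTables.jl): expand a DAG given as
# direct-parent lists into the full set of ancestors of variable `v`.
function all_causes(direct::Dict{Symbol, Vector{Symbol}}, v::Symbol)
    found = Symbol[]
    stack = copy(get(direct, v, Symbol[]))    # start from the direct parents
    while !isempty(stack)
        p = pop!(stack)
        p in found && continue                # skip variables already visited
        push!(found, p)
        append!(stack, get(direct, p, Symbol[]))  # then visit p's own parents
    end
    return found
end

# DAG with A → C → B: A affects B only through C
direct = Dict(:C => [:A], :B => [:C])
all_causes(direct, :B)   # → [:C, :A]
```

For the A → C → B example, both :C and :A are returned, so both would be listed as causes of B.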

After a dataset is wrapped in a CausalTable, the Tables.jl interface can be called on the CausalTable as well. Below, we demonstrate a few Tables.jl functions, followed by additional utility functions for causal inference tasks provided by CausalTables.jl.

using Tables

# Examples of using the Tables.jl interface
Tables.getcolumn(ctbl, :NOX) # extract specific column
Tables.subset(ctbl, 1:5)     # extract specific rows
Tables.columnnames(ctbl)     # obtain all column names
(:CRIM, :ZN, :INDUS, :CHAS, :NOX, :RM, :AGE, :DIS, :RAD, :TAX, :PTRATIO, :B, :LSTAT, :MEDV)

In addition, CausalTables.jl provides several utility functions that extract different types of variables relevant to causal inference from a CausalTable.

# Additional utility functions for CausalTables
treatment(ctbl)              # get CausalTable of treatment variables
response(ctbl)               # get CausalTable of response variables
treatmentparents(ctbl)       # get CausalTable of the parents (causes) of the treatment
responseparents(ctbl)        # get CausalTable of the parents (causes) of the response

parents(ctbl, :NOX)          # get CausalTable of parents of a particular variable

confounders(ctbl)            # get CausalTable of confounders
mediators(ctbl)              # get CausalTable of mediators
instruments(ctbl)            # get CausalTable of instruments

data(ctbl)                   # get underlying wrapped dataset of a CausalTable
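The API reference below defines mediators as variables that are caused by the treatment and cause the response, and instruments as variables associated with the treatment that do not cause the response. As a rough illustration, the sketch below applies those definitions to a causes NamedTuple in plain Julia. The helpers causes_of, mediator_names, and instrument_names and the toy variables A, Z, M, W, and Y are hypothetical; in particular, reading "associated with the treatment" as "listed as a cause of the treatment" is an assumption made for this sketch.

```julia
# Hypothetical helpers (not the CausalTables.jl implementation).
# Causes of `v`, or an empty list if `v` has no entry in `causes`.
causes_of(causes, v) = haskey(causes, v) ? causes[v] : Symbol[]

# Mediators of (x, y): variables listed as caused by x that are also causes of y.
# Only variables that appear as keys in `causes` can be detected this way.
mediator_names(causes, x, y) =
    [v for v in keys(causes) if x in causes_of(causes, v) && v in causes_of(causes, y)]

# Instruments of (x, y): causes of x that are not listed as causes of y.
instrument_names(causes, x, y) = setdiff(causes_of(causes, x), causes_of(causes, y))

# Toy DAG: Z → A, W → A, A → M, W → M, and A, M, W → Y
causes = (A = [:Z, :W], M = [:A, :W], Y = [:A, :M, :W])

mediator_names(causes, :A, :Y)     # → [:M]
instrument_names(causes, :A, :Y)   # → [:Z]
```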

Although the CausalTable object is immutable, one can construct a new CausalTable with some attributes replaced by using the replace function. The code below demonstrates how to replace the response variable of ctbl with :CRIM and its causes with nothing. Setting causes = nothing is a quick shortcut to specify that all unlabeled variables are confounders of the treatment-response relationship.

# Replace one or more attributes of the CausalTable.
# Setting `causes = nothing` is a quick shortcut to specify
# that all unlabeled variables are confounders of the treatment-response relationship
CausalTables.replace(ctbl; response = :CRIM, causes = nothing)
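Because replace returns a new object rather than modifying ctbl, it follows the familiar pattern of rebuilding an immutable struct with some fields overridden. The sketch below shows that pattern on a hypothetical ToyTable struct; it is not the CausalTables.jl type, holds no data, and replace_fields is an illustrative helper whose keyword names simply mirror those used above.

```julia
# Hypothetical immutable type standing in for a CausalTable (NOT the
# CausalTables.jl type; it omits the wrapped data entirely).
struct ToyTable
    treatment::Symbol
    response::Symbol
    causes::Union{Nothing, NamedTuple}
end

# Build a new ToyTable, overriding only the fields passed as keywords.
function replace_fields(o::ToyTable; kwargs...)
    # Current field values as a NamedTuple (treatment = ..., response = ..., causes = ...)
    current = NamedTuple{fieldnames(ToyTable)}(ntuple(i -> getfield(o, i), fieldcount(ToyTable)))
    updated = merge(current, values(kwargs))   # keyword values take precedence
    return ToyTable(updated...)                # splat values in field order
end

o = ToyTable(:NOX, :MEDV, (NOX = [:CRIM, :INDUS], MEDV = [:CRIM, :INDUS, :NOX]))
o2 = replace_fields(o; response = :CRIM, causes = nothing)   # o itself is unchanged
```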

Tables with Network-Dependent Units

The previous example assumes that each unit (each row in the Table, in this case tbl) is "causally independent" of every other unit; that is, the treatment of one unit does not affect the response of any other unit. This is a component of the "stable unit treatment value assumption" (SUTVA) often used in causal inference. In some cases, however, units may not be causally independent: one unit's variables may depend on some summary function of its neighbors' variables.

In this case, one must instead perform causal inference on summary functions of each unit's neighbors (Aronow and Samii, 2017). To do this, each CausalTable has two relevant arguments that can be used to account for SUTVA violations. The arrays argument is a NamedTuple that can store adjacency matrices and other miscellaneous parameters that encode the causal relationships between units. The summaries argument is a NamedTuple of NetworkSummary objects that summarize the network relationships between units by referencing variables in the underlying data, in the arrays argument of the CausalTable, or both.

The code below uses the Karate Club dataset to show how such a CausalTable might be constructed when the treatment is a summary function over causally dependent units. In this example, treatment is defined as the number of friends a club member has, specified by the summary function parameter summaries = (friends = Friends(:F),). This setup addresses the causal question "How would changing a subject's number of friends (friends) affect which club they are likely to join (labels_clubs)?"

We store the network relationships between units as an adjacency matrix F by assigning it to the arrays parameter. This allows the Friends(:F) summary function to access it when summarize(ctbl) is called. More detail on the types of NetworkSummary that can be used in a dependent-data CausalTable can be found in Network Summaries.

using CausalTables
using MLDatasets
using Graphs

# Get a Table of Karate Club data from MLDatasets
# (named `karate` rather than `data` to avoid shadowing CausalTables.data)
karate = KarateClub()
tbl = karate.graphs[1].node_data

# Convert the karate club data into a Graphs.jl graph object
g = SimpleGraphFromIterator([Edge(x...) for x in zip(karate.graphs[1].edge_index...)])

# Store the "friends" adjacency matrix in a NamedTuple.
# Note that the input to arrays must be a NamedTuple, even if it holds only one array,
# so the trailing comma is necessary.
m = (F = Graphs.adjacency_matrix(g),)

# Construct a CausalTable with the adjacency matrix stored in `arrays` and a summary variable recording the number of friends
ctbl = CausalTable(tbl; treatment = :friends, response = :labels_clubs, arrays = m, summaries = (friends = Friends(:F),))

One can then call summarize(ctbl) to compute the values of the summary functions for the CausalTable.
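Conceptually, a "number of friends" summary counts each unit's neighbors in the adjacency matrix. The sketch below computes that count for a small toy network. The helper count_friends is hypothetical, and the assumption that Friends(:F) counts the nonzero entries in each row of F is made for illustration only, not taken from the CausalTables.jl source.

```julia
# Hypothetical stand-in for a "number of friends" summary (NOT the
# CausalTables.jl implementation of Friends): for each unit (row),
# count the nonzero entries of the adjacency matrix F.
count_friends(F::AbstractMatrix) = vec(sum(x -> x != 0, F; dims = 2))

# Toy friendship network with edges 1-2, 2-3, 2-4 (symmetric adjacency matrix)
F = [0 1 0 0;
     1 0 1 1;
     0 1 0 0;
     0 1 0 0]

count_friends(F)   # → [1, 3, 1, 1]
```

Because the helper only relies on the AbstractMatrix interface, it would also accept a sparse matrix such as the one returned by Graphs.adjacency_matrix.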

Based on these summaries, it is also possible to extract two matrices from the CausalTable object: the adjacency_matrix and the dependency_matrix. The adjacency_matrix denotes which units are causally dependent upon one another: an entry of 1 in cell (i,j) indicates that some variable in unit i exhibits a causal relationship to some variable in unit j. The dependency_matrix denotes which units are statistically dependent upon one another: an entry of 1 in cell (i,j) indicates that the data of unit i is correlated with the data of unit j. Two units are correlated if they are either causally dependent (neighbors in the adjacency matrix) or share a common neighbor in the adjacency matrix.

CausalTables.adjacency_matrix(ctbl) # get adjacency matrix
CausalTables.dependency_matrix(ctbl) # get dependency matrix
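The rule relating the two matrices can be sketched directly: starting from an adjacency matrix, mark two units as dependent if they are neighbors or share a neighbor. The helper dependency_from_adjacency is hypothetical and is not how CausalTables.jl computes dependency_matrix; marking every unit as dependent on itself (a true diagonal) is an additional assumption.

```julia
using LinearAlgebra

# Hypothetical helper (NOT the CausalTables.jl implementation): units i and j
# are marked dependent if they are neighbors or share at least one neighbor.
# Assumption: every unit is also marked as dependent on itself (diagonal = true).
function dependency_from_adjacency(A::AbstractMatrix)
    B = A .!= 0                          # Boolean adjacency matrix
    common = (Int.(B) * Int.(B)) .> 0    # (i, j) share at least one neighbor
    return B .| common .| Matrix(I, size(B)...)
end

# Toy path network 1 - 2 - 3 - 4
A = [0 1 0 0;
     1 0 1 0;
     0 1 0 1;
     0 0 1 0]

D = dependency_from_adjacency(A)
D[1, 3]   # → true: units 1 and 3 share neighbor 2
D[1, 4]   # → false: units 1 and 4 are neither neighbors nor share a neighbor
```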

API

Base.replace (Method)
replace(o::CausalTable; kwargs...)

Replace the fields of a CausalTable object with the provided keyword arguments.

Arguments

  • o::CausalTable: The CausalTable object whose fields are to be replaced.
  • kwargs...: Keyword arguments specifying the new values for the fields.

Returns

A new CausalTable object with the specified fields replaced.

CausalTables.adjacency_matrix (Method)
adjacency_matrix(o::CausalTable)

Generate the adjacency matrix induced by the summaries and arrays attributes of a CausalTable object. This matrix denotes which units are causally dependent upon one another: an entry of 1 in cell (i,j) indicates that some variable in unit i exhibits a causal relationship to some variable in unit j.

Arguments

  • o::CausalTable: The CausalTable object for which the adjacency matrix is to be generated.

Returns

A boolean matrix representing the adjacency relationships in the CausalTable.

CausalTables.confoundernames (Method)
confoundernames(o::CausalTable, x::Symbol, y::Symbol)

Outputs the names of the confounders of the causal relationship between x and y from the given CausalTable object.

Arguments

  • o::CausalTable: The CausalTable object from which to extract the confounder names.
  • x::Symbol, y::Symbol: The two variables whose confounders should be selected.

Returns

A Vector of Symbols containing the names of the confounders between x and y.

CausalTables.confoundernames (Method)
confoundernames(o::CausalTable)

Outputs the confounder names of each treatment-response pair from the given CausalTable object.

Arguments

  • o::CausalTable: The CausalTable object from which to extract the confounder names of each treatment-response pair.

Returns

A matrix of Vectors containing the confounder names of each treatment-response pair.

CausalTables.confounders (Method)
confounders(o::CausalTable, x::Symbol, y::Symbol)

Selects the common causes for a specific pair of variables (x,y) from the given CausalTable object.

Arguments

  • o::CausalTable: The CausalTable object from which to extract the confounders.
  • x::Symbol, y::Symbol: The two variables whose confounders should be selected.

Returns

A new CausalTable containing only the confounders of both x and y.

CausalTables.confounders (Method)
confounders(o::CausalTable; collapse_parents = true)

Selects the confounders of each treatment-response pair from the given CausalTable object.

Arguments

  • o::CausalTable: The CausalTable object from which to extract the confounder variables of each treatment-response pair.
  • collapse_parents::Bool: Optional parameter; whether to collapse the output to a single CausalTable object if there is only one treatment-response pair or all pairs share the same set of confounders. Defaults to true.

Returns

A new CausalTable containing only the confounders (if there is a single treatment-response pair, or all pairs share the same set of confounders); otherwise, a Matrix of CausalTable objects containing the confounders of each treatment-response pair, where rows represent responses and columns represent treatments.
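The matrix layout described above (rows for responses, columns for treatments) can be illustrated with a comprehension over treatment-response pairs. In the sketch below, confounder_name_matrix and the variables A1, A2, Y1, Y2, W1, and W2 are hypothetical, and each entry holds confounder names (the common causes of the pair) rather than CausalTable objects.

```julia
# Hypothetical sketch (NOT the CausalTables.jl implementation) of the layout
# described above: rows are indexed by responses, columns by treatments, and
# each entry lists the variables that cause both the treatment and the response.
function confounder_name_matrix(causes, treatments, responses)
    return [intersect(causes[t], causes[r]) for r in responses, t in treatments]
end

causes = (A1 = [:W1, :W2], A2 = [:W2],
          Y1 = [:A1, :A2, :W1, :W2], Y2 = [:A1, :W2])

C = confounder_name_matrix(causes, [:A1, :A2], [:Y1, :Y2])
size(C)   # → (2, 2): rows = responses (Y1, Y2), columns = treatments (A1, A2)
C[1, 1]   # → [:W1, :W2], the confounders of (A1, Y1)
```

Since the entries differ across pairs, this example is the case in which collapse_parents = true would not collapse the output.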

CausalTables.confoundersmatrix (Method)
confoundersmatrix(o::CausalTable; collapse_parents = true)

Outputs the confounders of each treatment-response pair from the given CausalTable object as a matrix (or a matrix of matrices, if multiple treatment-response pairs are present).

Arguments

  • o::CausalTable: The CausalTable object from which to extract the confounders of each treatment-response pair.
  • collapse_parents::Bool: Optional parameter; whether to collapse the output to a single Matrix object if there is only one treatment-response pair or all pairs share the same set of confounders. Defaults to true.

Returns

A matrix containing only the confounders.

CausalTables.data (Method)
data(o::CausalTable)

Retrieve the data stored in a CausalTable object.

Arguments

  • o::CausalTable: The CausalTable from which to retrieve the data.

Returns

The data stored in the CausalTable object.

CausalTables.dependency_matrix (Method)
dependency_matrix(o::CausalTable)

Generate the dependency matrix induced by the summaries and arrays attributes of a CausalTable object. This matrix stores which units are statistically dependent upon one another: an entry of 1 in cell (i,j) indicates that the data of unit i is correlated with the data in unit j. Two units are correlated if they either are causally dependent (neighbors in the adjacency matrix) or share a common cause (share a neighbor in the adjacency matrix).

Arguments

  • o::CausalTable: The CausalTable object for which the dependency matrix is to be generated.

Returns

A boolean matrix representing the statistical dependency relationships between units in the CausalTable.

CausalTables.instrumentnames (Method)
instrumentnames(o::CausalTable, x::Symbol, y::Symbol)

Outputs the names of the instruments of the causal relationship between x and y from the given CausalTable object; that is, variables that are associated with x but do not cause y.

Arguments

  • o::CausalTable: The CausalTable object from which to extract the instrument names.
  • x::Symbol, y::Symbol: The two variables whose instruments should be selected.

Returns

A Vector of Symbols containing the names of the instruments of the causal relationship between x and y.

CausalTables.instrumentnames (Method)
instrumentnames(o::CausalTable)

Outputs the instrument names of each treatment-response pair from the given CausalTable object.

Arguments

  • o::CausalTable: The CausalTable object from which to extract the instrument names of each treatment-response pair.

Returns

A matrix of Vectors containing the instrument names of each treatment-response pair.

CausalTables.instruments (Method)
instruments(o::CausalTable, x::Symbol, y::Symbol)

Selects the instruments for a specific pair of variables (x,y) from the given CausalTable object; that is, variables that are associated with x but do not cause y.

Arguments

  • o::CausalTable: The CausalTable object from which to extract the instruments.
  • x::Symbol, y::Symbol: The two variables whose instruments should be selected.

Returns

A new CausalTable containing only the instruments of the pair (x, y).

CausalTables.instruments (Method)
instruments(o::CausalTable; collapse_parents = true)

Selects the instruments of each treatment-response pair from the given CausalTable object; that is, variables that are associated with the treatment but do not cause the response.

Arguments

  • o::CausalTable: The CausalTable object from which to extract the instrumental variables of each treatment-response pair.
  • collapse_parents::Bool: Optional parameter; whether to collapse the output to a single CausalTable object if there is only one treatment-response pair or all pairs share the same set of instruments. Defaults to true.

Returns

A new CausalTable containing only the instruments (if a single response, or all responses share the same set of instruments); otherwise, a Matrix of CausalTable objects containing the instruments of each treatment-response pair, where rows represent responses and columns represent treatments.

CausalTables.instrumentsmatrix (Method)
instrumentsmatrix(o::CausalTable; collapse_parents = true)

Outputs the instruments of each treatment-response pair from the given CausalTable object as a matrix (or a matrix of matrices, if multiple treatment-response pairs are present).

Arguments

  • o::CausalTable: The CausalTable object from which to extract the instruments of each treatment-response pair.
  • collapse_parents::Bool: Optional parameter; whether to collapse the output to a single Matrix object if there is only one treatment-response pair or all pairs share the same set of instruments. Defaults to true.

Returns

A matrix containing only the instruments.

CausalTables.mediatornames (Method)
mediatornames(o::CausalTable, x::Symbol, y::Symbol)

Outputs the names of the mediators of the causal relationship between x and y from the given CausalTable object.

Arguments

  • o::CausalTable: The CausalTable object from which to extract the mediator names.
  • x::Symbol, y::Symbol: The two variables whose mediators should be selected.

Returns

A Vector of Symbols containing the names of the mediators between x and y.

CausalTables.mediatornames (Method)
mediatornames(o::CausalTable)

Outputs the mediator names of each treatment-response pair from the given CausalTable object.

Arguments

  • o::CausalTable: The CausalTable object from which to extract the mediator names of each treatment-response pair.

Returns

A matrix of Vectors containing the mediator names of each treatment-response pair.

CausalTables.mediators (Method)
mediators(o::CausalTable, x::Symbol, y::Symbol)

Selects the mediators for a specific pair of variables (x,y) from the given CausalTable object; that is, the variables that are caused by x and cause y.

Arguments

  • o::CausalTable: The CausalTable object from which to extract the mediators.
  • x::Symbol, y::Symbol: The two variables whose mediators should be selected.

Returns

A new CausalTable containing only the mediators between x and y.

CausalTables.mediators (Method)
mediators(o::CausalTable; collapse_parents = true)

Selects the mediators of each treatment-response pair from the given CausalTable object.

Arguments

  • o::CausalTable: The CausalTable object from which to extract the mediator variables of each treatment-response pair.
  • collapse_parents::Bool: Optional parameter; whether to collapse the output to a single CausalTable object if there is only one treatment-response pair or all pairs share the same set of mediators. Defaults to true.

Returns

A new CausalTable containing only the mediators (if a single response, or all responses share the same set of mediators); otherwise, a Matrix of CausalTable objects containing the mediators of each treatment-response pair, where rows represent responses and columns represent treatments.

CausalTables.mediatorsmatrix (Method)
mediatorsmatrix(o::CausalTable; collapse_parents = true)

Outputs the mediators of each treatment-response pair from the given CausalTable object as a matrix (or a matrix of matrices, if multiple treatment-response pairs are present).

Arguments

  • o::CausalTable: The CausalTable object from which to extract the mediators of each treatment-response pair.
  • collapse_parents::Bool: Optional parameter; whether to collapse the output to a single Matrix object if there is only one treatment-response pair or all pairs share the same set of mediators. Defaults to true.

Returns

A matrix containing only the mediators.

CausalTables.parents (Method)
parents(o::CausalTable, symbol)

Selects the variables that precede symbol causally from the CausalTable o, based on the causes attribute. Note that if symbol is not contained within o.causes, this function will output an empty CausalTable.

Arguments

  • o::CausalTable: The CausalTable object from which to extract the parent variables of symbol.
  • symbol: The variable for which to extract the parent variables.

Returns

A new CausalTable containing only the parents of symbol.

CausalTables.reject (Method)
reject(o::CausalTable, symbols)

Removes the columns specified by symbols from the CausalTable object o.

Arguments

  • o::CausalTable: The CausalTable object from which symbols will be rejected.
  • symbols: A collection of symbols to be rejected from the CausalTable.

Returns

A new CausalTable object with the specified symbols removed from its data.

CausalTables.response (Method)
response(o::CausalTable)

Selects the response column(s) from the given CausalTable object.

Arguments

  • o::CausalTable: The CausalTable object from which to select the response column(s).

Returns

A new CausalTable containing only the response column(s).

CausalTables.responsematrix (Method)
responsematrix(o::CausalTable)

Outputs the response column(s) from the given CausalTable object as a matrix.

Arguments

  • o::CausalTable: The CausalTable object from which to select the response column(s).

Returns

A matrix containing only the response column(s).

CausalTables.responseparents (Method)
responseparents(o::CausalTable)

Selects the parents of each response variable from the given CausalTable object.

Arguments

  • o::CausalTable: The CausalTable object from which to extract the parent variables of each response.
  • collapse_parents::Bool: Optional parameter; whether to collapse the output to a single CausalTable object if there is only one response or all responses have the same parents. Defaults to true.

Returns

A new CausalTable containing only the causes of the responses (if a single response, or all responses share the same set of causes); otherwise, a Vector of CausalTable objects containing the causes of each response.
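The collapse_parents behavior described above can be sketched on plain lists of parent names. The helper collapse_parent_sets is hypothetical and works on Symbol vectors rather than CausalTable objects; it compares parent sets without regard to order, which is an assumption about how "the same parents" is judged.

```julia
# Hypothetical sketch (NOT the CausalTables.jl implementation): return a single
# parent set when there is only one response or all responses share the same
# parents (and collapse_parents is true); otherwise return one set per response.
function collapse_parent_sets(parent_sets::Vector{Vector{Symbol}}; collapse_parents = true)
    # Order-insensitive comparison of every parent set against the first one
    same = all(ps -> Set(ps) == Set(first(parent_sets)), parent_sets)
    return (collapse_parents && same) ? first(parent_sets) : parent_sets
end

collapse_parent_sets([[:A, :W], [:W, :A]])   # → [:A, :W] (shared parents, collapsed)
collapse_parent_sets([[:A, :W], [:A]])       # → [[:A, :W], [:A]] (one set per response)
```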

CausalTables.select (Method)
select(o::CausalTable, symbols)

Selects specified columns from a CausalTable object.

Arguments

  • o::CausalTable: The CausalTable object from which columns are to be selected.
  • symbols: A list of symbols representing the columns to be selected.

Returns

A new CausalTable object with only the selected columns.
CausalTables.treatment (Method)
treatment(o::CausalTable)

Selects the treatment column(s) from the given CausalTable object.

Arguments

  • o::CausalTable: The CausalTable object from which to select the treatment column(s).

Returns

A new CausalTable containing only the treatment column(s).

CausalTables.treatmentmatrix (Method)
treatmentmatrix(o::CausalTable)

Outputs the treatment column(s) from the given CausalTable object as a matrix.

Arguments

  • o::CausalTable: The CausalTable object from which to select the treatment column(s).

Returns

A matrix containing only the treatment column(s).

CausalTables.treatmentparents (Method)
treatmentparents(o::CausalTable)

Selects the parents of each treatment variable from the given CausalTable object.

Arguments

  • o::CausalTable: The CausalTable object from which to extract the parent variables of each treatment.
  • collapse_parents::Bool: Optional parameter; whether to collapse the output to a single CausalTable object if there is only one treatment or all treatments have the same parents. Defaults to true.

Returns

A new CausalTable containing only the causes of the treatment (if a single treatment, or all treatments share the same set of causes); otherwise, a Vector of CausalTable objects containing the causes of each treatment.
