A Riesz representer perspective on targeted learning

Salvador Balkus

sbalkus@g.harvard.edu

Harvard Biostatistics

Christian Testa

ctesta@hsph.harvard.edu

Harvard Biostatistics

Nima Hejazi

nhejazi@hsph.harvard.edu

Harvard Biostatistics

April 15, 2026

Causal Inference Modus Operandi

Choose an estimand \(\psi\) and gather data \(O_1, \ldots, O_n \sim \mathsf{P}\)

Construct an estimator of \(\psi\)

Targeted Minimum Loss-Based Estimation

Even if \(\mathsf{P}\) is totally unknown (nonparametric), we can construct a “good” plug-in estimator of \(\psi\) by…

Deriving an “efficient influence function” \(\phi(\mathsf{P})(O_i; \psi)\)

TMLE: Choosing and optimizing a loss \(L(O; \varepsilon)\) satisfying, for some \(v\),

\[v^\top\nabla_\varepsilon L(O; \varepsilon)\Big|_{\varepsilon = 0} = \frac{1}{n}\sum_{i=1}^n \phi(\mathsf{P}_n)(O_i; \psi)\]

→ choose \(L\) to respect problem constraints (e.g., bounded outcomes)

The derivation of the efficient influence function is often regarded as somewhat of a “dark art.”

Hines et al. (2022)

Q: Can we derive a TMLE for a general class of estimands?

Riesz Representation

Riesz Representation Theorem (statistics version).

Suppose \(\eta \in \mathcal{L}_2(\mathsf{P})\), and \(\psi \coloneqq \Psi(\eta) = \mathsf{E}[h(O; \eta)]\) is a bounded linear functional. Then, there exists a Riesz representer \(\alpha \in \mathcal{L}_2(\mathsf{P})\) such that

\[\Psi(\eta) = \mathsf{E}[\alpha(O)\eta(O)]\]

Think of \(\alpha\) like a “balancing weight”!
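To make the balancing-weight reading concrete, the sketch below enumerates a small discrete distribution and checks that weighting \(\eta\) by \(\alpha\) reproduces the directly evaluated functional. The binary \(A\) and \(L\), the probabilities, and the function `eta` are illustrative assumptions, not taken from the talk; the representer is the inverse-probability form for the functional \(\eta \mapsto \mathsf{E}[\eta(1, L)]\).

```python
from itertools import product

# Hypothetical discrete law: P(L = l) and P(A = 1 | L = l)
p_L = {0: 0.5, 1: 0.5}
p_A1_given_L = {0: 0.3, 1: 0.7}

def joint(a, l):
    """P(A = a, L = l) under the assumed law."""
    p_a = p_A1_given_L[l] if a == 1 else 1.0 - p_A1_given_L[l]
    return p_L[l] * p_a

def eta(a, l):
    """An arbitrary square-integrable function of (A, L)."""
    return 1.0 + 2.0 * a + l

def alpha(a, l):
    """Riesz representer of eta -> E[eta(1, L)]: 1(A = 1) / P(A = 1 | L)."""
    return float(a == 1) / p_A1_given_L[l]

# Direct evaluation of the functional: E[h(O; eta)] = E[eta(1, L)]
plug_in = sum(p_L[l] * eta(1, l) for l in p_L)

# Riesz form: E[alpha(A, L) * eta(A, L)], summed over all four (a, l) cells
weighted = sum(joint(a, l) * alpha(a, l) * eta(a, l)
               for a, l in product((0, 1), (0, 1)))

print(plug_in, weighted)
```

The two numbers agree up to floating-point error: the weights move the mass of the observed \((A, L)\) distribution onto the target distribution in which everyone has \(A = 1\).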

Theorem (Riesz EIF). The efficient influence function of \(\mathsf{E}[h(O; \eta)]\) is

\[\phi(\mathsf{P})(O) = \underbrace{\textcolor{teal}{h(O; \eta)} - \Psi(\eta)}_{\text{expected value EIF}} + \underbrace{\int \textcolor{red}{\alpha(O)} \textcolor{blue}{\phi_{\eta}(\mathsf{P})(O)} d\mathsf{P}}_{\text{"reweighted nuisance bias"}}\]

where \(\textcolor{blue}{\phi_\eta(\mathsf{P})(O)}\) denotes the efficient influence function of the nuisance parameter \(\eta\).

Generalizes previous work, such as Hirshberg and Wager (2021), Chernozhukov et al. (2022), and Williams et al. (2025)

Example 1: Counterfactual mean \(\Psi(\eta) = \mathsf{E}[\mathsf{E}(Y \mid A = a, L)]\) where \(\eta(A, L) = \mathsf{E}(Y \mid A, L)\). Its EIF is

\[\underbrace{\textcolor{teal}{\mathsf{E}(Y \mid A = a, L)}}_{\substack{\text{evaluator}\\ h(A, L; \eta)}} - \Psi(\eta) + \underbrace{\textcolor{red}{\frac{\mathbb{1}(A = a)}{d\mathsf{P}(A = a \mid L)}}}_{\substack{\text{Riesz representer}\\ \alpha(A, L)}}\underbrace{\textcolor{blue}{(Y - \mathsf{E}(Y \mid A, L))}}_{\substack{\text{derivative of}\\\text{squared loss}}}\]

Integral cancels out because \(\phi_\eta(\mathsf{P})(O) = \frac{\delta_{A, L}}{d\mathsf{P}(A, L)}(Y - \mathsf{E}(Y \mid A, L))\)
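As a rough illustration of how the three pieces of this EIF combine, the sketch below forms a one-step (augmented inverse-probability-weighted) estimator: the plug-in evaluator plus the sample mean of the Riesz-weighted residual. This is a one-step correction rather than a TMLE, and the simulated data, the deliberately misspecified regression `eta_hat`, and the cell-frequency propensity `prop_hat` are all assumptions made for the example.

```python
import random

random.seed(0)
n = 20000

# Hypothetical data-generating process; the true counterfactual mean E[Y^1] is 3.5
data = []
for _ in range(n):
    l = 1 if random.random() < 0.5 else 0
    a = 1 if random.random() < (0.7 if l == 1 else 0.3) else 0
    y = 1.0 + 2.0 * a + l + random.gauss(0.0, 1.0)
    data.append((l, a, y))

def mean(values):
    values = list(values)
    return sum(values) / len(values)

# Deliberately misspecified outcome regression: it ignores L
eta_hat = {a: mean(y for _, ai, y in data if ai == a) for a in (0, 1)}

# Propensity P(A = 1 | L = l) estimated by cell frequencies
prop_hat = {l: mean(ai for li, ai, _ in data if li == l) for l in (0, 1)}

# Evaluator term: the average of eta_hat(1, L_i), constant here since eta_hat ignores L
plug_in = eta_hat[1]

# Riesz-weighted residual term of the EIF, with alpha = 1(A = 1) / P(A = 1 | L)
correction = mean(float(a == 1) / prop_hat[l] * (y - eta_hat[a]) for l, a, y in data)

one_step = plug_in + correction
print(round(plug_in, 3), round(one_step, 3))
```

In this setup the uncorrected plug-in is biased because the regression ignores the confounder \(L\), while adding the weighted residual pulls the estimate back toward 3.5.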

Example 2: Counterfactual mean \(\Psi(\eta) = \mathsf{E}[\mathsf{E}(Y \mid A = A + \delta, L)]\) under a policy that shifts treatment from \(A\) to \(A + \delta\), where \(\eta(A, L) = \mathsf{E}(Y \mid A, L)\). Its EIF is

\[\underbrace{\textcolor{teal}{\mathsf{E}(Y \mid A = A + \delta, L)}}_{\substack{\text{evaluator}\\ h(A, L; \eta)}} - \Psi(\eta) + \underbrace{\textcolor{red}{\frac{d\mathsf{P}(A - \delta \mid L)}{d\mathsf{P}(A\mid L)}}}_{\substack{\text{Riesz representer}\\ \alpha(A, L)}}\underbrace{\textcolor{blue}{(Y - \mathsf{E}(Y \mid A, L))}}_{\substack{\text{derivative of}\\\text{squared loss}}}\]

Integral cancels out because \(\phi_\eta(\mathsf{P})(O) = \frac{\delta_{A, L}}{d\mathsf{P}(A, L)}(Y - \mathsf{E}(Y \mid A, L))\)
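The density-ratio representer can be checked numerically in a case where it has a closed form. The sketch below assumes \(A \mid L \sim N(L, 1)\), so that \(d\mathsf{P}(A - \delta \mid L) / d\mathsf{P}(A \mid L) = \exp\{\delta(A - L) - \delta^2/2\}\); the Gaussian model, the value of `delta`, and the test function `eta` are illustrative assumptions only.

```python
import math
import random

random.seed(0)
n = 50000
delta = 0.5

def eta(a, l):
    """A hypothetical outcome regression, used only to test the identity."""
    return a ** 2 + l

def alpha(a, l):
    """Density ratio p(a - delta | l) / p(a | l) when A | L ~ N(L, 1)."""
    return math.exp(delta * (a - l) - 0.5 * delta ** 2)

shifted_terms, weighted_terms = [], []
for _ in range(n):
    l = random.gauss(0.0, 1.0)
    a = l + random.gauss(0.0, 1.0)
    shifted_terms.append(eta(a + delta, l))         # evaluator: eta(A + delta, L)
    weighted_terms.append(alpha(a, l) * eta(a, l))  # Riesz form: alpha(A, L) * eta(A, L)

shifted_mean = sum(shifted_terms) / n
weighted_mean = sum(weighted_terms) / n
print(round(shifted_mean, 2), round(weighted_mean, 2))
```

Both Monte Carlo averages estimate \(\mathsf{E}[\eta(A + \delta, L)]\), which equals \(2 + \delta^2 = 2.25\) under these assumptions; they differ only by simulation noise.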

Example 3: Mean \(\tau\)-th quantile \(\Psi(\eta) = \mathsf{E}[Q^{\tau}(Y \mid A = a, L)]\) under treatment where \(\eta(A, L) = Q^{\tau}(Y \mid A, L)\). Its EIF is

\[\underbrace{\textcolor{teal}{Q^\tau(Y \mid A = a, L)}}_{\substack{\text{evaluator}\\ h(A, L; \eta)}} - \Psi(\eta) + \underbrace{\textcolor{red}{\frac{\mathbb{1}(A = a)}{d\mathsf{P}(A = a \mid L)}}}_{\substack{\text{Riesz representer}\\\alpha(A,L)}}\underbrace{\textcolor{blue}{\left(\frac{\tau - \mathbb{1}(Y \leq Q^\tau(A, L))}{d\mathsf{P}(Q^\tau(A, L) \mid A, L)}\right)}}_{\substack{\text{reweighted derivative of}\\\text{"pinball loss"}}}\]

Integral cancels out because \(\phi_\eta(\mathsf{P})(O) = \frac{\delta_{A, L}}{d\mathsf{P}(A, L)}\left(\frac{\tau - \mathbb{1}(Y \leq Q^\tau(A, L))}{d\mathsf{P}(Q^\tau(A, L) \mid A, L)}\right)\)

Many more ways to use this

Consider a general time-ordered data structure

\[O = (L_1, A_1, \ldots, L_T, A_T, Y)\]

Denote the histories at time \(t\) as \(\bar{A}_t\) and \(\bar{L}_t\). For example:

Longitudinal data
Mediation

Theorem (Sequential Riesz EIF)

Consider the estimand \(\Psi(\eta_1) = \mathsf{E}_{\mathsf{P}}[h_1(A_{1}, L_{1}; \eta_1)]\), where \(\eta_t\) is a bounded linear functional defined sequentially such that, for \(t = 1, \ldots, T\), we have

\[\eta_{t}(\bar{A}_{t}, \bar{L}_{t}) = \mathsf{E}[h_{t+1}(\bar{A}_{t+1}, \bar{L}_{t+1}; \eta_{t+1}) \mid \bar{A}_t, \bar{L}_t]\]

with \(h_{T+1}(\bar{A}_{T+1}, \bar{L}_{T+1}; \eta_{T+1}) \coloneqq Y\). Let \(\alpha_t\) denote the Riesz representer for \(\eta_{t}\) in the functional \(\mathsf{E}[h_{t}(\bar{A}_{t}, \bar{L}_{t}; \eta_{t}) \mid \bar{A}_{t-1}, \bar{L}_{t-1}]\). Then, the EIF of the estimand \(\Psi(\eta_1)\) is

\[\underbrace{\textcolor{teal}{h_1(A_{1}, L_{1}; \eta_1)} - \Psi(\eta_1)}_{\text{Expected value EIF}} + \sum_{t=1}^{T}\underbrace{\textcolor{red}{\prod_{k=1}^t \alpha_k(\bar{A}_{k}, \bar{L}_{k})}}_{\substack{\text{Riesz}\\\text{representer}\\\text{reweighting}}}\underbrace{[\textcolor{teal}{h_{t+1}(\bar{A}_{t+1}, \bar{L}_{t+1}; \eta_{t+1})} - \textcolor{blue}{\eta_t(\bar{A}_{t}, \bar{L}_{t})}]}_{\text{Residuals of sequential regressions}}\]

Sequential TMLE

Fit sequential regressions \(\textcolor{blue}{\eta_1, \ldots, \eta_T}\) and Riesz representers \(\textcolor{red}{\alpha_1, \ldots, \alpha_T}\).

For \(t = 1, \ldots, T\), compute the weights \(\textcolor{red}{\omega_t(\bar{A}_{t}, \bar{L}_{t}) = \prod_{k=1}^t \alpha_k(\bar{A}_{k}, \bar{L}_{k})}\)

For \(t = T, \ldots, 1\), fit a 1-D parametric model \(\eta_{t, \hat{\varepsilon}_t}\) that regresses
\(\text{link}[\underbrace{\textcolor{teal}{h_{t+1}(\bar{A}_{t+1},\bar{L}_{t+1}; \eta_{t+1, \hat{\varepsilon}_{t+1}})}}_{\text{outcome } \eta_t\text{ was fitted on}}] = \text{link}[\underbrace{\textcolor{blue}{\eta_{t}(\bar{A}_{t}, \bar{L}_{t})}}_{\substack{\text{offset: original}\\ \text{regression} }}] + \underbrace{\varepsilon_{t}\textcolor{red}{\omega_t(\bar{A}_{t}, \bar{L}_{t})}}_{\substack{\text{clever}\\\text{covariate}}}\) and set \(\eta_{t} = \eta_{t, \hat{\varepsilon}_t}\)

The final TMLE is the updated plug-in estimator

\[\psi_n = \frac{1}{n}\sum_{i=1}^n\text{link}[\textcolor{teal}{h_1(\bar{A}_{i1}, \bar{L}_{i1}; \eta_{1, \hat{\varepsilon}_1})}]\]

→ Consistent, asymptotically normal, and semi-parametric efficient
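The targeting step is easiest to see in the single-time-point special case (\(T = 1\)), sketched below for a binary outcome with a logit link. This is a minimal illustration under stated assumptions, not the `RieszCML` implementation: the simulated data, the misspecified initial regression `eta_hat`, the cell-frequency propensity `prop_hat`, and the Newton solver for \(\varepsilon\) are all choices made for the example.

```python
import math
import random

random.seed(0)
n = 20000

def expit(x):
    return 1.0 / (1.0 + math.exp(-x))

def logit(p):
    return math.log(p / (1.0 - p))

# Hypothetical DGP with a binary outcome; the true E[Y^1] is about 0.6155
data = []
for _ in range(n):
    l = 1 if random.random() < 0.5 else 0
    a = 1 if random.random() < (0.7 if l == 1 else 0.3) else 0
    y = 1 if random.random() < expit(-1.0 + a + l) else 0
    data.append((l, a, y))

def mean(values):
    values = list(values)
    return sum(values) / len(values)

# Initial regression (misspecified: ignores L) and propensity by cell frequencies
eta_hat = {a: mean(y for _, ai, y in data if ai == a) for a in (0, 1)}
prop_hat = {l: mean(ai for li, ai, _ in data if li == l) for l in (0, 1)}

def omega(a, l):
    """Weight omega_1 = alpha_1 = 1(A = 1) / P(A = 1 | L)."""
    return float(a == 1) / prop_hat[l]

def eta_eps(a, l, eps):
    """Fluctuated regression: logit offset plus eps times the clever covariate."""
    return expit(logit(eta_hat[a]) + eps * omega(a, l))

def score(eps):
    """Derivative of the logistic log-likelihood with respect to eps."""
    return sum(omega(a, l) * (y - eta_eps(a, l, eps)) for l, a, y in data)

# Fit eps by Newton's method on the 1-D logistic regression with offset
eps = 0.0
for _ in range(25):
    information = sum(omega(a, l) ** 2 * eta_eps(a, l, eps) * (1.0 - eta_eps(a, l, eps))
                      for l, a, _ in data)
    step = score(eps) / information
    eps += step
    if abs(step) < 1e-12:
        break

# Updated plug-in: evaluate the fluctuated regression at A = 1 and average
tmle = mean(eta_eps(1, l, eps) for l, _, _ in data)
naive = eta_hat[1]
print(round(naive, 3), round(tmle, 3))
```

At the fitted \(\varepsilon\) the weighted residuals average to zero, which is the empirical EIF equation for this estimand; the logit link keeps the updated regression inside \((0, 1)\).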

The `RieszCML` package: Simulations

Conclusions

Unifies semi-parametric efficient estimation across…
- Data: Longitudinal, mediation, two-phase sampling, etc.
- Interventions: Binary, stochastic, derivatives, quantile effects, etc.

Supports theory and software re-use (Lego bricks)

Agnostic about how \(\alpha\) is learned; see, e.g., Riesz regression (Chernozhukov et al., 2022)
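One way to learn \(\alpha\) without modeling the propensity directly is to minimize the empirical Riesz loss \(\frac{1}{n}\sum_i [\alpha(O_i)^2 - 2h(O_i; \alpha)]\), the criterion commonly associated with Riesz regression. The sketch below does this for the counterfactual-mean functional, where \(h(O; \alpha) = \alpha(1, L)\), using a saturated table over binary \((A, L)\) so that the minimizer has a closed form. The simulated data and the tabular function class are assumptions for illustration, not the estimator used in the talk.

```python
import random
from collections import Counter

random.seed(1)
n = 20000

# Hypothetical binary confounder and treatment
obs = []
for _ in range(n):
    l = 1 if random.random() < 0.5 else 0
    a = 1 if random.random() < (0.7 if l == 1 else 0.3) else 0
    obs.append((a, l))

def riesz_loss(table):
    """Empirical Riesz loss for h(O; alpha) = alpha(1, L)."""
    return sum(table[(a, l)] ** 2 - 2.0 * table[(1, l)] for a, l in obs) / n

# For a saturated table the loss separates by cell:
#   (n_c / n) * rho_c^2 - 2 * rho_c * (m_c / n),
# where n_c counts observations in cell c and m_c counts how often
# the evaluation point (1, L_i) lands in cell c. The minimizer is m_c / n_c.
cell_counts = Counter(obs)
evaluation_counts = Counter((1, l) for _, l in obs)
alpha_hat = {cell: evaluation_counts[cell] / cell_counts[cell] for cell in cell_counts}

for cell in sorted(alpha_hat):
    print(cell, round(alpha_hat[cell], 3))
```

The fitted table is zero on the \(A = 0\) cells and close to \(1 / \mathsf{P}(A = 1 \mid L)\) on the \(A = 1\) cells, so the inverse-probability weight is recovered from the loss alone.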

Thank you! Questions?

Preprint

More about me

References

Chernozhukov, V., Newey, W. K. and Singh, R. (2022) Automatic debiased machine learning of causal and structural effects. Econometrica, 90, 967–1027. The Econometric Society. DOI: 10.3982/ecta18515.

Hines, O., Dukes, O., Diaz-Ordaz, K., et al. (2022) Demystifying statistical learning based on efficient influence functions. The American Statistician, 76, 292–304. Informa UK Limited. DOI: 10.1080/00031305.2021.2021984.

Hirshberg, D. A. and Wager, S. (2021) Augmented minimax linear estimation. The Annals of Statistics, 49. Institute of Mathematical Statistics. DOI: 10.1214/21-aos2080.

Williams, N. T., Hines, O. J. and Rudolph, K. E. (2025) Riesz representers for the rest of us. arXiv. DOI: 10.48550/ARXIV.2507.19413.

Appendix A: Data Analysis

Appendix B: Functional analysis

Theorem (Riesz Representation, general).

Suppose \(\eta \in \mathcal{H}\), a Hilbert space, and that \(\psi : \mathcal{H} \to \mathbb{R}\) is a bounded linear functional.

Then, there exists \(\alpha \in \mathcal{H}\) such that

\[\psi(\eta) = \langle \alpha, \eta \rangle\]
