“We want a language that’s open source, with a liberal license. We want the speed of C with the dynamism of Ruby. We want a language that’s homoiconic, with true macros like Lisp, but with obvious, familiar mathematical notation like Matlab. We want something as usable for general programming as Python, as easy for statistics as R, as natural for string processing as Perl, as powerful for linear algebra as Matlab, as good at gluing programs together as the shell. Something that is dirt simple to learn, yet keeps the most serious hackers happy. We want it interactive and we want it compiled.”
(Bezanson, Karpinski, Shah, and Edelman 2012, “Why We Created Julia”)
Julia: “runs like C but reads like Python” (Perkel 2019)
Two years ago (Fall 2022) I switched to using Julia for almost all statistical tasks.
Classes
Research
On each slide, I will score Julia (a “JScore”) on its ease of use in various domains compared to R.
JScore: 2
Literate.jl
Don’t forget engine: julia in the Quarto header!
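For instance, a Quarto document using the native Julia engine starts with a YAML header like this (title is a placeholder):

```yaml
---
title: "My analysis"
engine: julia
---
```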
JScore: 4
Dependency Management
Project.toml
Julia supports test- and doc-specific dependencies
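Concretely, test-only dependencies can be declared in the [extras] and [targets] sections of Project.toml (the UUID shown is the Test standard library’s):

```toml
[deps]
# regular package dependencies listed here

[extras]
Test = "8dfed614-e22c-5e08-85e1-65c5234f0b40"

[targets]
test = ["Test"]
```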
JScore: 4
Documentation
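Julia’s built-in docstring syntax (which Documenter.jl can assemble into a manual) is straightforward; a minimal sketch with a hypothetical function:

```julia
"""
    mymean(x)

Compute the arithmetic mean of the values in `x`.
"""
mymean(x) = sum(x) / length(x)

# the docstring is then available at the REPL via ?mymean
```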
JScore: 4
Running Experiments
JScore: 1
R: dplyr
library(dplyr)
data("iris")
iris |> group_by(Species) |>
summarize(a = mean(Petal.Length) / mean(Sepal.Length),
b = sum(Petal.Length))
Julia: DataFrames.jl
JScore: 2
Julia supports multiple “backends”
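A rough DataFrames.jl translation of the dplyr pipeline above, assuming the iris data is loaded via RDatasets.jl:

```julia
using DataFrames, Statistics
using RDatasets: dataset

iris = dataset("datasets", "iris")

# dplyr's group_by + summarize becomes groupby + combine
combine(groupby(iris, :Species),
        [:PetalLength, :SepalLength] => ((p, s) -> mean(p) / mean(s)) => :a,
        :PetalLength => sum => :b)
```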
JScore: 1
R: glm function – perfect
Julia: GLM.jl – terrible
MethodError: MethodError(GLM.delbeta!, (GLM.DensePredChol{Float64, LinearAlgebra.CholeskyPivoted{Float64, Matrix{Float64}, Vector{Int64}}}([1.0 3.055166006088257; 1.0 0.6637405753135681; … ; 1.0 -0.5296103954315186; 1.0 -1.343801736831665], [0.0, 0.0], [0.0, 0.0], [0.0, 0.0], LinearAlgebra.CholeskyPivoted{Float64, Matrix{Float64}, Vector{Int64}}([4.002796686977798 0.6608336050292363; 2.645182564854622 3.092458398501759], 'U', [2, 1], 2, -1.0, 0), [6.9173836729581e-310 6.9173807184827e-310; 6.9173836729581e-310 6.91722498160464e-310; … ; 6.9173836729581e-310 6.9173836729581e-310; 6.9173836725905e-310 0.0], [3.4766779039175e-310 8.4879831683e-314; 4.243991583e-314 6.9172818793886e-310]), Float32[1.2254993, 0.53429693, -0.3176004, -0.94373393, -0.8283784, 0.3207153, 0.7854858, -0.18182965, -0.5882426, 0.28917944]), 0x0000000000007bfd)
MethodError: no method matching delbeta!(::GLM.DensePredChol{Float64, LinearAlgebra.CholeskyPivoted{Float64, Matrix{Float64}, Vector{Int64}}}, ::Vector{Float32})
Closest candidates are:
delbeta!(!Matched::GLM.SparsePredChol{T}, ::Vector{T}) where T
@ GLM ~/.julia/packages/GLM/vM20T/src/linpred.jl:220
delbeta!(!Matched::GLM.SparsePredChol{T}, ::Vector{T}, !Matched::Vector{T}) where T
@ GLM ~/.julia/packages/GLM/vM20T/src/linpred.jl:213
delbeta!(::GLM.DensePredChol{T, <:LinearAlgebra.CholeskyPivoted}, !Matched::Vector{T}, !Matched::Vector{T}) where T<:Union{Float32, Float64}
@ GLM ~/.julia/packages/GLM/vM20T/src/linpred.jl:159
...
Stacktrace:
[1] fit!(obj::LinearModel{GLM.LmResp{Vector{Float32}}, GLM.DensePredChol{Float64, LinearAlgebra.CholeskyPivoted{Float64, Matrix{Float64}, Vector{Int64}}}})
@ GLM ~/.julia/packages/GLM/vM20T/src/lm.jl:94
[2] fit(::Type{LinearModel}, X::Matrix{Float64}, y::Vector{Float32}, allowrankdeficient_dep::Nothing; wts::Vector{Float32}, dropcollinear::Bool)
@ GLM ~/.julia/packages/GLM/vM20T/src/lm.jl:143
[3] fit(::Type{LinearModel}, X::Matrix{Float64}, y::Vector{Float32}, allowrankdeficient_dep::Nothing)
@ GLM ~/.julia/packages/GLM/vM20T/src/lm.jl:134
[4] fit(::Type{LinearModel}, f::FormulaTerm{Term, Term}, data::DataFrame, args::Nothing; contrasts::Dict{Symbol, Any}, kwargs::@Kwargs{})
@ StatsModels ~/.julia/packages/StatsModels/mPD8T/src/statsmodel.jl:88
[5] fit
@ ~/.julia/packages/StatsModels/mPD8T/src/statsmodel.jl:78 [inlined]
[6] lm
@ ~/.julia/packages/GLM/vM20T/src/lm.jl:157 [inlined]
[7] lm(X::FormulaTerm{Term, Term}, y::DataFrame)
@ GLM ~/.julia/packages/GLM/vM20T/src/lm.jl:157
[8] top-level scope
@ ~/Research/is-julia-ready-for-statistics/presentation.qmd:409
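Judging from the stack trace, the failure comes from fitting a model whose response vector is Float32 while the model matrix is Float64; a minimal reproduction sketch (column names hypothetical):

```julia
using DataFrames, GLM

df = DataFrame(x = randn(10), y = randn(Float32, 10))  # Float32 response column
lm(@formula(y ~ x), df)  # throws the MethodError shown above
```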
JScore: 1
Julia: Survival.jl
R: built-in survival, ranger
JScore: 2
Mixed models: both Julia’s MixedModels.jl and R’s lme4 were written by Douglas Bates.
JScore: 4
Julia: Distributions.jl
R: r[dist] function family
JScore: 4
Julia: Turing.jl
R: Nimble, Stan, JAGS, etc.
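As a minimal sketch (model and data are hypothetical), a Beta–Bernoulli coin-flip model in Turing:

```julia
using Turing

@model function coinflip(y)
    p ~ Beta(1, 1)              # prior on the heads probability
    for i in eachindex(y)
        y[i] ~ Bernoulli(p)     # likelihood, one term per flip
    end
end

chain = sample(coinflip([1, 1, 0, 1]), NUTS(), 1_000)
```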
Turing samples from arbitrary Julia code
JScore: 4
Julia: MLJ.jl
R: sl3 (earth, gam)
Template for a custom model; similar to sklearn
mutable struct MyRegressor <: MLJBase.Deterministic
    myparam::Float64
end

function MLJBase.fit(model::MyRegressor, verbosity, X, y)
    fitresult = ... # fit parameters
    cache = nothing # optional; to cache for future updates
    report = nothing # optional; to expose internals
    return fitresult, cache, report
end
function MLJBase.predict(model::MyRegressor, fitresult, Xnew)
    ...
end
mach = machine(MyRegressor(0.1), X, y) |> fit!
JScore: 3
Julia: Flux.jl / Lux.jl
Other languages: PyTorch, TensorFlow, etc.
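For orientation, a minimal Flux.jl model (layer sizes are arbitrary):

```julia
using Flux

# small multilayer perceptron: 4 inputs -> 8 hidden units -> 1 output
model = Chain(Dense(4 => 8, relu), Dense(8 => 1))

x = rand(Float32, 4, 16)  # Flux expects (features, batch) layout
y = model(x)              # 1×16 output
```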
MEstimation.jl solves arbitrary estimating equations via automatic differentiation
Graphs.jl is faster than networkx or igraph, but you often need different packages for weighted graphs, etc.
Optimization.jl brings them all together