“We want a language that’s open source, with a liberal license. We want the speed of C with the dynamism of Ruby. We want a language that’s homoiconic, with true macros like Lisp, but with obvious, familiar mathematical notation like Matlab. We want something as usable for general programming as Python, as easy for statistics as R, as natural for string processing as Perl, as powerful for linear algebra as Matlab, as good at gluing programs together as the shell. Something that is dirt simple to learn, yet keeps the most serious hackers happy. We want it interactive and we want it compiled.”
Julia: “runs like C but reads like Python” (Perkel 2019)
Two years ago (Fall 2022) I switched to using Julia for almost all statistical tasks:
- Classes
- Research
On each slide, I will score Julia on its ease of use in various domains compared to R.
JScore values:
JScore: 2
JScore: 2
Literate.jl
Don’t forget engine: julia!
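For instance, a minimal Quarto document header selecting the native Julia engine (the title is illustrative):

```yaml
---
title: "My analysis"
engine: julia
---
```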
JScore: 4
JScore: 4
Dependency Management
Project.toml
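A sketch of a minimal Project.toml for an analysis environment (the dependency and compat bounds are illustrative; UUIDs are recorded automatically by Pkg.add):

```toml
[deps]
DataFrames = "a93c6f00-e57d-5684-b7b6-d8193f3e46c0"

[compat]
DataFrames = "1"
julia = "1.10"
```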
JScore: 4
Julia supports test- and doc-specific dependencies
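For example, test-only dependencies can live in their own test/Project.toml (the UUID shown is the standard-library Test package):

```toml
# test/Project.toml – used only when running `Pkg.test()`
[deps]
Test = "8dfed614-e22c-5e08-85e1-65c5234f0b40"
```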
JScore: 4
Documentation
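The usual tool here is Documenter.jl; a minimal docs/make.jl build script might look like this (MyPackage is a placeholder):

```julia
# docs/make.jl – builds HTML docs from docstrings and docs/src/*.md
using Documenter, MyPackage

makedocs(sitename = "MyPackage.jl", modules = [MyPackage])
```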
JScore: 4
Running Experiments
JScore: 1
R: dplyr
library(dplyr)
data("iris")
iris |> group_by(Species) |>
summarize(a = mean(Petal.Length) / mean(Sepal.Length),
b = sum(Petal.Length))
Julia: DataFrames.jl
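A rough DataFrames.jl equivalent of the dplyr pipeline above (assuming RDatasets.jl for loading iris):

```julia
using DataFrames, RDatasets, Statistics

iris = dataset("datasets", "iris")
# group by species, then compute the same two summaries as in dplyr
res = combine(groupby(iris, :Species),
              [:PetalLength, :SepalLength] => ((p, s) -> mean(p) / mean(s)) => :a,
              :PetalLength => sum => :b)
```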
JScore: 2
Julia supports multiple “backends”
JScore: 1
R: glm function – perfect
Julia: GLM.jl – terrible
MethodError: MethodError(GLM.delbeta!, (GLM.DensePredChol{Float64, LinearAlgebra.CholeskyPivoted{Float64, Matrix{Float64}, Vector{Int64}}}([1.0 3.055166006088257; 1.0 0.6637405753135681; … ; 1.0 -0.5296103954315186; 1.0 -1.343801736831665], [0.0, 0.0], [0.0, 0.0], [0.0, 0.0], LinearAlgebra.CholeskyPivoted{Float64, Matrix{Float64}, Vector{Int64}}([4.002796686977798 0.6608336050292363; 2.645182564854622 3.092458398501759], 'U', [2, 1], 2, -1.0, 0), [6.9173836729581e-310 6.9173807184827e-310; 6.9173836729581e-310 6.91722498160464e-310; … ; 6.9173836729581e-310 6.9173836729581e-310; 6.9173836725905e-310 0.0], [3.4766779039175e-310 8.4879831683e-314; 4.243991583e-314 6.9172818793886e-310]), Float32[1.2254993, 0.53429693, -0.3176004, -0.94373393, -0.8283784, 0.3207153, 0.7854858, -0.18182965, -0.5882426, 0.28917944]), 0x0000000000007bfd)
MethodError: no method matching delbeta!(::GLM.DensePredChol{Float64, LinearAlgebra.CholeskyPivoted{Float64, Matrix{Float64}, Vector{Int64}}}, ::Vector{Float32})
Closest candidates are:
delbeta!(!Matched::GLM.SparsePredChol{T}, ::Vector{T}) where T
@ GLM ~/.julia/packages/GLM/vM20T/src/linpred.jl:220
delbeta!(!Matched::GLM.SparsePredChol{T}, ::Vector{T}, !Matched::Vector{T}) where T
@ GLM ~/.julia/packages/GLM/vM20T/src/linpred.jl:213
delbeta!(::GLM.DensePredChol{T, <:LinearAlgebra.CholeskyPivoted}, !Matched::Vector{T}, !Matched::Vector{T}) where T<:Union{Float32, Float64}
@ GLM ~/.julia/packages/GLM/vM20T/src/linpred.jl:159
...
Stacktrace:
[1] fit!(obj::LinearModel{GLM.LmResp{Vector{Float32}}, GLM.DensePredChol{Float64, LinearAlgebra.CholeskyPivoted{Float64, Matrix{Float64}, Vector{Int64}}}})
@ GLM ~/.julia/packages/GLM/vM20T/src/lm.jl:94
[2] fit(::Type{LinearModel}, X::Matrix{Float64}, y::Vector{Float32}, allowrankdeficient_dep::Nothing; wts::Vector{Float32}, dropcollinear::Bool)
@ GLM ~/.julia/packages/GLM/vM20T/src/lm.jl:143
[3] fit(::Type{LinearModel}, X::Matrix{Float64}, y::Vector{Float32}, allowrankdeficient_dep::Nothing)
@ GLM ~/.julia/packages/GLM/vM20T/src/lm.jl:134
[4] fit(::Type{LinearModel}, f::FormulaTerm{Term, Term}, data::DataFrame, args::Nothing; contrasts::Dict{Symbol, Any}, kwargs::@Kwargs{})
@ StatsModels ~/.julia/packages/StatsModels/mPD8T/src/statsmodel.jl:88
[5] fit
@ ~/.julia/packages/StatsModels/mPD8T/src/statsmodel.jl:78 [inlined]
[6] lm
@ ~/.julia/packages/GLM/vM20T/src/lm.jl:157 [inlined]
[7] lm(X::FormulaTerm{Term, Term}, y::DataFrame)
@ GLM ~/.julia/packages/GLM/vM20T/src/lm.jl:157
[8] top-level scope
@ ~/Research/is-julia-ready-for-statistics/presentation.qmd:409
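Judging from the stack trace, the trigger appears to be a Float32 response column paired with a Float64 model matrix. A minimal sketch (hypothetical data) plus a workaround:

```julia
using DataFrames, GLM

df = DataFrame(x = randn(10), y = Float32.(randn(10)))
# lm(@formula(y ~ x), df)  # raises the MethodError shown above
# converting the response to Float64 avoids the mixed-precision dispatch failure
m = lm(@formula(y ~ x), transform(df, :y => ByRow(Float64) => :y))
```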
JScore: 1
Julia: Survival.jl
R: built-in survival, ranger
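A minimal Kaplan–Meier fit with Survival.jl (a sketch: the data are invented and the exact fit signature is assumed):

```julia
using Survival

t = [2.0, 3.0, 5.0, 8.0]       # follow-up times
s = [true, false, true, true]  # event observed? (false = censored)
km = fit(KaplanMeier, t, s)
```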
JScore: 2
Julia: MixedModels.jl; R: lme4 – both written by Douglas Bates.
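A typical MixedModels.jl fit, using the package's bundled sleepstudy data; the formula syntax intentionally mirrors lme4:

```julia
using MixedModels

sleep = MixedModels.dataset(:sleepstudy)
# random intercept per subject, as in lme4's reaction ~ days + (1|subj)
fm = fit(MixedModel, @formula(reaction ~ days + (1 | subj)), sleep)
```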
JScore: 4
Julia: Distributions.jl
R: r[dist] function family
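In Julia, the same functionality hangs off distribution objects rather than a function-naming convention:

```julia
using Distributions

d = Normal(0, 1)
rand(d, 3)          # rnorm(3)
pdf(d, 0.0)         # dnorm(0)
cdf(d, 1.96)        # pnorm(1.96)
quantile(d, 0.975)  # qnorm(0.975)
```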
JScore: 4
Julia: Turing.jl
R: Nimble, Stan, JAGS, etc.
Turing samples from arbitrary Julia code.
JScore: 4
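A minimal Turing.jl model to illustrate the point that any Julia code can appear in the model body (a beta–Bernoulli sketch with invented data):

```julia
using Turing

@model function coinflip(y)
    p ~ Beta(1, 1)              # uniform prior on the success probability
    for i in eachindex(y)       # ordinary Julia control flow inside the model
        y[i] ~ Bernoulli(p)
    end
end

chain = sample(coinflip([1, 1, 0, 1, 1]), NUTS(), 500; progress = false)
```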
Julia: MLJ.jl
R: sl3 (earth, gam)
Template for a custom model; similar to sklearn
mutable struct MyRegressor <: MLJBase.Deterministic
    myparam::Float64
end

function MLJBase.fit(model::MyRegressor, verbosity, X, y)
    fitresult = ...  # fit parameters
    cache = nothing  # optional; to cache for future updates
    report = nothing # optional; to expose internals
    return fitresult, cache, report
end

function MLJBase.predict(model::MyRegressor, fitresult, Xnew)
    ...
end

mach = machine(MyRegressor(0.1), X, y) |> fit!
JScore: 3
Julia: Flux.jl / Lux.jl
Other languages: PyTorch, TensorFlow, etc.
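For a sense of the API, a minimal Flux.jl model (layer sizes are arbitrary):

```julia
using Flux

# two-layer MLP: 4 inputs → 8 hidden units (relu) → 1 output
model = Chain(Dense(4 => 8, relu), Dense(8 => 1))
x = randn(Float32, 4, 16)  # batch of 16 samples, features along dim 1
ŷ = model(x)               # 1×16 output
```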
MEstimation.jl solves arbitrary estimating equations via automatic differentiation.
Graphs.jl is faster than networkx or igraph, but you often need different packages for weighted graphs, etc.
Optimization.jl brings them all together.
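For instance, with Graphs.jl (weighted graphs need the separate SimpleWeightedGraphs.jl package):

```julia
using Graphs

g = path_graph(5)  # 1–2–3–4–5
nv(g)              # 5 vertices
ne(g)              # 4 edges
degree(g, 1)       # an endpoint has degree 1
```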