Edits: 2024-08-01. ocamlfront was renamed to MlFront for consistency. 2024-08-02. Typo. MlFront_Codept not MlFront_Core.
MlFront adds a Java-like package system to OCaml. The MlFront name is a
homage to cfront, which was tooling that translated "C with Classes" (now known
as C++) into C code. Similarly, MlFront-based tools can translate "OCaml with
packages" into conventional OCaml.
Its home is https://gitlab.com/dkml/build-tools/MlFront.
At its most basic core, MlFront gives a well-defined, consistent meaning to a
module reference like AcmeWidgets_Std.Activities.Manufacturing across the
domains of:
- OCaml source code.
- findlib libraries.
- opam packages.
I (jonahbeckford@) will explain that meaning and those domains in detail later in this series of posts. But first ... why should you care?
MlFront can:
- say, with some amount of rigour, what an OCaml program can do before the OCaml program does it (a security benefit)
- provide a way to share and re-use source code without naming conflicts (a scale benefit)
- reduce the configuration needed for code re-use (a usability benefit)
And it is opt-in, so it doesn't require modifying the thousands of packages that exist today.
MlFront is not a proposal. It is a set of libraries and tools (a "framework")
that has been spun out of my private code to fill a perceived hole in the OCaml
ecosystem. I did a similar activity when I spun the Windows-friendly DkML
distribution out of my private code to fill a perceived hole (Windows support)
in the OCaml ecosystem.
Here is what MlFront-based code across multiple projects can look like:
(* file: src/AcmeWidgets_Db/open__.ml *)
module Sqlite3 = MmotlSqlite3_Std.Sqlite3

(* file: src/AcmeWidgets_Db/Embedded.ml *)
let db = Sqlite3.db_open "widgets.db"
let read_all () = Sqlite3.exec db
  "SELECT * from tbl0" ~cb:(...)

(* file: src/AcmeWidgets_Cli/Main.ml *)
module Arg = Tr1Stdlib_V414CRuntime.Arg
module Printf = Tr1Stdlib_V414CRuntime.Printf
let sqlite3 = ref false
let speclist =
  [("-sqlite3", Arg.Set sqlite3, "Use sqlite3 as an embedded database")]
let () =
  Arg.parse speclist anon_fun usage_msg;
  if !sqlite3 then
    Printf.printf "widgets: %s\n%!"
      (AcmeWidgets_Db.Embedded.read_all ())

(* file: src/MmotlSqlite3_Std/Sqlite3.ml *)
include Sqlite3
Let's revisit the features I mentioned now that we have seen some code:
- security: MlFront uses codept for dependency analysis. That means source code for a project can be scanned to pull out the module references accurately (modulo implementation bugs) without compiling the project. And we know the above project has access to a C runtime (Tr1Stdlib_V414CRuntime) and specifically has command-line arguments (Tr1Stdlib_V414CRuntime.Arg) as an entry point.
- scale: I'll go over this shortly, but Sqlite3 was renamed to MmotlSqlite3_Std.Sqlite3 in the above code. No more naming conflicts.
- usability: From the codept analysis the build system can (and should) install the opam package MmotlSqlite3_Std and reference the findlib library name MmotlSqlite3_Std. This MmotlSqlite3_Std alias is the "opt-in" feature that can be introduced incrementally in the opam package repository with a backwards-compatible findlib META file. The end-user should not need to write configuration files (.opam / dune-project) unless their build system has other requirements like versioning.
But all that being said ... and this is why I want you to continue reading ... I
don't know what I don't know. I have read about how Go solves some of the
package problems, but I don't have first-hand experience. You might have the
first-hand experience in a programming language that you think does packages
right. Or you have first-hand experience in things that absolutely do not work.
Simply put, you know things I don't and that is valuable because there is
still an opportunity to make big changes to MlFront.
The Origin Story (and Problem Statements)
I love using OCaml but it was very frustrating explaining to others how to integrate third-party OCaml code. I posted about this earlier at https://discuss.ocaml.org/t/what-are-the-biggest-reasons-newcomers-give-up-on-ocaml/10958/13. Let's take a common activity: performing an HTTP request. You can use the low-level wrapper around Curl which involves:
- Installing the package ocurl using the conventional opam package manager.
- Declaring the use of the package's findlib libraries with a (libraries curl curl.lwt) statement in the conventional dune build tool configuration.
- Using the module Curl_lwt in your OCaml code.
Notice the capitalization, prefixing, and separator changing from ocurl to
curl.lwt to Curl_lwt. My guess is the author did not want to be presumptuous
by using the package name curl so they picked ocurl. Problem 1: Only one
package maintainer gets to pick curl in today's global namespace; everyone
else has to be cute with their naming.
If instead you used a popular high-level Curl wrapper you would:
- Install the package cohttp-curl-lwt.
- Add the library (libraries cohttp-curl-lwt) statement. Yay; consistency!
- Use the module Cohttp_curl_lwt in your OCaml code.
Now cohttp is a much larger family of projects. It has a hierarchy of
subprojects with each level separated by dashes (-): cohttp, cohttp-curl
and cohttp-curl-lwt are real packages. They've also standardized the naming
for use inside OCaml code by replacing the dashes with underscores (_). Those
conventions lead to having to use just two names (cohttp-curl-lwt and
Cohttp_curl_lwt) rather than three (ocurl, curl.lwt and Curl_lwt), with
no name clashes if another maintainer decided to make (acme-curl-lwt and
Acme_curl_lwt). Big improvement.
But we can do better: use one name for the opam package and the library and
the module. Amongst the trio of module naming, library naming and package
naming, only the module naming has strict requirements. So there is no magic at
all: we have no choice but to use the module name as the unified one name.
Specifically, the one name must start with a capital letter and all remaining
characters are restricted to either be ASCII alphanumeric characters or a few
special characters. In the cohttp-curl-lwt example, that one name for the
package, library and module name would be Cohttp_curl_lwt or even
CohttpCurlLwt.
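To make the "one name" rule concrete, here is a minimal sketch in plain OCaml. The function names (is_valid_module_name, unified_name_of_package) are hypothetical and are not part of MlFront; the special characters accepted are the underscore and apostrophe that OCaml identifiers allow; and mapping both dashes and dots to underscores is only an assumed convention for illustration. It also assumes OCaml 4.13 or later for String.for_all.

```ocaml
(* Hypothetical helper, not part of MlFront: true when [s] starts with an
   ASCII capital letter and every character is an ASCII letter, a digit,
   an underscore or an apostrophe. *)
let is_valid_module_name (s : string) : bool =
  String.length s > 0
  && (match s.[0] with 'A' .. 'Z' -> true | _ -> false)
  && String.for_all
       (function
         | 'A' .. 'Z' | 'a' .. 'z' | '0' .. '9' | '_' | '\'' -> true
         | _ -> false)
       s

(* Hypothetical convention: turn an opam package or findlib library name
   into a candidate unified name by replacing '-' and '.' with '_' and
   capitalizing the first letter. *)
let unified_name_of_package (pkg : string) : string =
  String.capitalize_ascii
    (String.map (function '-' | '.' -> '_' | c -> c) pkg)

let () =
  List.iter
    (fun pkg ->
      let name = unified_name_of_package pkg in
      Printf.printf "%s -> %s (valid: %b)\n" pkg name
        (is_valid_module_name name))
    [ "cohttp-curl-lwt"; "curl.lwt" ]
```

Under these assumptions cohttp-curl-lwt maps to Cohttp_curl_lwt and curl.lwt maps to Curl_lwt, which are the module names those libraries already use.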
That is not rocket science, so why isn't every 3rd party package maintainer doing that? Because it is just a convention that is different from the thousands of packages that exist today. Problem 2: The thousands of existing OCaml packages with their unique naming conventions are a headwind to any change.
I am also heavily invested in developing secure software that works on the (several) platforms I'm experienced with. From my perspective "secure software" means being able to identify with some rigour what a program does before the program does it. I do not mean to imply that writing secure software gets you "security" writ large, but secure software and specifically the identification of entry points and side-effects are important prerequisites. And there are formal methods that can add rigour with tools like Coq, Lean and TLA+. Honestly, formal methods are too much rigour to be practical at scale. An alternative is OCaml's sublanguage for modules. Unlike most languages, you can programmatically inspect the source code of an OCaml program and know which modules the program uses before running the program.
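As a rough illustration of inspecting source text for module references without running it, here is a deliberately naive sketch. It is not how codept works (codept does a real analysis of the OCaml syntax); this toy only collects the first component of every dotted path that starts with a capital letter, and it does not understand comments, string literals, open statements or undotted module aliases. The function name referenced_modules is hypothetical.

```ocaml
let is_upper c = c >= 'A' && c <= 'Z'

let is_ident_char c =
  is_upper c
  || (c >= 'a' && c <= 'z')
  || (c >= '0' && c <= '9')
  || c = '_' || c = '\''

(* Naive textual scan: return the sorted, de-duplicated list of top-level
   modules that appear as the head of a dotted path such as [Foo.Bar.baz]. *)
let referenced_modules (src : string) : string list =
  let n = String.length src in
  (* A path head is a capital letter that is not in the middle of an
     identifier and not preceded by a dot. *)
  let starts_path i =
    is_upper src.[i]
    && (i = 0 || not (is_ident_char src.[i - 1] || src.[i - 1] = '.'))
  in
  let rec ident_end j =
    if j < n && is_ident_char src.[j] then ident_end (j + 1) else j
  in
  let rec scan i acc =
    if i >= n then List.sort_uniq String.compare acc
    else if starts_path i then
      let j = ident_end i in
      if j < n && src.[j] = '.' then
        scan j (String.sub src i (j - i) :: acc)
      else scan j acc
    else scan (i + 1) acc
  in
  scan 0 []

let () =
  let example =
    "module Arg = Tr1Stdlib_V414CRuntime.Arg\n\
     let () = print_string (AcmeWidgets_Db.Embedded.read_all ())"
  in
  List.iter print_endline (referenced_modules example)
```

Even this crude scan surfaces AcmeWidgets_Db and Tr1Stdlib_V414CRuntime from the example text; a real tool has to get the many corner cases right, which is why the analysis is delegated to a dedicated dependency analyzer.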
How Does MlFront Work?
MlFront has a MlFront_Codept library which can produce
build files that can be used by build systems. Today MlFront_Codept is used by
DkCoder (which itself uses Dune as its primary build system), but there is
nothing in MlFront_Codept that ties it to DkCoder or Dune. Without loss of
generality I'll be using Dune build files to show what MlFront_Codept can
produce (with some simplifications for readability):
<!-- file: src/AcmeWidgets_Db/AcmeWidgets_Db.ml -->
module Embedded = AcmeWidgets_Db__Embedded

<!-- file: src/AcmeWidgets_Db/dune -->
(library (name AcmeWidgets_DbO__) (modules open__))
(library (name AcmeWidgets_Db__Embedded)
  (ocamlc_flags -alert @need_alternate_stdlib
                -open Stdlib414Shadow -open AcmeWidgets_DbO__)
  (modules Embedded))

<!-- file: src/AcmeWidgets_Cli/dune -->
(library (name AcmeWidgets_DbO__) (modules open__))
(library (name AcmeWidgets_Cli__Main)
  (ocamlc_flags -alert @need_alternate_stdlib
                -open Stdlib414Shadow)
  (modules Main)
  (libraries AcmeWidgets_Db__Embedded))

<!-- file: src/MmotlSqlite3_Std/dune -->
(library
  (name MmotlSqlite3_Std__Sqlite3)
  (libraries sqlite3))
I don't assume you understand all of that. But I do want you to see the places where:
- MlFront can give fully qualified names to existing libraries without touching that library. sqlite3 was given the name MmotlSqlite3_Std__Sqlite3. No name conflicts.
- MlFront is using the -alert feature of OCaml to stop direct use of the Standard Library. You still have access to the Standard Library, but you have to explicitly import its modules in your new code.
- MlFront controls module visibility through the -open OCaml feature. You can think of that as inserting code at the top of each module to control the behavior of that module.
To make MlFront-based tooling simple to use, MlFront provides enough
information that your build system can provide informed recommendations or even
auto-correct (-fix) your project.
Here is one usability example that could have an auto -fix:
Error (alert need_alternate_stdlib): module Stdlib414Shadow.Arg
This is part of the standard library distributed with OCaml.
You need to consistently qualify every piece of external code your
project uses, including the standard library.
The recommendation is to place
open Tr1Stdlib_V414CRuntime
at the top of your script -OR- place
module Arg = Tr1Stdlib_V414CRuntime.Arg
in your library's `open__.ml` -OR- directly use
Tr1Stdlib_V414CRuntime.Arg
instead.
and here is another usability example showing the end-user how the module system failed:
Problem
-------
The module [AcmeWidgets_Std.XyzPingHandler] is not present.
Underlying Error
----------------
−Non-resolved external dependency.
The following compilation units {/src/AcmeWidgets_Std/CurlIo.ml}
depend on the unknown module "XyzPingHandler"
−Non-resolved internal dependency.
The following compilation units {/src/AcmeWidgets_Std/ProvAwsSes.ml,
/src/AcmeWidgets_Std/Subscriptions.ml}
depend on the compilation units "/src/AcmeWidgets_Std/CurlIo.ml" that could not be resolved.
Context
-------
The module references are:
(unit AcmeWidgets_Std.CurlIo)
- called from "/src/AcmeWidgets_Std/ProvAwsSes.ml"
-> (unit AcmeWidgets_Std.ProvAwsSes)
- called from "/src/AcmeWidgets_Std/Subscriptions.ml"
-> (unit AcmeWidgets_Std.Subscriptions) - <analysis start>
-> (unit AcmeWidgets_Std.Subscriptions) - <entry>
Suggestion
----------
1. Check for typos.
2. Don't use the module.
3. Create the module as a new 'XyzPingHandler.ml' file.
Vision, expectations and all things meta
I'm not releasing MlFront out of the goodness of my heart. I am doing this
because I believe MlFront can become a fundamental security "primitive" for
identifying a program's entry points and side-effects. And I hope we in the
software industry make all security primitives accessible to anyone, anywhere,
at any time ... because the people they protect matter. So I'm quite happy when
djb releases his huge volume of security primitives into the public domain, and
makes it accessible in the relatively easy-to-use NaCl library. I'm happy
when my former employer Amazon makes their big-number library s2n-bignum (a
security primitive for cryptography) accessible to anyone under permissive
licenses. And I'm quite dismayed when GNU releases their big-number library GMP
with an introduction that "set[s] firm restrictions on the use with non-free
programs". wat. We can do better; I'm releasing MlFront.
To set expectations correctly, please be aware that:
- I'm dogfooding MlFront in existing products. That means I will be conservative, see what works and what doesn't, and may work in changes over a few years. But the first year I don't mind breaking the MlFront API.
- I'm not looking for wholesale imports of some other favorite programming language's package system. There needs to be a coherent design theme, and for me that means consistently building on top of OCaml strengths in a manner accessible to beginning programmers.
This work is not funded. If you'd like to help, you can:
- Reduce my work by writing your own PRs. In particular, the adjacent project codept needs attention (making it buildable, supporting OCaml 5, writing docs).
- Evangelize by sharing articles on your favorite sites or writing/creating your own article or video.
- Monetarily fund the work through a development contract with my company or ask OCSF (I haven't talked with them yet). For small-dollar amounts the most efficient way to contribute today is to subscribe to DkCoder (the product MlFront was spun out of).
Community Links: