MlFront - Java-like packages for OCaml. Part 1 - Overview

Edits: 2024-08-01. ocamlfront was renamed to MlFront for consistency. 2024-08-02. Typo. MlFront_Codept not MlFront_Core.

MlFront adds a Java-like package system to OCaml. The MlFront name is a homage to cfront which was tooling that translated “C with Classes” (now known as C++) into C code. Similarly, MlFront-based tools can translate “OCaml with packages” into conventional OCaml.

Its home is https://gitlab.com/dkml/build-tools/MlFront.

At its most basic core, MlFront gives a well-defined, consistent meaning to a module reference like AcmeWidgets_Std.Activities.Manufacturing across the domains of:

  1. OCaml source code.
  2. findlib libraries.
  3. opam packages.

I (jonahbeckford@) will explain that meaning and those domains in detail later in this series of posts. But first … why should you care?

MlFront can:

  • say, with some amount of rigour, what an OCaml program can do before the OCaml program does it (a security benefit)
  • provide a way to share and re-use source code without naming conflicts (a scale benefit)
  • reduce the configuration needed for code re-use (a usability benefit)

And it is opt-in, so it doesn’t require modifying the thousands of packages that exist today.

MlFront is not a proposal. It is a set of libraries and tools (a “framework”) that has been spun out of my private code to fill a perceived hole in the OCaml ecosystem. I did something similar when I spun the Windows-friendly DkML distribution out of my private code to fill a perceived hole (Windows support) in the OCaml ecosystem.

Here is what MlFront-based code across multiple projects can look like:

(* file: src/AcmeWidgets_Db/open__.ml *)

module Sqlite3 = MmotlSqlite3_Std.Sqlite3

(* file: src/AcmeWidgets_Db/Embedded.ml *)

let db = Sqlite3.db_open "widgets.db"
let read_all () = Sqlite3.exec db
        "SELECT * from tbl0" ~cb:(...)

(* file: src/AcmeWidgets_Cli/Main.ml *)

module Arg = Tr1Stdlib_V414CRuntime.Arg
module Printf = Tr1Stdlib_V414CRuntime.Printf

let sqlite3 = ref false
let speclist =
    [("-sqlite3", Arg.Set sqlite3, "Use sqlite3 as an embedded database")]

let () =
    Arg.parse speclist anon_fun usage_msg;
    if !sqlite3 then
        Printf.printf "widgets: %s\n%!"
            (AcmeWidgets_Db.Embedded.read_all ())

(* file: src/MmotlSqlite3_Std/Sqlite3.ml *)

include Sqlite3

Let’s revisit the features I mentioned now that we have seen some code:

  • security: MlFront uses codept for dependency analysis. That means source code for a project can be scanned to pull out the module references accurately (modulo implementation bugs) without compiling the project. And we know the above project has access to a C runtime (Tr1Stdlib_V414CRuntime) and specifically has command-line arguments (Tr1Stdlib_V414CRuntime.Arg) as an entry point.
  • scale: I’ll go over this shortly, but Sqlite3 was renamed to MmotlSqlite3_Std.Sqlite3 in the above code. No more naming conflicts.
  • usability: From the codept analysis the build system can (and should) install the opam package MmotlSqlite3_Std and reference the findlib library name MmotlSqlite3_Std. This MmotlSqlite3_Std alias is the “opt-in” feature that can be introduced incrementally in the opam package repository with a backwards-compatible findlib META file. The end-user should not need to write configuration files (.opam / dune-project) unless their build system has other requirements like versioning.
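To make the security and usability points a bit more concrete, here is a minimal sketch of what a tool could do once it has the list of module references. It is not MlFront or codept code: the `module_refs` list is copied by hand from the example project above, and `library_of`, `libraries_used` and `uses_library` are hypothetical helpers that assume the first dot-separated component of a reference is the library name.

```ocaml
(* Hypothetical input: fully qualified module references, written out by
   hand from the example project. A real tool would get these from a
   dependency analysis instead. *)
let module_refs =
  [ "Tr1Stdlib_V414CRuntime.Arg";
    "Tr1Stdlib_V414CRuntime.Printf";
    "AcmeWidgets_Db.Embedded";
    "MmotlSqlite3_Std.Sqlite3" ]

(* Assumption: the first dot-separated component names the library. *)
let library_of module_ref =
  match String.split_on_char '.' module_ref with
  | library :: _ -> library
  | [] -> module_ref

(* The sorted, de-duplicated set of libraries a project refers to. *)
let libraries_used refs =
  List.sort_uniq String.compare (List.map library_of refs)

(* Answers questions like "does this project touch the C runtime?". *)
let uses_library library refs = List.mem library (libraries_used refs)

let () = List.iter print_endline (libraries_used module_refs)
```

The same list answers both questions: `uses_library "Tr1Stdlib_V414CRuntime" module_refs` is the kind of check a security review would ask for, and `libraries_used module_refs` is the kind of list a build system could use to decide what to install.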

But all that being said … and this is why I want you to continue reading … I don’t know what I don’t know. I have read about how Go solves some of the package problems, but I don’t have first-hand experience. You might have the first-hand experience in a programming language that you think does packages right. Or you have first-hand experience in things that absolutely do not work. Simply put, you know things I don’t and that is valuable because there is still an opportunity to make big changes to MlFront.

The Origin Story (and Problem Statements)

I love using OCaml but it was very frustrating explaining to others how to integrate third-party OCaml code. I posted about this earlier at https://discuss.ocaml.org/t/what-are-the-biggest-reasons-newcomers-give-up-on-ocaml/10958/13. Let’s take a common activity: performing an HTTP request. You can use the low-level wrapper around Curl which involves:

  1. Installing the package ocurl using the conventional opam package manager.
  2. Declaring the use of the package’s findlib libraries with a (libraries curl curl.lwt) statement in the conventional dune build tool configuration.
  3. Using the module Curl_lwt in your OCaml code.

Notice the capitalization, prefixing, and separator changing from ocurl to curl.lwt to Curl_lwt. My guess is the author did not want to be presumptuous by using the package name curl so they picked ocurl. Problem 1: Only one package maintainer gets to pick curl in today’s global namespace; everyone else has to be cute with their naming.

If instead you used a popular high-level Curl wrapper you would:

  1. Install the package cohttp-curl-lwt.
  2. Add the library (libraries cohttp-curl-lwt) statement. Yay; consistency!
  3. Use the module Cohttp_curl_lwt in your OCaml code.

Now cohttp is a much larger family of projects. It has a hierarchy of subprojects with each level separated by dashes (-): cohttp, cohttp-curl and cohttp-curl-lwt are real packages. They’ve also standardized the naming for use inside OCaml code by replacing the dashes with underscores (_). Those conventions lead to having to use just two names (cohttp-curl-lwt and Cohttp_curl_lwt) rather than three (ocurl, curl.lwt and Curl_lwt), with no name clashes if another maintainer decided to make (acme-curl-lwt and Acme_curl_lwt). Big improvement.

But we can do better: use one name for the opam package and the library and the module. Amongst the trio of module naming, library naming and package naming, only the module naming has strict requirements. So there is no magic at all: we have no choice but to use the module name as the unified one name. Specifically, the one name must start with a capital letter and all remaining characters are restricted to either be ASCII alphanumeric characters or a few special characters. In the cohttp-curl-lwt example, that one name for the package, library and module name would be Cohttp_curl_lwt or even CohttpCurlLwt.
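As a rough illustration of that naming rule, here is a small sketch that derives a candidate "one name" from an opam-style or findlib-style name. The function names are mine, and this is not how MlFront itself derives names; it only shows the mechanical part of the convention. The check in `is_module_name` follows OCaml's classic rule for module names: a capital ASCII letter followed by ASCII letters, digits, underscores or apostrophes.

```ocaml
(* Is [s] usable as an OCaml module name (classic ASCII rule)?
   String.for_all needs OCaml 4.13 or later. *)
let is_module_name s =
  String.length s > 0
  && (match s.[0] with 'A' .. 'Z' -> true | _ -> false)
  && String.for_all
       (function
         | 'A' .. 'Z' | 'a' .. 'z' | '0' .. '9' | '_' | '\'' -> true
         | _ -> false)
       s

(* "cohttp-curl-lwt" -> "Cohttp_curl_lwt": dashes and dots become
   underscores and the first letter is capitalized. *)
let snake_module_name pkg =
  String.capitalize_ascii
    (String.map (function '-' | '.' -> '_' | c -> c) pkg)

(* "cohttp-curl-lwt" -> "CohttpCurlLwt": each dash- or dot-separated
   part is capitalized and the parts are joined. *)
let camel_module_name pkg =
  pkg
  |> String.map (function '.' -> '-' | c -> c)
  |> String.split_on_char '-'
  |> List.map String.capitalize_ascii
  |> String.concat ""
```

Both `snake_module_name "cohttp-curl-lwt"` and `camel_module_name "cohttp-curl-lwt"` give names that pass `is_module_name`, while the original package name does not because of its dashes and lowercase first letter.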

That is not rocket science, so why isn’t every 3rd party package maintainer doing that? Because it is just a convention that is different from the thousands of packages that exist today. Problem 2: The thousands of existing OCaml packages with their unique naming conventions are a headwind to any change.

I am also heavily invested in developing secure software that works on the (several) platforms I’m experienced with. From my perspective “secure software” means being able to identify with some rigour what a program does before the program does it. I do not mean to imply that writing secure software gets you “security” writ large, but secure software and specifically the identification of entry points and side-effects are important prerequisites. And there are formal methods that can add rigour with tools like Coq, Lean and TLA+. Honestly, formal methods are too much rigour to be practical at scale. An alternative is OCaml’s sublanguage for modules. Unlike most languages, you can programmatically inspect the source code of an OCaml program and know which modules the program uses before running the program.

How Does MlFront Work?

MlFront has a MlFront_Codept library which can produce build files that can be used by build systems. Today MlFront_Codept is used by DkCoder (which itself uses Dune as its primary build system), but there is nothing in MlFront_Codept that ties it to DkCoder or Dune. Without loss of generality I’ll be using Dune build files to show what MlFront_Codept can produce (with some simplifications for readability):

<!-- file: src/AcmeWidgets_Db/AcmeWidgets_Db.ml -->

module Embedded = AcmeWidgets_Db__Embedded

<!-- file: src/AcmeWidgets_Db/dune -->

(library (name AcmeWidgets_DbO__) (modules open__))
(library (name AcmeWidgets_Db__Embedded)
    (ocamlc_flags -alert @need_alternate_stdlib
        -open Stdlib414Shadow -open AcmeWidgets_DbO__)
    (modules Embedded))

<!-- file: src/AcmeWidgets_Cli/dune -->

(library (name AcmeWidgets_DbO__) (modules open__))
(library (name AcmeWidgets_Cli__Main)
    (ocamlc_flags -alert @need_alternate_stdlib
        -open Stdlib414Shadow)
    (modules Main)
    (libraries AcmeWidgets_Db__Embedded))

<!-- file: src/MmotlSqlite3_Std/dune -->

(library
    (name MmotlSqlite3_Std__Sqlite3)
    (libraries sqlite3))

I don’t assume you understand all of that. But I do want you to see the places where:

  1. MlFront can give fully qualified names to existing libraries without touching that library. sqlite3 was given the name MmotlSqlite3_Std__Sqlite3. No name conflicts.
  2. MlFront is using the -alert feature of OCaml to stop direct use of the Standard Library. You still have access to the Standard Library, but you have to explicitly import its modules in your new code.
  3. MlFront controls module visibility through the -open OCaml feature. You can think of that as inserting code at the top of each module to control the behavior of that module.
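Point 3 can be emulated in a single file. The sketch below is only an illustration of what the compiler's `-open` flag does, not generated MlFront output: the nested `AcmeWidgets_DbO__` module stands in for the separately compiled `open__.ml`, and the `Text` alias and the `shout` function are made up for the example.

```ocaml
(* Stand-in for the library built from open__.ml. In a real project this
   is its own compilation unit; here it is nested so the example fits in
   one file. The Text alias is hypothetical. *)
module AcmeWidgets_DbO__ = struct
  module Text = Stdlib.String
end

(* What passing -open AcmeWidgets_DbO__ to the compiler amounts to: an
   open! of that module before the first line of the file. *)
open! AcmeWidgets_DbO__

(* The rest of the module can now use the alias without declaring it. *)
let shout s = Text.uppercase_ascii s

let () = print_endline (shout "widgets")
```

Because the `open!` is supplied by the build rather than written in each file, every module in the library sees the same aliases, and changing `open__.ml` changes them all at once.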

To make MlFront-based tooling simple to use, MlFront provides enough information that your build system can provide informed recommendations or even auto-correct (-fix) your project.

Here is one usability example that could have an auto -fix:

Error (alert need_alternate_stdlib): module Stdlib414Shadow.Arg

This is part of the standard library distributed with OCaml.
You need to consistently qualify every piece of external code your
project uses, including the standard library.

The recommendation is to place

    open Tr1Stdlib_V414CRuntime

at the top of your script -OR- place

    module Arg = Tr1Stdlib_V414CRuntime.Arg

in your library's `open__.ml` -OR- directly use

    Tr1Stdlib_V414CRuntime.Arg

instead.

and here is another usability example showing the end-user how the module system failed:

Problem
-------
The module [AcmeWidgets_Std.XyzPingHandler] is not present.

Underlying Error
----------------
−Non-resolved external dependency.
    The following compilation units {/src/AcmeWidgets_Std/CurlIo.ml}
    depend on the unknown module "XyzPingHandler"

−Non-resolved internal dependency.
    The following compilation units {/src/AcmeWidgets_Std/ProvAwsSes.ml,
    /src/AcmeWidgets_Std/Subscriptions.ml}
    depend on the compilation units "/src/AcmeWidgets_Std/CurlIo.ml" that could not be resolved.

Context
-------
The module references are:

(unit AcmeWidgets_Std.CurlIo)
    - called from "/src/AcmeWidgets_Std/ProvAwsSes.ml"
-> (unit AcmeWidgets_Std.ProvAwsSes)
        - called from "/src/AcmeWidgets_Std/Subscriptions.ml"
-> (unit AcmeWidgets_Std.Subscriptions) - <analysis start>
-> (unit AcmeWidgets_Std.Subscriptions) - <entry>

Suggestion
----------
1. Check for typos.
2. Don't use the module.
3. Create the module as a new 'XyzPingHandler.ml' file.

Vision, expectations and all things meta

I’m not releasing MlFront out of the goodness of my heart. I am doing this because I believe MlFront can become a fundamental security “primitive” for identifying a program’s entry points and side-effects. And I hope we in the software industry make all security primitives accessible to anyone, anywhere, at any time … because the people they protect matter. So I’m quite happy when djb releases his huge volume of security primitives into the public domain, and makes it accessible in the relatively easy-to-use NaCl library. I’m happy when my former employer Amazon makes their big-number library s2n-bignum (a security primitive for cryptography) accessible to anyone under permissive licenses. And I’m quite dismayed when GNU releases their big-number library GMP with an introduction that “set[s] firm restrictions on the use with non-free programs”. wat. We can do better; I’m releasing MlFront.

To set expectations correctly, please be aware that:

  • I’m dogfooding MlFront in existing products. That means I will be conservative, see what works and what doesn’t, and may work in changes over a few years. But the first year I don’t mind breaking the MlFront API.
  • I’m not looking for wholesale imports of some other favorite programming language’s package system. There needs to be a coherent design theme, and for me that means consistently building on top of OCaml strengths in a manner accessible to beginning programmers.

This work is not funded. If you’d like to help, you can:

  • Reduce my work by writing your own PRs. In particular, the adjacent project codept needs attention (making it buildable, supporting OCaml 5, writing docs).
  • Evangelize by sharing articles on your favorite sites or writing/creating your own article or video.
  • Monetarily fund the work through a development contract with my company or ask OCSF (I haven’t talked with them yet). For small-dollar amounts the most efficient way to contribute today is to subscribe to DkCoder (the product MlFront was spun out of).

Community Links: