MlFront - Java-like packages for OCaml. Part 2 - The Core

In this part I will be describing the core semantics of MlFront. Much of the core semantics is captured in the MlFront_Core API documentation, but this post will tie it all together.

Parties

My initial designs had modules being able to compile other modules (the logical equivalent of an eval statement in other languages). This was essential because I wanted DkCoder and now MlFront to have the ability to “bootstrap” (compile itself).

At the same time, I did not want to give that eval ability to end-user code, which would have made it difficult to analyse end-user code.

So I introduced a trust model in MlFront that distinguishes between modules that were written by me (i.e. any MlFront-based build system) and those written by end-users. In that trust model my trusted modules can have access to eval, but the less-trusted modules of end-users do not.

And then from there I extended the trust model slightly and formalized it.

When MlFront analyses a project, each module in MlFront is categorized as belonging to one of three parties:

  1. The “You” party are you and your team, who write the primary modules in a project. By convention these “You” modules are organized under the src/ folder of a project.
  2. The “Us” party are the developers of the MlFront-based build system that is analysing the current project. These developers write trusted modules which have a higher level of privilege than “You” modules. For example, in the DkCoder build system, the “Us” modules are given environment variables that describe how to compile source code (the equivalent of giving the eval permission). By convention these trusted “Us” modules are located in read-only installation folders of the MlFront-based build system.
  3. The “Them” party includes every other developer. These “Them” developers write modules which have a lower level of privilege than “You” modules, and may even be completely untrusted. By convention these “Them” modules are downloaded by the MlFront-based build system.

Any MlFront-based build system is free to ignore the party conventions.
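
The party model can be pictured as a small OCaml type. This is only a minimal sketch: the names party, trust_rank and may_compile_modules are hypothetical and are not taken from the MlFront_Core API. They simply encode the ordering described above.

```ocaml
(* Hypothetical illustration; these names are not part of MlFront_Core. *)
type party = You | Us | Them

(* Larger rank means more privilege: Us > You > Them, as described above. *)
let trust_rank = function Us -> 2 | You -> 1 | Them -> 0

(* Only the trusted "Us" modules get the eval-like ability to compile
   other modules. *)
let may_compile_modules = function Us -> true | You | Them -> false
```

A build system that ignores the party conventions would simply assign parties differently; the ordering itself is the part the trust model formalizes.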

Libraries

One of the first technical challenges I had to overcome was how to distinguish relative module references from absolute module references.

For example, let’s pretend you are writing code in the following You module:

(* file: src/A/B/C.ml *)
let () = D.E.F.say_hello ()

Does D.E.F refer to what I’m calling a “relative reference”:

(* file: src/A/B/C/D/E/F.ml *)
let say_hello () = Tr1Stdlib_V414CRuntime.Printf.printf "hello"

or an “absolute reference”:

(* file: src/D/E/F.ml *)
let say_hello () = Tr1Stdlib_V414CRuntime.Printf.printf "hello"

I resolved that ambiguity by making the anchor (the “D” in D.E.F) have a different lexical structure than the non-anchors (the E.F in D.E.F).

I’ve called the anchor the library. We’ve already encountered the Tr1Stdlib_V414CRuntime library; that library is a sub-partition of the OCaml Standard Library that has all the functions that need a C99 runtime. It looks different compared to a regular module name.

A library name always has:

  • A capital starting letter, followed by
  • One lowercase letter, followed by
  • Zero or more digits or lowercase letters, followed by
  • Another capital letter, followed by
  • One or more digits or lowercase letters, followed by
  • An underscore (_), followed by
  • A capital letter, followed by
  • Zero or more letters, digits and underscores.
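
The eight rules above can be checked mechanically. The following is a minimal sketch that reads the list literally, one rule per line; is_library_name and its helpers are hypothetical names, not the MlFront_Core implementation, and only ASCII letters are considered.

```ocaml
let is_upper c = c >= 'A' && c <= 'Z'
let is_lower c = c >= 'a' && c <= 'z'
let is_digit c = c >= '0' && c <= '9'
let is_lower_or_digit c = is_lower c || is_digit c
let is_word c = is_upper c || is_lower_or_digit c || c = '_'

(* Hypothetical checker: a literal reading of the listed rules. *)
let is_library_name (s : string) : bool =
  let n = String.length s in
  (* [one p i] consumes exactly one character satisfying [p] at index [i]. *)
  let one p i = if i < n && p s.[i] then Some (i + 1) else None in
  (* [many p i] consumes zero or more characters satisfying [p]. *)
  let rec skip p i = if i < n && p s.[i] then skip p (i + 1) else i in
  let many p i = Some (skip p i) in
  let ( >>= ) = Option.bind in
  let parsed =
    one is_upper 0                (* a capital starting letter *)
    >>= one is_lower              (* one lowercase letter *)
    >>= many is_lower_or_digit    (* zero or more digits or lowercase letters *)
    >>= one is_upper              (* another capital letter *)
    >>= one is_lower_or_digit     (* one or more digits or lowercase letters *)
    >>= many is_lower_or_digit
    >>= one (fun c -> c = '_')    (* an underscore *)
    >>= one is_upper              (* a capital letter *)
    >>= many is_word              (* zero or more letters, digits, underscores *)
  in
  (* The whole string must have been consumed. *)
  parsed = Some n
```

With this sketch, is_library_name "Tr1Stdlib_V414CRuntime" is true, while an ordinary module name such as "Printf" is rejected. Because the reading is literal, a name with a third capital before the underscore (for example DkSubscribeWebhook_Std, which appears later in this post) is also rejected, so the actual MlFront check is evidently more lenient about the project part than this sketch.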

The rules are strict but using the mixed casing below with the underscore will always lead to a valid library name:

VendorProject_Unit
│     │      ││
│     │      │└ UPPERCASE
│     │      └ UNDERSCORE
│     └ UPPERCASE
└ UPPERCASE

Not only is an MlFront library the anchor of the naming convention, but the library is the entity which owns all the modules underneath it.

The module Tr1Stdlib_V414CRuntime.Printf is owned by the library Tr1Stdlib_V414CRuntime and the module AcmeWidgets_Std.Activities.Manufacturing is owned by the library AcmeWidgets_Std.
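
Since the library is always the first component of an absolute reference, finding the owner of a module is a matter of splitting on the first dot. A minimal sketch, where split_owner is a hypothetical helper that does not validate the library name:

```ocaml
(* Hypothetical helper: splits a dotted module path into the owning
   library and the module path underneath it. *)
let split_owner (path : string) : (string * string list) option =
  match String.split_on_char '.' path with
  | library :: modules when library <> "" -> Some (library, modules)
  | _ -> None
```

For AcmeWidgets_Std.Activities.Manufacturing this yields the library AcmeWidgets_Std and the remaining components Activities and Manufacturing.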

What is in a library name?

Let’s go back to the mixed casing form of a library name:

VendorProject_Unit
│     │      ││
│     │      │└ UPPERCASE
│     │      └ UNDERSCORE
│     └ UPPERCASE
└ UPPERCASE

The first part, the “vendor”, is the organization or person who owns the library. We heard in the Origin Story of the original post that cohttp (and other packages like it) were able to simplify their package naming by prefixing each package with their name (cohttp). The vendor plays the same role in MlFront.

There are some reserved vendors:

  • Ml, which is reserved by Diskuv (my company) on behalf of the OCaml compiler, runtime and dependency analyzers.
  • Dk, which is reserved by Diskuv for DkCoder.
  • Tr1, Tr2, … Tr<Number>, which are reserved by Diskuv for “Technical Report” proposals that split up and extend the OCaml standard library. We saw the Tr1 vendor in the first post’s Tr1Stdlib_V414CRuntime library.

In the first post we also encountered a library MmotlSqlite3_Std. That had a personal vendor Mmotl which corresponded to that user’s GitHub username. Using a GitHub username is the convention for personal vendors because the GitHub username is a globally unique identifier for developers. Of course, not every developer has a GitHub username, so just pick a vendor name for yourself that won’t be chosen by any other developer.

VendorProject_Unit
│     │      ││
│     │      │└ UPPERCASE
│     │      └ UNDERSCORE
│     └ UPPERCASE
└ UPPERCASE

The second part of a library name is the “project”. By convention, when you create one or more MlFront libraries in a source repository, those libraries will share the “project” name. More simply, all the libraries in a source repository belong to the same project.

So if we did a clone of the MlFront source repository we would see the following abbreviated listing of files:

$ git clone https://gitlab.com/dkml/build-tools/MlFront.git
Cloning ...

$ tree MlFront -P '*.ml' -I ci/ -I .ci/ -I _build/ -I msys64/
MlFront
├── src
│   ├── MlFront_Cli
│   │   ├── CmiUtils.ml
│   │   ├── ColorDetect.ml
│   │   ├── GeneratedLoads.ml
│   │   ├── Optslog.ml
│   │   └── TerminalLogSetup.ml
│   ├── MlFront_Codept
│   │   ├── CodeptFiles.ml
│   │   ├── CodeptLog.ml
│   │   ├── CodeptOrd.ml
│   │   ├── DepGraph.ml
│   │   ├── Errors.ml
│   │   ├── ModuleUnit.ml
│   │   ├── ModuleUniverse.ml
│   │   ├── NamespacedId.ml
│   │   ├── Trace.ml
│   │   └── UnitPp.ml
│   └── MlFront_Core
│       ├── LibraryId.ml
│       ├── MlFront_Core.ml
│       ├── ModuleAssumptions.ml
│       ├── ModuleId.ml
│       ├── SpecialModuleId.ml
│       ├── Squish.ml
│       ├── StandardModuleId.ml
│       └── UnitId.ml
└── tests
    └── MlFront_Core

Notice how all the libraries MlFront_Cli, MlFront_Codept and MlFront_Core share the project name Front.

The stem of the source code repository URL (the part before .git) is conventionally the vendor (Ml) followed by the project (Front). So MlFront.git is a conventionally named stem for the repository URL https://gitlab.com/dkml/build-tools/MlFront.git. This convention is important because git clone uses the stem of the repository URL as the name of the new directory it creates (the checkout).

Many package managers (e.g. opam, npm) have the concept of “overriding” a package for local development. By following the VendorProject naming convention for stems, you can have a set of projects checked out in one directory:

AcmeWidget/
  src/
    AcmeWidget_Std/

AcmeRobot/
  src/
    AcmeRobot_Std/

and the MlFront-based tooling may assume that the projects are all local overrides for each other.

VendorProject_Unit
│     │      ││
│     │      │└ UPPERCASE
│     │      └ UNDERSCORE
│     └ UPPERCASE
└ UPPERCASE

The final part of the library name is the library “unit”. The unit is what distinguishes one library from the next, and should be short and somewhat descriptive of the contents of the library.

By convention, the main unit in a project is named Std.
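
The three parts can be recovered from a library name with a few string operations. The sketch below assumes the name is already a valid library name; library_parts, split_library_name and repository_stem are hypothetical names used for illustration and are not the MlFront_Core API.

```ocaml
(* Hypothetical record for the three parts of VendorProject_Unit. *)
type library_parts = { vendor : string; project : string; unit_name : string }

let split_library_name (name : string) : library_parts option =
  match String.index_opt name '_' with
  | None -> None
  | Some us ->
      (* The project starts at the first capital letter after index 0. *)
      let rec second_capital i =
        if i >= us then None
        else if name.[i] >= 'A' && name.[i] <= 'Z' then Some i
        else second_capital (i + 1)
      in
      (match second_capital 1 with
       | None -> None
       | Some p ->
           Some
             { vendor = String.sub name 0 p;
               project = String.sub name p (us - p);
               (* Everything after the first underscore is the unit. *)
               unit_name =
                 String.sub name (us + 1) (String.length name - us - 1) })

(* The conventional repository stem is the vendor followed by the project. *)
let repository_stem (parts : library_parts) : string =
  parts.vendor ^ parts.project
```

Applied to MlFront_Core this gives the vendor Ml, the project Front and the unit Core, and the repository stem MlFront.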

Arranging a library in a file system

Here we have a library DkSubscribeWebhook_Std located in the src/ folder of a project. Remember from our earlier discussion about parties that src/ is the customary location for the You party.

src
└── DkSubscribeWebhook_Std
    ├── Aws
    │   ├── Endpoints.ml
    │   └── Signing.ml
    ├── CliEmail.ml
    ├── CliTemplate.ml
    ├── CurlIo.ml
    ├── Errors.ml
    ├── Expiry.ml
    ├── PingHandler.ml
    ├── Prov1Password.ml
    ├── ProvAwsSes.ml
    ├── ProvGitLab.ml
    ├── ProvStripe.ml
    ├── Providers.ml
    ├── Subscriptions.ml
    ├── TemplateInvoicePaid.ml
    ├── TemplateSubscriptionCancelled.ml
    ├── TemplateSubscriptionDeleted.ml
    ├── TemplateSubscriptionPaused.ml
    ├── TemplateSubscriptionResumed.ml
    └── WebhookHandler.ml

What you see above are standard modules, where the hierarchy you see in the filesystem reflects how they are named in your source code:

DkSubscribeWebhook_Std
DkSubscribeWebhook_Std.CliEmail
DkSubscribeWebhook_Std.Expiry
DkSubscribeWebhook_Std.Aws
DkSubscribeWebhook_Std.Aws.Endpoints
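
The mapping from file system to module name is mechanical. Here is a minimal sketch, assuming paths are given relative to src/ and use '/' as the separator; module_name_of_path is a hypothetical helper, not an MlFront function.

```ocaml
(* Hypothetical helper: drop the extension (if any) and turn directory
   separators into dots. *)
let module_name_of_path (rel_path : string) : string =
  rel_path
  |> Filename.remove_extension
  |> String.split_on_char '/'
  |> String.concat "."
```

So DkSubscribeWebhook_Std/Aws/Endpoints.ml becomes DkSubscribeWebhook_Std.Aws.Endpoints, and the directory DkSubscribeWebhook_Std/Aws becomes DkSubscribeWebhook_Std.Aws.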

The “open” module

There is one more category of modules that can be saved in the file system. These are called special modules. Unlike standard modules, special modules cannot be referenced in your source code.

Today there is only one type of special module [1]: the open module. It appears on the file system as the file open__.ml:

src
└── DkSubscribeWebhook_Std
    ├── open__.ml <------- The "open" module
    ├── Aws
    │   └── Signing.ml
    ├── TemplateSubscriptionResumed.ml
    └── WebhookHandler.ml

[1] Actually, there are two types of special module, but the second type is deprecated.

The open module logically belongs to the library, and it can only be placed directly in the library directory rather than in a subdirectory. In the example above, the open__.ml could not be in the Aws/ subdirectory.

We’ve seen that open module being used in the first post to introduce an alias used in all the modules of the library:

(* file: src/AcmeWidgets_Db/open__.ml *)

module Sqlite3 = MmotlSqlite3_Std.Sqlite3
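
One way to picture the effect, in plain OCaml without any MlFront tooling, is that every module of the library behaves as if the open module had been opened at its top. The sketch below is only an analogy and not how the build system is implemented: MmotlSqlite3_Std is replaced by a stub, and the Queries module and the describe function are hypothetical.

```ocaml
(* Stand-in stub so the sketch is self-contained; the real library is
   MmotlSqlite3_Std and [describe] is a made-up function. *)
module MmotlSqlite3_Std = struct
  module Sqlite3 = struct
    let describe () = "stub sqlite3"
  end
end

(* Plays the role of src/AcmeWidgets_Db/open__.ml. *)
module Open__ = struct
  module Sqlite3 = MmotlSqlite3_Std.Sqlite3
end

(* Plays the role of a standard module in the same library
   (a hypothetical src/AcmeWidgets_Db/Queries.ml). *)
module Queries = struct
  open Open__

  (* The alias is in scope without the module declaring it itself. *)
  let backend () = Sqlite3.describe ()
end
```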

Wrapping up the filesystem

The standard module and the special module are instances of a module unit. Any module file that ends with .ml or .mli is a module unit.
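
The file system rules so far can be summarized as a small classifier. This is a sketch under stated assumptions: paths are relative to src/ with '/' separators, only open__.ml (not an .mli) is treated as the open module, and the names unit_kind and classify_unit are hypothetical rather than part of MlFront_Core.

```ocaml
(* Hypothetical classification of module units. *)
type unit_kind = Standard | Special_open

let classify_unit (rel_path : string) : unit_kind option =
  let is_module_file =
    Filename.check_suffix rel_path ".ml"
    || Filename.check_suffix rel_path ".mli"
  in
  match String.split_on_char '/' rel_path with
  | _ when not is_module_file -> None
  (* The open module must sit directly in the library directory. *)
  | [ _library; "open__.ml" ] -> Some Special_open
  (* Any other .ml or .mli file under the library is a standard module. *)
  | _library :: (_ :: _ as rest) when not (List.mem "open__.ml" rest) ->
      Some Standard
  (* A misplaced open__.ml or a file outside any library is not a unit. *)
  | _ -> None
```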

Referencing modules in source code

All the items below contain valid module references except the one line that is commented out:

module A = DkSubscribeWebhook_Std
module C = DkSubscribeWebhook_Std.Expiry
module D = DkSubscribeWebhook_Std.Aws
module E = DkSubscribeWebhook_Std.Aws.Endpoints

(* let () = DkSubscribeWebhook_Std.cannot_do_this () *)
let () = DkSubscribeWebhook_Std.Expiry.print_tomorrow ()
let () = DkSubscribeWebhook_Std.Aws.print_services ()
let () = DkSubscribeWebhook_Std.Aws.Endpoints.print_region ()

The DkSubscribeWebhook_Std.cannot_do_this () call is not allowed because the library module contains only submodules.

This makes sense because you, as an end-user, never created a DkSubscribeWebhook_Std.ml file. It was automatically generated by the MlFront-based build system, as we saw in the first post.
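
To see why, it can help to picture the library module as a module whose only members are other modules. The following plain-OCaml sketch is an analogy, not the file the build system actually generates, and the function bodies are hypothetical stand-ins.

```ocaml
(* Analogy only: a library module that contains nothing but submodules. *)
module DkSubscribeWebhook_Std = struct
  module Expiry = struct
    let print_tomorrow () = print_endline "tomorrow (stand-in)"
  end

  module Aws = struct
    let print_services () = print_endline "services (stand-in)"

    module Endpoints = struct
      let print_region () = print_endline "region (stand-in)"
    end
  end
end

(* These references resolve because each one ends in a value that lives
   inside a submodule. *)
let () = DkSubscribeWebhook_Std.Expiry.print_tomorrow ()
let () = DkSubscribeWebhook_Std.Aws.Endpoints.print_region ()

(* There is no value directly inside DkSubscribeWebhook_Std, so a call like
   DkSubscribeWebhook_Std.cannot_do_this () has nothing to refer to. *)
```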

But for now, let’s recap what we have seen today:

  1. Modules you write as files are called module units. And there are two types of units: the standard module and the special module. What makes the special module “special” is that you can’t reference it in your source code.

All the modules you can reference in your source code are called packages. We’ve already encountered two types of packages:

  1. The standard module can be referenced in source code. Examples: DkSubscribeWebhook_Std.Expiry, DkSubscribeWebhook_Std.Aws and DkSubscribeWebhook_Std.Aws.Endpoints.
  2. The library module can be referenced in source code, although the only values it contains are other (standard) modules. Example: DkSubscribeWebhook_Std.

There are more types of packages which we’ll encounter in the next post.

Summary

Here is a Venn diagram for how the different types of modules are identified by MlFront:

┌──────────────| UNIT ID |───────────────┐
│                                        │
│                                        │
│  ┌──────| MODULE ID |───────────┐      │
│  │                              │      │
│  │  DkEx_Std/open__.ml          │      │
│  │     library_id: DkEx_Std     │      │
│  │     state: Special           │      │
│  │                              │      │
│  │  ┌───────────────────────────┼──┐   │
│  │  │                           │  │   │
│  │  │  DkEx_Std/One.ml          │  │   │
│  │  │     library_id: DkEx_Std  │  │   │
│  │  │     state: Standard       │  │   │
│  │  │                           │  │   │
│  │  │  DkEx_Std/Sub/Two.ml      │  │   │
│  │  │     library_id: DkEx_Std  │  │   │
│  │  │     state: Standard       │  │   │
│  │  │                           │  │   │
│  └──┼───────────────────────────┘  │   │
│     │                              │   │
│     │    DkEx_Std.ml               │   │
│     │       library_id: DkEx_Std   │   │
│     │       state: Library         │   │
│     │                              │   │
│     └─────| PACKAGE ID |───────────┘   │
│                                        │
└────────────────────────────────────────┘

The innermost box contains the standard modules (DkEx_Std/One.ml and DkEx_Std/Sub/Two.ml); those are the modules you will write most often.

There are modules you write (aka. “units”) that can’t be referenced in code: the “special” modules like DkEx_Std/open__.ml.

And there are modules you can reference (aka. “packages”) that you can’t write at all. They are autogenerated: the “library” modules like DkEx_Std.ml and some more you’ll encounter in the next post.
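
The same summary can be written as two predicates over the three states shown in the diagram. This is a minimal sketch: the constructor names mirror the state labels in the diagram, while the type and function names are hypothetical and not the MlFront_Core API.

```ocaml
(* The three states from the diagram. *)
type state = Special | Standard | Library

(* "Units": modules you write as files. *)
let is_written_by_you = function
  | Special | Standard -> true
  | Library -> false

(* "Packages": modules you can reference in source code. *)
let is_referenceable = function
  | Standard | Library -> true
  | Special -> false
```

Only the standard module satisfies both predicates, which is why it sits in the overlap of the two inner boxes.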

