As I was peeling MlFront out of
DkCoder, I realized that I could not
transfer most of my integration tests; those integration tests relied on a fully
functioning DkCoder system.
Out of the practical need for integration tests, I built the MlFront_Boot
build system. In a later post I will describe how you can use MlFront_Boot to
do a security analysis of your source code. But for now, let’s see how
MlFront_Boot works because that will help you understand what the security
analysis will accomplish. And for those readers who implement their own build
systems … you can treat MlFront_Boot as the MlFront reference build system
which can be copied and mimicked.
Here is a minimal MlFront_Boot project. You only need source code arranged in
the MlFront package structure:
.
├── AcmeWidgets_Std/
│   └── A.ml
└── BobBuilder_Std/
    └── B.ml
(* AcmeWidgets_Std/A.ml *)
let print_self () = print_endline "I am an A!"
(* BobBuilder_Std/B.ml *)
let print_self () = print_endline "I am a B!"
let () =
  AcmeWidgets_Std.A.print_self ();
  print_self ()
You can already see that MlFront_Boot supports cross-package references from
BobBuilder_Std/B.ml to AcmeWidgets_Std/A.ml.
Sidebar: It’s early! At the time of the post, MlFront_Boot lacks the
implementation code to support AcmeWidgets_Std.Some.Sub.Package deeply nested
subpackages, and has a few other gaps. I’ll edit this post and remove this
sidebar once sufficient code for security analysis has been ported from DkCoder.
The MlFront_Boot build system is run using the executable mlfront-boot.
mlfront-boot can be built from source, but prebuilt binaries are distributed
at https://gitlab.com/dkml/build-tools/MlFront/-/releases.
When run, mlfront-boot will analyze the source code:
mlfront-boot -o buildscript
and generate a Windows batch script and a POSIX (macOS/Linux) shell script:
.
├── AcmeWidgets_Std/
├── BobBuilder_Std/
├── buildscript.cmd <-- generated
└── buildscript.sh <-- generated
Those generated scripts are simple to audit, fully self-contained and can be checked into source control.
When you run the script (buildscript.cmd on Windows or buildscript.sh on
macOS/Linux), your only responsibility is to ensure that ocamlc is available
in your PATH. Running the script will build executables for your project:
directory create: target/
file create: target/AcmeWidgets_Std.ml
link create: AcmeWidgets_Std/A.ml -> target/AcmeWidgets_Std__A.ml
compile: AcmeWidgets_Std.A
compile: AcmeWidgets_Std
link create: BobBuilder_Std/B.ml -> target/BobBuilder_Std__B.ml
compile: BobBuilder_Std.B
executable create: BobBuilder_Std.B
You can now run the bytecode executable:
$ ocamlrun target/BobBuilder_Std.B.bc
I am an A!
I am a B!
If you had used the -native option at boot time, you could run the native
executable:
$ mlfront-boot -native -o buildscript
$ ./buildscript.sh # or .\buildscript.cmd on Windows
$ target/BobBuilder_Std.B
I am an A!
I am a B!
How does it work?
MlFront_Boot moves through these steps:
1. Scan the directories for source code. Each module in your project will be
   fully-qualified. For example, the module AcmeWidgets_Std/A.ml becomes
   target/AcmeWidgets_Std__A.ml to avoid any module collisions. This
   transformation is in memory and nothing has been written to disk.

   Advanced. At the time of this post, there is a limited version of a
   namespaces specification used by MlFront_Boot. Only the scan and the
   merge strict expressions are used.

2. Convert each compilation “unit” (from the second post) into a simplified
   module structure called the “Module Meta Language” (M2l). That conversion is
   performed by the codept dependency analysis tool. It strips away the types
   and values and just leaves what modules are used.

   For example, these two files:

       (* AcmeWidgets_Std/First.ml *)
       module A = struct
         module Inner = struct
           let f x = x
         end
       end

   and

       (* AcmeWidgets_Std/Second.ml *)
       let x = 1
       open First.A
       module B = struct
         let y = Inner.f x
       end

   are converted to the following in-memory:

       (* AcmeWidgets_Std/First.m2l *)
       module A = struct
         module Inner = struct end
       end

   and

       (* AcmeWidgets_Std/Second.m2l *)
       open First.A
       module B = struct
         [%access {Inner}]
       end

3. All the M2l units are given to a codept solver, with the name of each unit
   being the fully-qualified name from step 1.

   When codept evaluates AcmeWidgets_Std/First.m2l, it populates an in-memory
   “resolved” environment with the following modules:

       module AcmeWidgets_Std__First = struct
         module A = struct
           module Inner = struct end
         end
       end

   Then when codept evaluates AcmeWidgets_Std/Second.m2l it fails to resolve
   both the First and Inner module references:

       (* AcmeWidgets_Std/Second.m2l *)

       (* codept does not know the [First] module.
          The correct module is [AcmeWidgets_Std__First]! *)
       open First.A

       (* codept does not know [Inner].
          The correct module is [AcmeWidgets_Std__First.A.Inner]. *)
       module B = struct
         [%access {Inner}]
       end

   At this point codept fails and informs MlFront that AcmeWidgets_Std__Second
   is missing a module reference to “First” and “Inner”.

4. MlFront knows that both “First” and “Inner” are relative module references
   because they do not follow the pattern

       VendorProject_Unit
       │     │      ││
       │     │      │└ UPPERCASE
       │     │      └ UNDERSCORE
       │     └ UPPERCASE
       └ UPPERCASE

   we saw last post.

   That means MlFront will ask MlFront_Boot through its
   respond_to_missing_module function if it can find the modules
   AcmeWidgets_Std__First and AcmeWidgets_Std__Inner.

5. MlFront_Boot already knew from step 1 that AcmeWidgets_Std__First was
   available, and responds to MlFront to add AcmeWidgets_Std__First as the
   First alias.

   Now AcmeWidgets_Std__Second looks like:

       (* AcmeWidgets_Std/Second.m2l *)
       module First = AcmeWidgets_Std__First
       open First.A
       module B = struct
         [%access {Inner}]
       end

6. MlFront gives the slightly tweaked Second.m2l back to codept, which can
   fully resolve all of the references.
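The naming rule from step 1 and the relative-versus-absolute check can be sketched in a few lines of OCaml. This is only an illustration: qualified_name and looks_absolute are hypothetical helpers, not MlFront functions, the path handling covers only the two-segment, forward-slash paths shown in this post, and the pattern check is a simplified reading of the VendorProject_Unit diagram.

```ocaml
(* Hypothetical helpers for illustration; not part of the MlFront API. *)

(* "AcmeWidgets_Std/A.ml" -> "AcmeWidgets_Std__A".
   Assumes forward slashes and the package/unit layout shown in this post. *)
let qualified_name (path : string) : string =
  Filename.remove_extension path
  |> String.split_on_char '/'
  |> String.concat "__"

let is_upper c = c >= 'A' && c <= 'Z'

(* Simplified check for the VendorProject_Unit shape: a capital first letter,
   a second capital before the first underscore, and a capital right after
   that underscore. Anything else is treated as a relative reference. *)
let looks_absolute (name : string) : bool =
  match String.index_opt name '_' with
  | None -> false
  | Some i ->
    let n = String.length name in
    i > 0
    && i + 1 < n
    && is_upper name.[0]
    && is_upper name.[i + 1]
    && (let rec has_upper j =
          j < i && (is_upper name.[j] || has_upper (j + 1))
        in
        has_upper 1)
```

With these definitions, looks_absolute "BobBuilder_Std" is true, while looks_absolute "First" and looks_absolute "Inner" are false, which matches why both names are treated as relative references above.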
In general, MlFront looks at each .m2l in-memory file, and then asks the
build system (ex. MlFront_Boot) to respond with the locations of missing
modules for missing relative module references. MlFront will also ask the
build system to respond with the locations of missing libraries for missing
absolute module references like BobBuilder_Std. Any response results in a
slight modification to the .m2l in-memory file. MlFront will keep rerunning
codept until there are no more missing module references.
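That rerun-until-resolved cycle can be modeled as a small fixpoint loop. The sketch below is a toy model under stated assumptions: the m2l record, resolve, solve_second and respond_boot are all made-up names, the "solver" is a hard-coded stand-in for codept, and the real respond_to_missing_module function may look nothing like respond_boot.

```ocaml
(* A toy stand-in for an in-memory .m2l unit. All names are hypothetical. *)
type m2l = {
  package : string;                 (* e.g. "AcmeWidgets_Std" *)
  aliases : (string * string) list; (* short name, fully-qualified name *)
}

(* Rerun [solve] until it reports nothing missing, or until the build system
   has no new answer for any of the missing names. *)
let rec resolve ~solve ~respond (u : m2l) : (m2l, string list) result =
  match solve u with
  | [] -> Ok u
  | missing ->
    let answer name =
      if List.mem_assoc name u.aliases then None
      else Option.map (fun target -> (name, target)) (respond u name)
    in
    (match List.filter_map answer missing with
     | [] -> Error missing
     | answers ->
       resolve ~solve ~respond { u with aliases = answers @ u.aliases })

(* Toy solver for Second.m2l: once [First] is aliased, [open First.A]
   brings [Inner] into scope, so nothing is missing any more. *)
let solve_second (u : m2l) : string list =
  if List.mem_assoc "First" u.aliases then [] else [ "First"; "Inner" ]

(* Toy build system: it only knows the units found during the scan. *)
let respond_boot (u : m2l) (name : string) : string option =
  let candidate = u.package ^ "__" ^ name in
  if List.mem candidate [ "AcmeWidgets_Std__First"; "AcmeWidgets_Std__Second" ]
  then Some candidate
  else None

let second = { package = "AcmeWidgets_Std"; aliases = [] }
let resolved = resolve ~solve:solve_second ~respond:respond_boot second
```

In this model the first pass reports First and Inner as missing, the toy build system answers only for First, and the second pass finds nothing missing. The loop returns an Error with the unresolved names if a pass makes no progress, so it cannot spin forever on a reference nobody can supply.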
After there is a full picture of the project, the build system can write out its
build scripts (ex. buildscript.sh and buildscript.cmd for MlFront_Boot).
The build system is responsible for replicating the .m2l modifications (ex. we
added the module First = AcmeWidgets_Std__First alias earlier) in its build
scripts. The build system can use -open SomeModification in its ocamlc /
ocamlopt flags to accomplish those modifications.
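As a rough sketch of how a build system might replay an alias, assume it writes the aliases into a small helper module and passes that module to the compiler with -open. The -open flag is a real ocamlc / ocamlopt option; the helper module name Second__aliases and both functions below are invented for illustration and are not taken from MlFront_Boot.

```ocaml
(* Render the alias lines for a hypothetical helper module, one per alias. *)
let alias_module_source (aliases : (string * string) list) : string =
  aliases
  |> List.map (fun (short, full) -> Printf.sprintf "module %s = %s\n" short full)
  |> String.concat ""

(* Command-line words for compiling one unit with the helper module opened.
   The helper must already be compiled so that its .cmi can be found. *)
let compile_args ~(opened : string) ~(source : string) : string list =
  [ "ocamlc"; "-c"; "-open"; opened; source ]

let () =
  print_string (alias_module_source [ ("First", "AcmeWidgets_Std__First") ]);
  print_endline
    (String.concat " "
       (compile_args ~opened:"Second__aliases"
          ~source:"target/AcmeWidgets_Std__Second.ml"))
```

Keeping the aliases in a separate opened module leaves the original source files untouched, which fits the post's point that the generated scripts should stay simple to audit.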
Build systems can also synthesize their own modules at any point of the cycle.
MlFront_Boot does not use this functionality very much but DkCoder uses it
heavily.
- Implicit modules: Modules created automatically after the analysis of a
  single unit.

  For example, with DkCoder, when the following unit is scanned:

      let () =
        Printf.printf "All assets are in %s\n" (Tr1Assets.LocalDir.v ())

  When MlFront tells DkCoder that it must respond to the missing Tr1Assets
  module, DkCoder creates the Tr1Assets module automatically from the contents
  of an assets folder (images, audio files and other static resources).

- Optimistic modules: Modules created automatically in response to many units
  being scanned.

  For example, with DkCoder, when units are scanned in a package directory (ex.
  AcmeWidgets_Std/Something/*.ml), units are scanned in the subpackages (ex.
  AcmeWidgets_Std/Something/Deeper/*.ml) where “subpackage modules” are created
  automatically. That was a mouthful! Basically, AcmeWidgets_Std.Something.A
  can only access AcmeWidgets_Std.Something.Deeper.B if someone created the
  subpackage module AcmeWidgets_Std.Something.Deeper. An optimistic module has
  holistic access to many modules which enables those subpackage modules to be
  generated on your behalf.
In summary, MlFront_Boot, through unique identification of modules and
codept-based analysis, is a build system generator that writes portable build
shell scripts backed by an accurate enumeration of modules used in a program.
You need some more equipment to do the security analysis we mentioned in the
first post, but today you learned how the main equipment works.

