As I was peeling MlFront out of DkCoder, I realized that I could not
transfer most of my integration tests; those integration tests relied on a fully
functioning DkCoder system.
Out of the practical need for integration tests, I built the MlFront_Boot
build system. In a later post I will describe how you can use MlFront_Boot to
do a security analysis of your source code. But for now, let’s see how
MlFront_Boot works because that will help you understand what the security
analysis will accomplish. And for those readers who implement their own build
systems … you can treat MlFront_Boot as the MlFront reference build system
which can be copied and mimicked.
Here is a minimal MlFront_Boot project. You only need source code arranged in
the MlFront package structure:
.
├── AcmeWidgets_Std/
│   └── A.ml
└── BobBuilder_Std/
    └── B.ml
(* AcmeWidgets_Std/A.ml *)
let print_self () = print_endline "I am an A!"
(* BobBuilder_Std/B.ml *)
let print_self () = print_endline "I am a B!"
let () =
  AcmeWidgets_Std.A.print_self ();
  print_self ()
You can already see that MlFront_Boot supports cross-package references from
BobBuilder_Std/B.ml to AcmeWidgets_Std/A.ml.
Sidebar: It’s early! At the time of this post, MlFront_Boot lacks the
implementation code to support deeply nested subpackages such as
AcmeWidgets_Std.Some.Sub.Package, and has a few other gaps. I’ll edit this post
and remove this sidebar once sufficient code for security analysis has been
ported from DkCoder.
The MlFront_Boot build system is run using the executable mlfront-boot.
mlfront-boot can be built from source, but prebuilt binaries are distributed
at https://gitlab.com/dkml/build-tools/MlFront/-/releases.
When run, mlfront-boot will analyze the source code:
mlfront-boot -o buildscript
and generate a Windows batch script and a POSIX (macOS/Linux) shell script:
.
├── AcmeWidgets_Std/
├── BobBuilder_Std/
├── buildscript.cmd <-- generated
└── buildscript.sh <-- generated
Those generated scripts are simple to audit, fully self-contained and can be checked into source control.
When you run the script (buildscript.cmd on Windows or buildscript.sh on
macOS/Linux), your only responsibility is to ensure that ocamlc is available
in your PATH. Running the script will build executables for your project:
directory create: target/
file create: target/AcmeWidgets_Std.ml
link create: AcmeWidgets_Std/A.ml -> target/AcmeWidgets_Std__A.ml
compile: AcmeWidgets_Std.A
compile: AcmeWidgets_Std
link create: BobBuilder_Std/B.ml -> target/BobBuilder_Std__B.ml
compile: BobBuilder_Std.B
executable create: BobBuilder_Std.B
You can now run the bytecode executable:
$ ocamlrun target/BobBuilder_Std.B.bc
I am an A!
I am a B!
If you had used the -native option at boot time, you could run the native
executable:
$ mlfront-boot -native -o buildscript
$ ./buildscript.sh # or .\buildscript.cmd on Windows
$ target/BobBuilder_Std.B
I am an A!
I am a B!
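The build log above mentions a generated target/AcmeWidgets_Std.ml next to the renamed target/AcmeWidgets_Std__A.ml. This post does not show what that generated file contains, so the following is only a guess: a library module that aliases each fully-qualified unit under its short name. The sketch squeezes the three files into one file as nested modules so you can compile it directly; the contents of the AcmeWidgets_Std module are an assumption, not the output of mlfront-boot.

```ocaml
(* Stand-in for target/AcmeWidgets_Std__A.ml (copied from AcmeWidgets_Std/A.ml). *)
module AcmeWidgets_Std__A = struct
  let print_self () = print_endline "I am an A!"
end

(* ASSUMPTION: a plausible shape for the generated target/AcmeWidgets_Std.ml.
   It only re-exports the fully-qualified unit under its short name. *)
module AcmeWidgets_Std = struct
  module A = AcmeWidgets_Std__A
end

(* Stand-in for target/BobBuilder_Std__B.ml (copied from BobBuilder_Std/B.ml). *)
module BobBuilder_Std__B = struct
  let print_self () = print_endline "I am a B!"

  let () =
    (* The cross-package reference resolves through the alias module. *)
    AcmeWidgets_Std.A.print_self ();
    print_self ()
end
```

Under that assumption the source files never mention the double-underscore names; only the build system does.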
How does it work?
MlFront_Boot moves through these steps:
-
Scan the directories for source code. Each module in your project will be fully-qualified. For example, the module AcmeWidgets_Std/A.ml becomes target/AcmeWidgets_Std__A.ml to avoid any module collisions. This transformation is in memory; nothing has been written to disk yet.
Advanced. At the time of this post, there is a limited version of a namespaces specification used by MlFront_Boot. Only the scan and the merge strict expressions are used.
-
Convert each compilation “unit” (from the second post) into a simplified module structure called the “Module Meta Language” (M2l). That conversion is performed by the codept dependency analysis tool. It strips away the types and values and just leaves what modules are used.
For example, these two files:
(* AcmeWidgets_Std/First.ml *)
module A = struct
  module Inner = struct
    let f x = x
  end
end
and
(* AcmeWidgets_Std/Second.ml *)
let x = 1
open First.A
module B = struct
  let y = Inner.f x
end
are converted to the following in-memory:
(* AcmeWidgets_Std/First.m2l *)
module A = struct
  module Inner = struct end
end
and
(* AcmeWidgets_Std/Second.m2l *)
open First.A
module B = struct
  [%access {Inner}]
end
-
All the M2l units are given to a codept solver, with the name of each unit being the fully-qualified name from step 1.
When codept evaluates AcmeWidgets_Std/First.m2l, it populates an in-memory “resolved” environment with the following modules:
module AcmeWidgets_Std__First = struct
  module A = struct
    module Inner = struct end
  end
end
Then when codept evaluates AcmeWidgets_Std/Second.m2l it fails to resolve both the First and Inner module references:
(* AcmeWidgets_Std/Second.m2l *)

(* codept does not know the [First] module.
   The correct module is [AcmeWidgets_Std__First]! *)
open First.A

(* codept does not know [Inner].
   The correct module is [AcmeWidgets_Std__First.A.Inner]. *)
module B = struct
  [%access {Inner}]
end
At this point codept fails and informs MlFront that AcmeWidgets_Std__Second is missing module references to “First” and “Inner”.
-
MlFront knows that both “First” and “Inner” are relative module references because they do not follow the pattern
VendorProject_Unit
│     │      ││
│     │      │└ UPPERCASE
│     │      └ UNDERSCORE
│     └ UPPERCASE
└ UPPERCASE
we saw last post.
That means MlFront will ask MlFront_Boot through its respond_to_missing_module function if it can find the modules AcmeWidgets_Std__First and AcmeWidgets_Std__Inner.
-
MlFront_Boot already knew from step 1 that AcmeWidgets_Std__First was available, and responds to MlFront to add AcmeWidgets_Std__First as the First alias.
Now AcmeWidgets_Std__Second looks like:
(* AcmeWidgets_Std/Second.m2l *)
module First = AcmeWidgets_Std__First
open First.A
module B = struct
  [%access {Inner}]
end
-
MlFront gives the slightly tweaked Second.m2l back to codept, which can fully resolve all of the references.
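Steps 4 and 5 can be sketched in a few lines of OCaml. This is an illustration under stated assumptions, not MlFront's code: the pattern check is my reading of the diagram in step 4, and looks_like_library_id, known_units and respond are hypothetical names (the real respond_to_missing_module signature is not shown in this post).

```ocaml
let is_upper c = c >= 'A' && c <= 'Z'

(* True if [s] has an uppercase letter at index [start] or later. *)
let has_upper_from s start =
  let rec go i = i < String.length s && (is_upper s.[i] || go (i + 1)) in
  go start

(* ASSUMPTION: a reading of the VendorProject_Unit diagram. The name starts
   with an uppercase letter, has another uppercase letter before the first
   underscore, and has an uppercase letter right after that underscore. *)
let looks_like_library_id name =
  match String.index_opt name '_' with
  | None -> false
  | Some i ->
      i >= 2
      && is_upper name.[0]
      && has_upper_from (String.sub name 0 i) 1
      && i + 1 < String.length name
      && is_upper name.[i + 1]

(* Hypothetical result of the step 1 scan: the fully-qualified units. *)
let known_units = [ "AcmeWidgets_Std__First"; "AcmeWidgets_Std__Second" ]

(* Hypothetical stand-in for the build system's answer to a missing relative
   reference: the alias line to add to the .m2l, if the unit is known. *)
let respond ~package missing =
  if looks_like_library_id missing then None
  else
    let candidate = package ^ "__" ^ missing in
    if List.mem candidate known_units then
      Some (Printf.sprintf "module %s = %s" missing candidate)
    else None
```

With these definitions, respond ~package:"AcmeWidgets_Std" "First" yields the alias line from step 5, while "Inner" yields None because there is no AcmeWidgets_Std__Inner unit; Inner only resolves later, through open First.A.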
In general, MlFront looks at each in-memory .m2l file, and then asks the
build system (ex. MlFront_Boot) to respond with the locations of missing
modules for missing relative module references. MlFront will also ask the
build system to respond with the locations of missing libraries for missing
absolute module references like BobBuilder_Std. Any response results in a
slight modification to the in-memory .m2l file. MlFront will keep rerunning
codept until there are no more missing module references.
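The rerun-until-resolved cycle is a small fixed-point loop. Here is a toy model of it, assuming a solver that reports one missing name per run. Everything in it is hypothetical: solve stands in for a codept run, respond_to_missing stands in for the build system callback, and the Util entry is made up for the example.

```ocaml
(* Toy stand-in for one codept run: report the first reference that the
   aliases collected so far cannot resolve. *)
type outcome = Resolved | Missing of string

let solve ~aliases ~references =
  match List.find_opt (fun r -> not (List.mem_assoc r aliases)) references with
  | None -> Resolved
  | Some r -> Missing r

(* Hypothetical build system callback: which unit provides a missing name?
   "Util" is invented for this example. *)
let respond_to_missing missing =
  List.assoc_opt missing
    [ ("First", "AcmeWidgets_Std__First"); ("Util", "AcmeWidgets_Std__Util") ]

(* Rerun the solver, adding one alias per round, until nothing is missing.
   Stops with [Error name] if the build system cannot locate [name]. *)
let rec resolve ~aliases ~references =
  match solve ~aliases ~references with
  | Resolved -> Ok (List.rev aliases)
  | Missing name -> (
      match respond_to_missing name with
      | Some target -> resolve ~aliases:((name, target) :: aliases) ~references
      | None -> Error name)
```

Each round either adds an alias or stops with an error, so the loop always terminates. The aliases in the Ok result are the modifications the build system later has to replay in its build scripts.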
After there is a full picture of the project, the build system can write out its
build scripts (ex. buildscript.sh and buildscript.cmd for MlFront_Boot).
The build system is responsible for replicating the .m2l modifications (ex. we
added the module First = AcmeWidgets_Std__First alias earlier) in its build
scripts. The build system can use -open SomeModification in its ocamlc /
ocamlopt flags to accomplish those modifications.
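To see why -open is enough, here is a single-file sketch of the effect. Second_aliases is a hypothetical name for the module a build system might generate; the explicit open Second_aliases line plays the role that -open Second_aliases would play on the ocamlc command line, so the original source of Second.ml stays untouched.

```ocaml
(* Stand-in for the compiled unit AcmeWidgets_Std__First. *)
module AcmeWidgets_Std__First = struct
  module A = struct
    module Inner = struct
      let f x = x
    end
  end
end

(* HYPOTHETICAL generated module that replays the .m2l modification. *)
module Second_aliases = struct
  module First = AcmeWidgets_Std__First
end

(* Stand-in for AcmeWidgets_Std/Second.ml. *)
module AcmeWidgets_Std__Second = struct
  open Second_aliases (* what "-open Second_aliases" would do implicitly *)

  (* Everything below is the unmodified source of Second.ml. *)
  let x = 1

  open First.A

  module B = struct
    let y = Inner.f x
  end
end

let () = Printf.printf "B.y = %d\n" AcmeWidgets_Std__Second.B.y
```

The point of the design is that the alias lives in a generated module rather than in an edited copy of your source file.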
Build systems can also synthesize their own modules at any point of the cycle.
MlFront_Boot does not use this functionality very much but DkCoder uses it
heavily.
-
Implicit modules: Modules created automatically after the analysis of a single unit.
For example, with DkCoder, suppose the following unit is scanned:
let () = Printf.printf "All assets are in %s\n" (Tr1Assets.LocalDir.v ())
When MlFront tells DkCoder that it must respond to the missing Tr1Assets module, DkCoder creates the Tr1Assets module automatically from the contents of an assets folder (images, audio files and other static resources).
-
Optimistic modules: Modules created automatically in response to many units being scanned.
For example, with DkCoder, when units are scanned in a package directory (ex. AcmeWidgets_Std/Something/*.ml), the units in its subpackages (ex. AcmeWidgets_Std/Something/Deeper/*.ml) are scanned as well, and “subpackage modules” are created automatically. That was a mouthful! Basically, AcmeWidgets_Std.Something.A can only access AcmeWidgets_Std.Something.Deeper.B if someone created the subpackage module AcmeWidgets_Std.Something.Deeper. An optimistic module has holistic access to many modules, which enables those subpackage modules to be generated on your behalf.
In summary, MlFront_Boot, through unique identification of modules and
codept-based analysis, is a build system generator that writes portable build
shell scripts backed by an accurate enumeration of modules used in a program.
You need some more equipment to do the security analysis we mentioned in the
first post, but today you learned how the main equipment works.