Decides which OCaml representations of types and values work at scale. For example, Stripe's OpenAPI spec was 5.9 MB as of mid-2024 and its schemas were highly mutually recursive.

In OCaml that led to:

Since the behavior can vary based on the OCaml compiler version (and the presence of Flambda, etc.), the choice is hidden behind create.

type repr_sumtype =
  | RegularVariants
    (*

    The representation using the regular variant (ex. type pet = | Cat | Dog). Regular variants are limited to 246 variants.

    Mitigation. None. Do not use if there are more than 246 variants.

    *)
  | ExtensibleVariants
    (*

    The representation using extensible variants (ex. type pet = .. type pet += | Cat | Dog). Extensible variants are mostly unlimited except when the definition and the access are within the same module file. Confer: ocamlopt.opt 5.2.0 fails to compile very large file but ocamlc compiles fine.

    Confer: https://stackoverflow.com/questions/54730373/when-should-extensible-variant-types-be-used-in-ocaml for the memory representation.

    Mitigation. Avoid the scaling limits by placing the definitions of the extensible variants in a separate module from the modules that access those variants.

    *)
  | PolymorphicVariants
    (*

    The representation using polymorphic variants (ex. type pet = [`Cat | `Dog]). Polymorphic variants do not have to be pre-declared. They are mostly unlimited except when the hashes of two variant names collide; for example, squigglier and premiums hash to the same value. The collision probability follows the classic birthday problem: with 2^31 possible hash values in caml_hash_variant, 54,563 variants would have a 50% probability of at least one collision (assuming a strong hash, which caml_hash_variant is not).

    Mitigation. Change the variant names if there is a collision.

    *)

The representation of variants in a sum type in generated source code.
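
The collision example and the birthday estimate in the PolymorphicVariants note can be checked with a small sketch. It assumes a 64-bit OCaml and re-implements the tag hash as the multiply-by-223 fold used by the compiler's hash_variant function, which caml_hash_variant mirrors. The names hash_variant and birthday_threshold are chosen here for illustration, and the threshold uses the standard birthday-problem approximation n ≈ 1/2 + sqrt(1/4 + 2 N ln 2) rather than an exact computation.

```ocaml
(* Sketch of the polymorphic variant tag hash, assuming 64-bit OCaml ints.
   Fold each byte into the accumulator with multiplier 223, keep the low
   31 bits, then sign-extend bit 30 so 32-bit and 64-bit targets agree. *)
let hash_variant (s : string) : int =
  let accu = ref 0 in
  String.iter (fun c -> accu := (223 * !accu) + Char.code c) s;
  let low31 = !accu land ((1 lsl 31) - 1) in
  if low31 > 0x3FFFFFFF then low31 - (1 lsl 31) else low31

(* Approximate number of uniformly hashed names needed for a 50% chance of
   at least one collision among [buckets] possible hash values. *)
let birthday_threshold (buckets : float) : int =
  int_of_float (ceil (0.5 +. sqrt (0.25 +. (2.0 *. buckets *. log 2.0))))

let () =
  (* The two names from the note above hash to the same tag. *)
  assert (hash_variant "squigglier" = hash_variant "premiums");
  (* Unrelated short names normally do not collide. *)
  assert (hash_variant "Cat" <> hash_variant "Dog");
  Printf.printf "50%% collision threshold for 2^31 buckets: %d\n"
    (birthday_threshold (2.0 ** 31.0))
```

A code generator could run a check like this over all candidate variant names before emitting polymorphic variants, and rename on collision as the mitigation suggests.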

type t = {
  repr_sumtype : repr_sumtype;
}
val create : file_arity:[ `OneFile | `ManyFiles of string ] -> num_variants:int -> unit -> t
val variantize : t -> ocaml_id:string -> string

variantize t ~ocaml_id creates a name of a variant specific to the OCaml identifier ocaml_id. Polymorphic variants (PolymorphicVariants) will be `Id and the other variants will be Id, assuming that ocaml_id = "Id".

It is the responsibility of the caller to ensure that ocaml_id is a valid OCaml identifier. However, ocaml_id does not need to be a valid (i.e. capitalized) variant name.
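
The behavior described for variantize can be pictured with the following minimal sketch. It is not the module's implementation: variantize_sketch is a hypothetical stand-in, the types are redeclared locally so the block is self-contained, and capitalizing with String.capitalize_ascii is an assumption inferred from the note that ocaml_id need not already be capitalized.

```ocaml
(* Local copies of the documented types, so the sketch compiles on its own. *)
type repr_sumtype = RegularVariants | ExtensibleVariants | PolymorphicVariants
type t = { repr_sumtype : repr_sumtype }

(* Hypothetical stand-in for variantize: capitalize the identifier and
   prefix a backtick only for polymorphic variants. *)
let variantize_sketch (t : t) ~(ocaml_id : string) : string =
  let name = String.capitalize_ascii ocaml_id in
  match t.repr_sumtype with
  | PolymorphicVariants -> "`" ^ name
  | RegularVariants | ExtensibleVariants -> name

let () =
  let poly = { repr_sumtype = PolymorphicVariants } in
  let regular = { repr_sumtype = RegularVariants } in
  assert (variantize_sketch poly ~ocaml_id:"Id" = "`Id");
  assert (variantize_sketch regular ~ocaml_id:"Id" = "Id");
  (* A lowercase identifier is still turned into a usable constructor name. *)
  assert (variantize_sketch regular ~ocaml_id:"pet_kind" = "Pet_kind")
```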

val require_separated_modules : t -> bool

require_separated_modules t is true if creating a single global module would create problems with the OCaml compiler.

For example, extensible variants behave differently (and more scalably) when their definition is in a separate module from the modules that access them.

Confer: ocamlopt.opt 5.2.0 fails to compile very large file but ocamlc compiles fine.
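
As a rough illustration of the separated layout, the sketch below keeps the extensible variant definitions in one module and the code that matches on them in another. The module names Pet_defs and Pet_access are hypothetical. They are written as submodules only so the block is self-contained; in generated code each would be assumed to live in its own .ml file, since the limit concerns definitions and accesses sharing one file.

```ocaml
(* Definitions only: in generated code this would be its own file. *)
module Pet_defs = struct
  type pet = ..
  type pet += Cat | Dog
end

(* Accessors only: in generated code this would be a different file that
   refers to the constructors through the Pet_defs module path. *)
module Pet_access = struct
  let describe : Pet_defs.pet -> string = function
    | Pet_defs.Cat -> "cat"
    | Pet_defs.Dog -> "dog"
    | _ -> "unknown" (* extensible variants always need a catch-all case *)
end

let () = print_endline (Pet_access.describe Pet_defs.Cat)
```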

include DkCoder_Std.SCRIPT
val __init : DkCoder_Std.Context.t -> unit

__init context is the entry point for running a script module. The DkCoder compiler will inject this function at the top and bottom of the script module. The top __init does nothing, while the bottom __init calls the prior __init.

That means:

  1. calling the __init function guarantees that the script module is initialized; that is, all of the script module's side-effects (ex. let () = Format.printf "Hello world@.") are executed before the __init returns to the caller.
  2. you can override the __init function by simply defining the __init idempotently. That will shadow the top __init and when the bottom __init is executed your __init will be called instead of the do-nothing top __init.

Future versions of DkCoder will call __init in dependency order for all `You script modules. Your __init function may be called several times.
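
The shadowing mechanism described above can be imitated in plain OCaml. This is only a sketch of the idea, not what the DkCoder compiler actually emits: the context type is a stand-in for DkCoder_Std.Context.t, the "top" and "bottom" definitions are written by hand, and the runs counter and initialized flag are hypothetical names used to show one way of making an override idempotent.

```ocaml
(* Stand-in for DkCoder_Std.Context.t so the sketch is self-contained. *)
type context = unit

(* Counts how many times the user-defined initializer actually did work. *)
let runs = ref 0

(* Imitation of the injected "top" __init: does nothing. *)
let __init (_ : context) = ()

(* User override. The guard makes repeated calls harmless (idempotent). *)
let initialized = ref false

let __init (_ : context) =
  if not !initialized then begin
    initialized := true;
    incr runs
  end

(* Imitation of the injected "bottom" __init: because let is not recursive,
   the right-hand side refers to the most recent prior __init, which is the
   user override rather than the do-nothing top definition. *)
let __init (ctx : context) = __init ctx

let () =
  (* Calling twice models __init being invoked several times. *)
  __init ();
  __init ();
  assert (!runs = 1)
```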

val __repl : DkCoder_Std.Context.t -> unit

__repl context is the entry point for debugging a script module in a REPL. The DkCoder compiler will inject this function at the top and bottom of the script module. The top __repl does nothing, while the bottom __repl calls the prior __repl.

That means:

  1. you can override the __repl function by simply defining the __repl idempotently. That will shadow the top __repl and when the bottom __repl is executed your __repl will be called instead of the do-nothing top __repl.
val __module_info : unit -> DkCoder_Std.ModuleInfo.t

The run-time module information for the script module.