A complete, native Object Pascal framework for building GPT-style transformer models from scratch. No PyTorch, no TensorFlow — just compiled Pascal code, matrix operations, and CUDA kernels.
NVIDIA CUDA with hybrid precision (FP64 CPU, FP32 GPU), automatic dispatch, weight caching, and pooled memory management.
AdamW optimizer, gradient accumulation, micro-batching, early stopping, and GPU auto batch-size detection.
Custom Byte Pair Encoding tokenizer with SQLite-backed vocabulary and Hugging Face compatible export.
Pandas-like columnar data structure for structured ML — CSV loading, normalization, shuffling, direct GPU batching.
SafeTensors format, Hugging Face compatible exports, and JSON-based configuration.
Greedy, temperature, top-k, and top-p (nucleus) sampling strategies for text generation.
Open demos/trainer/pascalgpt.trainer.dpr in Delphi 12+. Select Windows 64-bit target and build in Release mode.
Copy a preset from PresetTraining/ or create your own config.json with model and training parameters.
pascalgpt.trainer.exe -c config.json dataset.txt
var
Model: TGPTModel;
Trainer: TGPTrainer;
Tokens: TArray<Integer>;
begin
// Build a GPT model
Model := TGPTModel.Create(Config);
Model.InitializeWeights;
// Train with GPU acceleration
Trainer := TGPTrainer.Create(Model, Dataset, Optimizer);
Trainer.UseGPU := True;
Trainer.Train(Epochs);
// Generate text
Tokens := Model.Generate(PromptTokens, MaxTokens);
Text := Tokenizer.Decode(Tokens);
end;
Project vision, core capabilities, and platform support.
Read →Full unit descriptions, configuration options, and training modes.
Read →Command-line usage, build instructions, and training workflow.
Read →Deep technical reference, internals, and advanced topics.
Read →GPU kernels, memory management, and performance tuning.
Read →End-to-end walkthrough of the full pipeline.
Read →| Objective | Description | Exports |
|---|---|---|
FullModel |
Complete pipeline with full model training | safetensors, config.json, vocab |
TokenizerOnly |
Build vocabulary for external tools | vocab.json, merges.txt |
EmbeddingsOnly |
Generate raw token embeddings | embeddings.npy, vocab |
SentenceEmbeddings |
Train semantic search model (contrastive learning) | safetensors, config.json, vocab |
FineTune |
Start from pretrained weights | safetensors, config.json, vocab |