Dolphin
The Pascal GPT Framework

Native Delphi / Object Pascal

A complete, native Object Pascal framework for building GPT-style transformer models from scratch. No PyTorch, no TensorFlow — just compiled Pascal code, matrix operations, and CUDA kernels.

Delphi 12+ | CUDA Accelerated | MIT License
Text → BPE Tokenizer → Embeddings → Transformer → Logits → Generation

Core Capabilities

GPU Acceleration

NVIDIA CUDA with hybrid precision (FP64 CPU, FP32 GPU), automatic dispatch, weight caching, and pooled memory management.

Full Training Pipeline

AdamW optimizer, gradient accumulation, micro-batching, early stopping, and GPU auto batch-size detection.
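The micro-batching and gradient-accumulation scheme above can be sketched as follows. This is a hypothetical outline: the method names (`ForwardBackward`, `ScaleGradients`, `ZeroGradients`) and `NextMicroBatch` are illustrative assumptions, not the framework's confirmed API.

```pascal
// Sketch: accumulate gradients over K micro-batches, then apply
// one AdamW update, giving an effective batch of K * micro-batch size.
// All method names below are assumptions for illustration.
const
  AccumSteps = 4;
var
  StepIdx, I: Integer;
begin
  Optimizer.ZeroGradients;
  for StepIdx := 0 to StepsPerEpoch - 1 do
  begin
    for I := 0 to AccumSteps - 1 do
      Trainer.ForwardBackward(NextMicroBatch); // gradients sum in place
    Model.ScaleGradients(1.0 / AccumSteps);    // average before the update
    Optimizer.Step;                            // single AdamW update
    Optimizer.ZeroGradients;
  end;
end;
```

Accumulation trades wall-clock time for memory: it reaches large effective batch sizes on GPUs that cannot hold a full batch at once.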

BPE Tokenization

Custom Byte Pair Encoding tokenizer with SQLite-backed vocabulary and Hugging Face compatible export.
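A possible encode/decode round trip might look like the sketch below. The class and method names (`TBPETokenizer`, `Encode`, `Decode`, `ExportHuggingFace`) are assumptions for illustration, not the framework's confirmed API.

```pascal
// Hypothetical tokenizer usage; names are illustrative assumptions.
var
  Tokenizer: TBPETokenizer;
  Ids: TArray<Integer>;
begin
  Tokenizer := TBPETokenizer.Create('vocab.db'); // SQLite-backed vocabulary
  try
    Ids := Tokenizer.Encode('Hello, Pascal!');
    // BPE decoding reverses the merges, so the round trip is lossless.
    Assert(Tokenizer.Decode(Ids) = 'Hello, Pascal!');
    // Export in the Hugging Face vocab.json / merges.txt layout.
    Tokenizer.ExportHuggingFace('vocab.json', 'merges.txt');
  finally
    Tokenizer.Free;
  end;
end;
```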

TDataFrame

Pandas-like columnar data structure for structured ML — CSV loading, normalization, shuffling, direct GPU batching.
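A typical load-prepare-batch flow could be sketched as below; the method names (`LoadFromCSV`, `Normalize`, `Shuffle`, `Batches`) are assumptions chosen to mirror the capabilities listed above, not the confirmed TDataFrame API.

```pascal
// Hypothetical TDataFrame workflow; method names are assumptions.
var
  DF: TDataFrame;
begin
  DF := TDataFrame.LoadFromCSV('data.csv');
  try
    DF.Normalize(['feature_a', 'feature_b']); // column-wise scaling
    DF.Shuffle;                               // randomize row order
    Trainer.Fit(DF.Batches(64));              // stream batches to the GPU
  finally
    DF.Free;
  end;
end;
```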

Interoperability

SafeTensors format, Hugging Face compatible exports, and JSON-based configuration.

Advanced Generation

Greedy, temperature, top-k, and top-p (nucleus) sampling strategies for text generation.
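As a standalone illustration of the top-p (nucleus) strategy, independent of the framework's actual sampler API, the idea is: sort tokens by probability, keep the smallest prefix whose cumulative mass reaches p, renormalize, and draw from that truncated set.

```pascal
// Sketch of top-p (nucleus) sampling over a probability array.
// Self-contained illustration; a real implementation would use a
// partial sort instead of this O(V^2) selection sort.
function SampleTopP(const Probs: array of Double; P: Double): Integer;
var
  Idx: array of Integer;
  I, J, Tmp: Integer;
  Cum, Mass, R: Double;
begin
  // Order token indices by descending probability.
  SetLength(Idx, Length(Probs));
  for I := 0 to High(Idx) do Idx[I] := I;
  for I := 0 to High(Idx) - 1 do
    for J := I + 1 to High(Idx) do
      if Probs[Idx[J]] > Probs[Idx[I]] then
      begin
        Tmp := Idx[I]; Idx[I] := Idx[J]; Idx[J] := Tmp;
      end;
  // Keep the smallest prefix whose cumulative probability reaches P.
  Cum := 0;
  J := High(Idx);
  for I := 0 to High(Idx) do
  begin
    Cum := Cum + Probs[Idx[I]];
    if Cum >= P then begin J := I; Break; end;
  end;
  // Renormalize over the kept prefix and draw from it.
  Mass := 0;
  for I := 0 to J do Mass := Mass + Probs[Idx[I]];
  R := Random * Mass;
  Cum := 0;
  Result := Idx[J]; // fallback for floating-point edge cases
  for I := 0 to J do
  begin
    Cum := Cum + Probs[Idx[I]];
    if R <= Cum then Exit(Idx[I]);
  end;
end;
```

Greedy decoding is the special case that always picks the argmax; temperature reshapes `Probs` before sampling; top-k truncates by count instead of cumulative mass.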

Quick Start

1. Open in Delphi
   Open demos/trainer/pascalgpt.trainer.dpr in Delphi 12+. Select the Windows 64-bit target and build in Release mode.

2. Prepare Config
   Copy a preset from PresetTraining/ or create your own config.json with model and training parameters.

3. Train
   pascalgpt.trainer.exe -c config.json dataset.txt
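A minimal config.json could look like the fragment below. The key names and values here are assumptions chosen for illustration; consult the presets in PresetTraining/ for the actual schema.

```json
{
  "objective": "FullModel",
  "model": {
    "layers": 6,
    "heads": 8,
    "embedding_dim": 512,
    "context_length": 256
  },
  "training": {
    "epochs": 10,
    "learning_rate": 0.0003,
    "use_gpu": true
  }
}
```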

Pascal API Example

var
  Model: TGPTModel;
  Trainer: TGPTrainer;
  Tokens: TArray<Integer>;
  Text: string;
begin
  // Build a GPT model
  Model := TGPTModel.Create(Config);
  Model.InitializeWeights;

  // Train with GPU acceleration
  Trainer := TGPTrainer.Create(Model, Dataset, Optimizer);
  Trainer.UseGPU := True;
  Trainer.Train(Epochs);

  // Generate text
  Tokens := Model.Generate(PromptTokens, MaxTokens);
  Text := Tokenizer.Decode(Tokens);
end;

Architecture

Input Text
  → BPE Tokenizer
  → Embedding Layer + RoPE
  → Transformer Block (×N)
      LayerNorm → Multi-Head Attention → Residual → LayerNorm → Feed-Forward (GELU) → Residual
  → Final LayerNorm → Linear Head → Logits
  → Softmax → Sampling → Generated Text
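The repeated block in the diagram is a pre-norm residual layout, which can be sketched as below. `TTensor`, the layer names, and the assumed `+` operator overload for residual addition are illustrative stand-ins, not the framework's confirmed types.

```pascal
// Sketch of one pre-norm transformer block as diagrammed above.
// TTensor, the layer functions, and operator + are assumptions.
function TransformerBlock(const X: TTensor): TTensor;
var
  A: TTensor;
begin
  // LayerNorm -> Multi-Head Attention -> Residual
  A := X + MultiHeadAttention(LayerNorm1(X));
  // LayerNorm -> Feed-Forward (GELU) -> Residual
  Result := A + FeedForwardGELU(LayerNorm2(A));
end;
```

Normalizing before each sub-layer (rather than after, as in the original transformer) is the common choice in GPT-style models because it keeps gradients stable in deep stacks.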

Documentation

Training Objectives

| Objective          | Description                                        | Exports                         |
|--------------------|----------------------------------------------------|---------------------------------|
| FullModel          | Complete pipeline with full model training         | safetensors, config.json, vocab |
| TokenizerOnly      | Build vocabulary for external tools                | vocab.json, merges.txt          |
| EmbeddingsOnly     | Generate raw token embeddings                      | embeddings.npy, vocab           |
| SentenceEmbeddings | Train semantic search model (contrastive learning) | safetensors, config.json, vocab |
| FineTune           | Start from pretrained weights                      | safetensors, config.json, vocab |