Dolphin
The Pascal GPT Framework

Native Delphi / Object Pascal

A complete, native Object Pascal framework for building GPT-style transformer models from scratch. No PyTorch, no TensorFlow — just compiled Pascal code, matrix operations, and CUDA kernels.

Delphi 12+ | CUDA Accelerated | MIT License
Text → BPE Tokenizer → Embeddings → Transformer → Logits → Generation

Core Capabilities

GPU Acceleration

NVIDIA CUDA with hybrid precision (FP64 CPU, FP32 GPU), automatic dispatch, weight caching, and pooled memory management.

Full Training Pipeline

AdamW optimizer, gradient accumulation, micro-batching, early stopping, and GPU auto batch-size detection.
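The micro-batching and gradient-accumulation scheme above can be sketched as follows. This is a hypothetical outline: the method names (`ForwardBackward`, `ScaleGradients`, `ZeroGradients`) and `NextMicroBatch` are illustrative assumptions, not the framework's confirmed API.

```pascal
// Sketch: accumulate gradients over K micro-batches, then apply
// one AdamW update, giving an effective batch of K * micro-batch size.
// All method names below are assumptions for illustration.
const
  AccumSteps = 4;
var
  StepIdx, I: Integer;
begin
  Optimizer.ZeroGradients;
  for StepIdx := 0 to StepsPerEpoch - 1 do
  begin
    for I := 0 to AccumSteps - 1 do
      Trainer.ForwardBackward(NextMicroBatch); // gradients sum in place
    Model.ScaleGradients(1.0 / AccumSteps);    // average before the update
    Optimizer.Step;                            // single AdamW update
    Optimizer.ZeroGradients;
  end;
end;
```

Accumulation trades wall-clock time for memory: it reaches large effective batch sizes on GPUs that cannot hold a full batch at once.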

BPE Tokenization

Custom Byte Pair Encoding tokenizer with SQLite-backed vocabulary and Hugging Face compatible export.
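A possible encode/decode round trip might look like the sketch below. The class and method names (`TBPETokenizer`, `Encode`, `Decode`, `ExportHuggingFace`) are assumptions for illustration, not the framework's confirmed API.

```pascal
// Hypothetical tokenizer usage; names are illustrative assumptions.
var
  Tokenizer: TBPETokenizer;
  Ids: TArray<Integer>;
begin
  Tokenizer := TBPETokenizer.Create('vocab.db'); // SQLite-backed vocabulary
  try
    Ids := Tokenizer.Encode('Hello, Pascal!');
    // BPE decoding reverses the merges, so the round trip is lossless.
    Assert(Tokenizer.Decode(Ids) = 'Hello, Pascal!');
    // Export in the Hugging Face vocab.json / merges.txt layout.
    Tokenizer.ExportHuggingFace('vocab.json', 'merges.txt');
  finally
    Tokenizer.Free;
  end;
end;
```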

TDataFrame

Pandas-like columnar data structure for structured ML — CSV loading, normalization, shuffling, direct GPU batching.
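A typical load-prepare-batch flow could be sketched as below; the method names (`LoadFromCSV`, `Normalize`, `Shuffle`, `Batches`) are assumptions chosen to mirror the capabilities listed above, not the confirmed TDataFrame API.

```pascal
// Hypothetical TDataFrame workflow; method names are assumptions.
var
  DF: TDataFrame;
begin
  DF := TDataFrame.LoadFromCSV('data.csv');
  try
    DF.Normalize(['feature_a', 'feature_b']); // column-wise scaling
    DF.Shuffle;                               // randomize row order
    Trainer.Fit(DF.Batches(64));              // stream batches to the GPU
  finally
    DF.Free;
  end;
end;
```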

Interoperability

SafeTensors format, Hugging Face compatible exports, and JSON-based configuration.

Advanced Generation

Greedy, temperature, top-k, and top-p (nucleus) sampling strategies for text generation.
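As a standalone illustration of the top-p (nucleus) strategy, independent of the framework's actual sampler API, the idea is: sort tokens by probability, keep the smallest prefix whose cumulative mass reaches p, renormalize, and draw from that truncated set.

```pascal
// Sketch of top-p (nucleus) sampling over a probability array.
// Self-contained illustration; a real implementation would use a
// partial sort instead of this O(V^2) selection sort.
function SampleTopP(const Probs: array of Double; P: Double): Integer;
var
  Idx: array of Integer;
  I, J, Tmp: Integer;
  Cum, Mass, R: Double;
begin
  // Order token indices by descending probability.
  SetLength(Idx, Length(Probs));
  for I := 0 to High(Idx) do Idx[I] := I;
  for I := 0 to High(Idx) - 1 do
    for J := I + 1 to High(Idx) do
      if Probs[Idx[J]] > Probs[Idx[I]] then
      begin
        Tmp := Idx[I]; Idx[I] := Idx[J]; Idx[J] := Tmp;
      end;
  // Keep the smallest prefix whose cumulative probability reaches P.
  Cum := 0;
  J := High(Idx);
  for I := 0 to High(Idx) do
  begin
    Cum := Cum + Probs[Idx[I]];
    if Cum >= P then begin J := I; Break; end;
  end;
  // Renormalize over the kept prefix and draw from it.
  Mass := 0;
  for I := 0 to J do Mass := Mass + Probs[Idx[I]];
  R := Random * Mass;
  Cum := 0;
  Result := Idx[J]; // fallback for floating-point edge cases
  for I := 0 to J do
  begin
    Cum := Cum + Probs[Idx[I]];
    if R <= Cum then Exit(Idx[I]);
  end;
end;
```

Greedy decoding is the special case that always picks the argmax; temperature reshapes `Probs` before sampling; top-k truncates by count instead of cumulative mass.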

Quick Start

1. Open in Delphi
   Open demos/trainer/pascalgpt.trainer.dpr in Delphi 12+. Select the Windows 64-bit target and build in Release mode.

2. Prepare Config
   Copy a preset from PresetTraining/ or create your own config.json with model and training parameters.

3. Train
   pascalgpt.trainer.exe -c config.json dataset.txt
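A minimal config.json could look like the fragment below. The key names and values here are assumptions chosen for illustration; consult the presets in PresetTraining/ for the actual schema.

```json
{
  "objective": "FullModel",
  "model": {
    "layers": 6,
    "heads": 8,
    "embedding_dim": 512,
    "context_length": 256
  },
  "training": {
    "epochs": 10,
    "learning_rate": 0.0003,
    "use_gpu": true
  }
}
```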

Pascal API Example

var
  Model: TGPTModel;
  Trainer: TGPTrainer;
  Tokens: TArray<Integer>;
  Text: string;
begin
  // Build a GPT model
  Model := TGPTModel.Create(Config);
  Model.InitializeWeights;

  // Train with GPU acceleration
  Trainer := TGPTrainer.Create(Model, Dataset, Optimizer);
  Trainer.UseGPU := True;
  Trainer.Train(Epochs);

  // Generate text
  Tokens := Model.Generate(PromptTokens, MaxTokens);
  Text := Tokenizer.Decode(Tokens);
end;

Architecture

Input Text
  → BPE Tokenizer
  → Embedding Layer + RoPE
  → Transformer Block (×N)
      LayerNorm → Multi-Head Attention → Residual → LayerNorm → Feed-Forward (GELU) → Residual
  → Final LayerNorm → Linear Head → Logits
  → Softmax → Sampling → Generated Text
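The repeated block in the diagram is a pre-norm residual layout, which can be sketched as below. `TTensor`, the layer names, and the assumed `+` operator overload for residual addition are illustrative stand-ins, not the framework's confirmed types.

```pascal
// Sketch of one pre-norm transformer block as diagrammed above.
// TTensor, the layer functions, and operator + are assumptions.
function TransformerBlock(const X: TTensor): TTensor;
var
  A: TTensor;
begin
  // LayerNorm -> Multi-Head Attention -> Residual
  A := X + MultiHeadAttention(LayerNorm1(X));
  // LayerNorm -> Feed-Forward (GELU) -> Residual
  Result := A + FeedForwardGELU(LayerNorm2(A));
end;
```

Normalizing before each sub-layer (rather than after, as in the original transformer) is the common choice in GPT-style models because it keeps gradients stable in deep stacks.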

Documentation

Training Objectives

| Objective          | Description                                        | Exports                         |
|--------------------|----------------------------------------------------|---------------------------------|
| FullModel          | Complete pipeline with full model training         | safetensors, config.json, vocab |
| TokenizerOnly      | Build vocabulary for external tools                | vocab.json, merges.txt          |
| EmbeddingsOnly     | Generate raw token embeddings                      | embeddings.npy, vocab           |
| SentenceEmbeddings | Train semantic search model (contrastive learning) | safetensors, config.json, vocab |
| FineTune           | Start from pretrained weights                      | safetensors, config.json, vocab |