Glossary
This glossary explains database and SQL terminology used in sql-splitter documentation. Terms are grouped by category for easier reference.
Keys, Constraints, and Integrity
Section titled “Keys, Constraints, and Integrity”Primary Key (PK)
Section titled “Primary Key (PK)”A column (or set of columns) whose values uniquely identify each row in a table. Primary keys cannot be NULL and are often referenced by foreign keys in other tables.
Foreign Key (FK)
Section titled “Foreign Key (FK)”A constraint that requires values in one table’s column(s) to match existing values in another table’s primary key. Foreign keys are the main mechanism for representing relationships between tables.
Constraint
Section titled “Constraint”A rule the database enforces on table data or structure. Common constraints include:
- PRIMARY KEY - Unique row identifier
- FOREIGN KEY - Reference to another table
- UNIQUE - No duplicate values allowed
- NOT NULL - Value required
- CHECK - Custom validation (e.g.,
age >= 0)
Referential Integrity
Section titled “Referential Integrity”The guarantee that relationships between tables remain consistent. For example, you can’t have an orders.user_id that references a users.id that doesn’t exist. Foreign keys enforce referential integrity.
CASCADE
Section titled “CASCADE”A referential action on a foreign key that automatically propagates changes. For example, ON DELETE CASCADE deletes child rows when the parent row is deleted. Similarly, ON UPDATE CASCADE updates child foreign key values when the parent key changes.
Deferred Constraints
Section titled “Deferred Constraints”Constraints that are checked at the end of a transaction instead of immediately after each statement. This allows temporarily “inconsistent” states while loading data, as long as everything is valid by commit time. Useful for loading tables with circular dependencies.
-- PostgreSQL syntaxSET CONSTRAINTS ALL DEFERRED;SQL Structure
Section titled “SQL Structure”Schema
Section titled “Schema”The organized definition of database objects—tables, columns, types, constraints, indexes, etc. In PostgreSQL, “schema” also refers to a namespace (like public), but in sql-splitter docs, “schema” typically means the overall database structure.
DDL (Data Definition Language)
Section titled “DDL (Data Definition Language)”SQL statements that define or change database structure:
CREATE TABLE,ALTER TABLE,DROP TABLECREATE INDEX,DROP INDEXCREATE VIEW
DDL describes what the database looks like.
DML (Data Manipulation Language)
Section titled “DML (Data Manipulation Language)”SQL statements that work with data:
INSERT,UPDATE,DELETECOPY ... FROM stdin(PostgreSQL bulk load)
DML describes what data is in the database.
Dialect-Specific Syntax
Section titled “Dialect-Specific Syntax”SQL Dialect
Section titled “SQL Dialect”The specific flavor of SQL supported by a database engine. sql-splitter supports four dialects:
| Dialect | Key Features |
|---|---|
| MySQL | Backtick quoting, AUTO_INCREMENT, ENGINE=InnoDB |
| PostgreSQL | Double-quote identifiers, SERIAL, COPY FROM stdin |
| SQLite | INTEGER PRIMARY KEY for auto-increment |
| MSSQL | IDENTITY, GO batch separators, NVARCHAR |
AUTO_INCREMENT (MySQL)
Section titled “AUTO_INCREMENT (MySQL)”A column attribute that automatically generates increasing numeric values. Commonly used for surrogate primary keys.
id INT AUTO_INCREMENT PRIMARY KEYSERIAL (PostgreSQL)
Section titled “SERIAL (PostgreSQL)”A convenience type that creates an integer column backed by an auto-incrementing sequence. Modern PostgreSQL prefers IDENTITY.
id SERIAL PRIMARY KEY-- Equivalent to: id INTEGER DEFAULT nextval('tablename_id_seq')IDENTITY (PostgreSQL/MSSQL)
Section titled “IDENTITY (PostgreSQL/MSSQL)”A standards-aligned way to define auto-generated numeric columns.
-- PostgreSQLid INT GENERATED BY DEFAULT AS IDENTITY
-- MSSQLid INT IDENTITY(1,1)COPY FROM stdin (PostgreSQL)
Section titled “COPY FROM stdin (PostgreSQL)”A PostgreSQL bulk-loading command that reads table data from the client connection. In pg_dump output, it appears as:
COPY users (id, name, email) FROM stdin;1 Alice alice@example.com2 Bob bob@example.com\.The \. marks the end of data. sql-splitter’s convert command transforms this to INSERT statements for other dialects.
Dependency Graphs and Ordering
Section titled “Dependency Graphs and Ordering”Dependency Graph
Section titled “Dependency Graph”A directed graph where nodes represent tables and edges represent foreign key relationships. Used to determine safe ordering for creating tables and loading data.
Topological Order / Topological Sort
Section titled “Topological Order / Topological Sort”An ordering of tables such that every dependency (parent table) appears before the table that depends on it (child table). In SQL dumps, this means parent tables are created and populated before child tables, satisfying foreign key constraints during import.
See order for reordering dumps topologically.
Circular Dependency / Cycle
Section titled “Circular Dependency / Cycle”A situation where following foreign key relationships leads back to the starting table. For example: orders → payments → invoices → orders. Cycles prevent simple topological ordering and require special handling:
- Defer constraints during import
- Disable FK checks temporarily
- Split data loading into phases
See graph --cycles-only to detect and visualize cycles.
Root Table
Section titled “Root Table”A table with no outbound foreign keys (it doesn’t reference other tables). Root tables can be loaded first since they have no dependencies.
Parent Table
Section titled “Parent Table”The referenced table in a foreign key relationship—the table that “owns” the primary key being referenced. Parent tables must generally exist before child tables can reference them.
Child Table
Section titled “Child Table”The referencing table in a foreign key relationship—it contains the foreign key column(s). Child rows depend on parent rows existing.
Data Processing
Section titled “Data Processing”Streaming
Section titled “Streaming”Processing a dump file incrementally as it’s read, rather than loading the entire file into memory.
Commands that stream (constant ~50MB memory):
split,merge,convert,analyze,redact
Commands that require more memory:
query- Loads data into DuckDB (use--diskfor large files)diff- Tracks PK hashes for comparison (use--schema-onlyto reduce)validate- Tracks PKs/FKs when checking integrity (use--no-fk-checksto skip)order,graph- Build dependency graph (memory scales with number of tables/FKs)sample,shard- Track selected rows when preserving FK relationships
See Memory & Streaming for detailed memory profiles per command.
Deterministic
Section titled “Deterministic”Given the same input and configuration, the tool produces identical output every run. Determinism is important for CI pipelines, debugging, and reproducible data workflows. Use --seed with sampling/redaction operations for deterministic “random” results.
A fixed value used to initialize a pseudo-random number generator. Using the same seed produces the same “random” results every time, making operations reproducible.
sql-splitter sample dump.sql --percent 10 --seed 42Data Privacy and Anonymization
Section titled “Data Privacy and Anonymization”PII (Personally Identifiable Information)
Section titled “PII (Personally Identifiable Information)”Data that can identify a person directly or indirectly: names, email addresses, phone numbers, IP addresses, government IDs, etc. Many regulations require protecting PII, including GDPR (EU) and CCPA (California).
Redaction
Section titled “Redaction”Removing or masking sensitive values entirely, typically replacing them with NULL, asterisks, or a placeholder like [REDACTED]. Redaction prioritizes safety over data usefulness.
Anonymization
Section titled “Anonymization”Transforming data to reduce identification risk while keeping it useful for testing or development. Techniques include replacing real names with fake names, hashing emails, or shuffling values.
Hashing
Section titled “Hashing”Converting a value into a fixed-length digest (e.g., SHA-256) that cannot be reversed. Hashing preserves equality (same input → same hash), which is useful for maintaining foreign key relationships during anonymization.
sql-splitter redact dump.sql --hash "*.email" -o safe.sqlMulti-Tenant Concepts
Section titled “Multi-Tenant Concepts”Tenant
Section titled “Tenant”A customer, account, or organization whose data is logically separated from others within the same database. Multi-tenant systems store multiple tenants’ data in shared infrastructure.
Multi-Tenant
Section titled “Multi-Tenant”An architecture where a single database serves multiple tenants, typically with row-level isolation via a tenant identifier column. This is common in SaaS applications.
Sharding
Section titled “Sharding”In sql-splitter context, extracting a subset of data belonging to a specific tenant from a multi-tenant database dump. See shard.
Tenant Column
Section titled “Tenant Column”A column (typically tenant_id, org_id, company_id, account_id) present in tables that indicates which tenant owns each row. sql-splitter auto-detects common tenant column names.
Visualization and Documentation
Section titled “Visualization and Documentation”ERD (Entity Relationship Diagram)
Section titled “ERD (Entity Relationship Diagram)”A diagram showing tables, their columns, and the relationships (foreign keys) between them. ERDs help visualize database structure and identify dependency cycles.
See graph for generating ERDs from SQL dumps.
Schema Drift
Section titled “Schema Drift”When database structure differs across environments (development, staging, production) or differs from expected migrations. Drift can cause import failures or data inconsistencies.
See diff for detecting schema differences between dumps.
File Patterns
Section titled “File Patterns”Glob Pattern
Section titled “Glob Pattern”A simple pattern syntax for matching file paths:
| Pattern | Matches |
|---|---|
*.sql | All SQL files in current directory |
**/*.sql | All SQL files recursively |
dump_*.sql.gz | Files starting with dump_ ending in .sql.gz |
backups/202[3-4]*.sql | Backups from 2023-2024 |
Most sql-splitter commands accept glob patterns:
sql-splitter validate "backups/**/*.sql" --fail-fast