# validate
Validate SQL dump files for syntax errors, encoding issues, and referential integrity.
Alias: `val` (e.g., `sql-splitter val dump.sql --strict`)
## When to Use This

- Pre-restore verification - Check dumps before importing to catch issues early
- CI/CD quality gates - Block deployments if backups are malformed
- Backup integrity monitoring - Verify automated backups haven’t been corrupted
- Encoding issue detection - Find invalid UTF-8 sequences that cause import failures
- Referential integrity auditing - Detect orphaned foreign key references and duplicate primary keys
Use `diff` instead if you need to compare two dumps for differences.
## How It Works

Validation runs in two passes:
- **Schema pass** - Parses the entire dump, validates SQL syntax, checks UTF-8 encoding, and builds a table registry. This pass is fast and streams through the file with constant memory (~10MB).
- **Data integrity pass (optional)** - If PK/FK checks are enabled, a second pass tracks primary key values and verifies foreign key references. This uses a `PkHashSet` that stores 64-bit hashes of seen values; memory grows with row count (roughly 8 bytes per unique PK value).
Memory considerations: For a table with 10 million rows, expect ~80MB of memory for PK tracking. Use `--no-fk-checks` for large dumps to skip the data integrity pass entirely, or `--max-rows-per-table` to limit tracking per table.
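These numbers make it easy to decide up front which flags a given dump needs. A minimal sketch, assuming GNU or BSD `stat` is available and using a size threshold chosen purely for illustration:

```bash
# Pick validation flags based on dump size (the 10 GB threshold is illustrative).
size=$(stat -c%s dump.sql 2>/dev/null || stat -f%z dump.sql)  # GNU stat, then BSD stat
if [ "$size" -gt $((10 * 1024 * 1024 * 1024)) ]; then
  # Very large dump: skip the data integrity pass to keep memory constant.
  sql-splitter validate dump.sql --no-fk-checks --progress
else
  # Smaller dump: run the full two-pass validation.
  sql-splitter validate dump.sql --progress
fi
```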
Exit codes:

- `0` - Valid (no errors; warnings may exist)
- `1` - Invalid (errors found, or warnings with `--strict`)
- `2` - Runtime error (file not found, parse failure)
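Scripts can branch on these codes to distinguish a bad dump from a tool failure. A minimal sketch:

```bash
# Exit code 1 means the dump failed validation; 2 means the tool itself failed.
sql-splitter validate dump.sql
case $? in
  0) echo "dump is valid" ;;
  1) echo "dump is invalid (errors, or warnings with --strict)" >&2; exit 1 ;;
  2) echo "runtime error (e.g., file not found or parse failure)" >&2; exit 2 ;;
esac
```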
```bash
sql-splitter validate <INPUT> [OPTIONS]
```

## Examples

### Basic Validation

```bash
# Validate a single dump
sql-splitter validate dump.sql

# Validate with progress bar (useful for large files)
sql-splitter validate large-dump.sql --progress

# Validate a PostgreSQL pg_dump file
sql-splitter validate pg_dump.sql --dialect=postgres
```

### CI/CD Integration
```bash
# Strict mode - treat warnings as errors
sql-splitter validate dump.sql --strict

# JSON output for parsing in CI
sql-splitter validate dump.sql --json

# Fail fast on first error (for batch validation)
sql-splitter validate "backups/*.sql" --fail-fast --strict
```

### Memory-Constrained Validation
```bash
# Skip PK/FK checks entirely (fastest, lowest memory)
sql-splitter validate dump.sql --no-fk-checks

# Limit rows checked per table (middle ground)
sql-splitter validate dump.sql --max-rows-per-table 100000

# No limit on rows (full check, highest memory)
sql-splitter validate dump.sql --no-limit
```
## Options

| Flag | Short | Description | Default |
|---|---|---|---|
| `--dialect` | `-d` | SQL dialect: `mysql`, `postgres`, `sqlite`, `mssql` | auto-detect |
| `--strict` | | Treat warnings as errors (exit 1) | `false` |
| `--json` | | Output results as JSON | `false` |
| `--max-rows-per-table` | | Max rows per table for PK/FK checks (0 = unlimited) | 1,000,000 |
| `--no-limit` | | Disable row limit for PK/FK checks | `false` |
| `--no-fk-checks` | | Skip PK/FK data integrity checks | `false` |
| `--progress` | `-p` | Show progress bar | `false` |
| `--fail-fast` | | Stop on first error (for glob patterns) | `false` |
## Validation Checks

| Check | What It Detects | Memory Impact |
|---|---|---|
| SQL syntax | Malformed statements, unclosed strings, invalid keywords | None (streaming) |
| Encoding | Invalid UTF-8 byte sequences, mixed encodings | None (streaming) |
| DDL/DML consistency | INSERT/COPY into non-existent tables, column count mismatches | Minimal (table registry) |
| Duplicate PK | Multiple rows with the same primary key value | ~8 bytes per row |
| FK referential integrity | Foreign key values referencing non-existent parent rows | ~8 bytes per FK reference |
The first three checks run during the schema pass. PK/FK checks require the data integrity pass and can be disabled with `--no-fk-checks`.
## Output

```text
✓ dump.sql is valid

Warnings:
  - Table 'logs' has no primary key
  - Column 'users.email' allows NULL but has unique constraint
```

### JSON Output

```bash
sql-splitter validate dump.sql --json
```

```json
{
  "dialect": "mysql",
  "issues": [
    {
      "severity": "warning",
      "code": "W002",
      "message": "Table 'logs' has no primary key",
      "location": { "table": "logs", "statement_index": 3 }
    }
  ],
  "summary": {
    "errors": 0,
    "warnings": 1,
    "info": 0,
    "tables_scanned": 5,
    "statements_scanned": 150
  },
  "checks": {
    "syntax": "ok",
    "encoding": "ok",
    "ddl_dml_consistency": "ok",
    "pk_duplicates": "ok",
    "fk_integrity": "ok"
  }
}
```

### CI Usage
```bash
# Simple CI gate
sql-splitter validate dump.sql --strict || exit 1

# Parse JSON with jq (check for errors)
sql-splitter validate dump.sql --json | jq 'select(.summary.errors > 0)'
```
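To report individual problems rather than just the summary, you can pull fields from the `issues` array; the field names below match the sample JSON output above:

```bash
# One line per issue, e.g. "warning W002: Table 'logs' has no primary key"
sql-splitter validate dump.sql --json | \
  jq -r '.issues[] | "\(.severity) \(.code): \(.message)"'
```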
```bash
# Multi-file validation returns an aggregate with a results array
sql-splitter validate "*.sql" --json | jq '.results[] | select(.summary.errors > 0)'
```

## Composing with Other Tools
### Validate Before Splitting

Catch issues before investing time in splitting large dumps:

```bash
sql-splitter validate dump.sql --strict && \
  sql-splitter split dump.sql -o tables/
```

### Validate After Converting Dialects
Ensure dialect conversion produces valid output:

```bash
sql-splitter convert mysql.sql -o postgres.sql --to postgres
sql-splitter validate postgres.sql --dialect=postgres --strict
```

### Validate in a Pipeline with analyze
Check integrity and gather statistics in one pass:

```bash
for dump in backups/*.sql; do
  echo "=== $dump ==="
  sql-splitter validate "$dump" --no-fk-checks && \
    sql-splitter analyze "$dump"
done
```

### Pre-import Workflow
Full validation pipeline before database restore:

```bash
sql-splitter validate dump.sql --strict --json > validation.json
if [ $? -eq 0 ]; then
  sql-splitter order dump.sql -o ordered.sql
  mysql mydb < ordered.sql
fi
```

## Troubleshooting
### Out of memory during validation

Symptom: Process killed or "memory allocation failed" error on large dumps.
Cause: PK/FK integrity checks track hashes of all primary key values, and this tracking set grows with row count.
Solutions:
- Skip integrity checks entirely: `--no-fk-checks`
- Limit rows checked per table: `--max-rows-per-table 500000`
- Validate tables individually after splitting: run `sql-splitter split`, then validate each file (see the sketch below)
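A sketch of the split-then-validate approach; the `tables/` output directory is arbitrary:

```bash
# Split into per-table files, then validate each one with full PK/FK checks.
sql-splitter split dump.sql -o tables/
for f in tables/*.sql; do
  sql-splitter validate "$f" --strict || exit 1
done
```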
### False positive FK errors

Symptom: Validation reports missing FK references, but the data imports successfully.
Cause: The dump may define foreign keys before the referenced parent rows appear (common with deferred constraints or out-of-order dumps).
Solutions:
- Reorder the dump first: `sql-splitter order dump.sql -o ordered.sql` (see the sketch below)
- Skip FK checks: `--no-fk-checks`
- If using PostgreSQL with deferred FKs, this is expected behavior; the database resolves the references at commit time
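A sketch of the reorder-then-revalidate flow:

```bash
# Put parent rows before their children, then re-run the FK checks.
sql-splitter order dump.sql -o ordered.sql
sql-splitter validate ordered.sql
```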
### Encoding errors on valid files

Symptom: UTF-8 encoding errors on files that open fine in editors.
Cause: The file may use Latin-1, Windows-1252, or mixed encodings.
Solutions:
- Convert to UTF-8: `iconv -f WINDOWS-1252 -t UTF-8 dump.sql > dump-utf8.sql` (see the sketch below)
- Check the original `mysqldump` charset: ensure `--default-character-set=utf8mb4` was used
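A sketch of the re-encode-and-revalidate flow; the Windows-1252 source charset here is an assumption, so check the actual encoding first (e.g., with `file dump.sql`):

```bash
# Re-encode to UTF-8 (adjust -f to the real source charset), then re-validate.
iconv -f WINDOWS-1252 -t UTF-8 dump.sql > dump-utf8.sql
sql-splitter validate dump-utf8.sql
```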
### Validation passes but import fails

Symptom: No validation errors, but `mysql` or `psql` reports syntax errors.
Cause: sql-splitter validates common SQL patterns but may not catch all dialect-specific edge cases.
Solutions:
- Ensure the correct dialect: `--dialect=postgres` for pg_dump files
- Check for dialect-specific syntax not yet supported
- Report the issue with a minimal reproduction
### “Table has no primary key” warnings

Symptom: Many warnings about tables without PKs.

Cause: PK duplicate detection requires knowing the primary key, so tables without PKs only receive the basic syntax and encoding checks.
Solutions:
- This is informational; data integrity checks for these tables are skipped
- Use `--strict` only if you want to enforce PKs on all tables
- Consider adding PKs to these tables in your schema (the sketch below lists the affected tables)
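To see which tables are affected, you can filter the JSON output on the warning code; `W002` is the code shown for this warning in the sample output above:

```bash
# Print the table name for each "no primary key" warning.
sql-splitter validate dump.sql --json | \
  jq -r '.issues[] | select(.code == "W002") | .location.table'
```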
## See Also

- `analyze` - View table statistics without validation
- `diff` - Compare two dumps for differences
- `order` - Reorder statements for FK-safe imports
- JSON Output Schema - Schema for `--json` output
- Memory & Streaming - Detailed memory profiles per command
- Glossary - Database terminology reference