# split
Split a large SQL dump into separate files, one per table.
Alias: `sp` (e.g., `sql-splitter sp dump.sql -o tables/`)
## When to Use This

- Extract specific tables — Pull out just the tables you need without loading the entire dump
- Enable parallel processing — Split once, then process tables concurrently in separate jobs
- Debug specific tables — Isolate a problematic table's DDL and data for investigation
- Create partial backups — Extract only customer-facing tables while ignoring analytics/logging tables
- Separate schema from data — Use `--schema-only` for migrations, `--data-only` for data refreshes
## How It Works

The `split` command uses streaming to process dumps in a single pass with constant memory (~10MB). Here's what happens:
- Read — Statements are read using buffered I/O (64KB buffer for files under 1GB, 256KB for larger files)
- Parse — Each statement is identified and its target table extracted
- Route — Statements are written to the appropriate `<table>.sql` file via 256KB write buffers
- Flush — Buffers are flushed every 100 statements to balance memory and I/O
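As a concrete illustration of this read/parse/route flow, here is a tiny hypothetical dump and the layout `split` should produce for it (table names and contents are invented for the example):

```bash
# A minimal two-table dump (contents invented for illustration)
cat > demo.sql <<'SQL'
SET NAMES utf8mb4;
CREATE TABLE users (id INT PRIMARY KEY);
INSERT INTO users VALUES (1);
CREATE TABLE orders (id INT PRIMARY KEY, user_id INT);
SQL

sql-splitter split demo.sql -o out/
# Expected files, per the routing rules below:
#   out/users.sql    (CREATE TABLE users + its INSERT)
#   out/orders.sql   (CREATE TABLE orders)
#   out/_global.sql  (the SET NAMES session statement)
```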
## What Goes Where

| Statement Type | Destination |
|---|---|
| `CREATE TABLE users` | `users.sql` |
| `INSERT INTO users` | `users.sql` |
| `CREATE INDEX ON users` | `users.sql` |
| `ALTER TABLE users` | `users.sql` |
| `COPY users FROM stdin` (PostgreSQL) | `users.sql` |
| `SET`, `BEGIN`, `COMMIT`, header comments | `_global.sql` |
| Unrecognized statements | `_global.sql` |
Table names are sanitized for the filesystem: special characters become underscores, and names are lowercased. For example, a table named `Order-Items` would land in `order_items.sql`.
```bash
sql-splitter split <INPUT> [OPTIONS]
```

## Examples
### Basic Splitting

```bash
# Split all tables to tables/
sql-splitter split database.sql --output=tables/

# Preview what would be created (no files written)
sql-splitter split database.sql --dry-run --verbose
```

### Working with Specific Dialects
```bash
# PostgreSQL pg_dump output
sql-splitter split pg_dump.sql -o tables/ --dialect=postgres

# SQLite .dump output
sql-splitter split sqlite_dump.sql -o tables/ --dialect=sqlite
```

### Extracting Specific Tables
```bash
# Only extract users and orders
sql-splitter split dump.sql --tables users,orders -o subset/

# Extract schema for all tables, data for just users
sql-splitter split dump.sql -o schema/ --schema-only
sql-splitter split dump.sql -o data/ --data-only --tables users
```

### Compressed Input
```bash
# Automatically decompresses based on extension
sql-splitter split backup.sql.gz -o tables/
sql-splitter split backup.sql.zst -o tables/
sql-splitter split backup.sql.xz -o tables/
```

See Compression for all supported formats.
### Batch Processing

```bash
# Split multiple dumps
sql-splitter split "backups/*.sql" -o output/ --fail-fast

# Process all dumps recursively
sql-splitter split "dumps/**/*.sql.gz" -o tables/
```

## Options
| Flag | Short | Description | Default |
|---|---|---|---|
| `--output` | `-o` | Output directory | `output` |
| `--dialect` | `-d` | SQL dialect: `mysql`, `postgres`, `sqlite`, `mssql` | auto-detect |
| `--tables` | `-t` | Only split these tables (comma-separated) | all |
| `--schema-only` | | Only DDL (CREATE TABLE, indexes) | `false` |
| `--data-only` | | Only DML (INSERT, COPY) | `false` |
| `--verbose` | `-v` | Show verbose output | `false` |
| `--progress` | `-p` | Show progress bar | `false` |
| `--dry-run` | | Preview without writing files | `false` |
| `--fail-fast` | | Stop on first error (for glob patterns) | `false` |
| `--json` | | Output results as JSON | `false` |
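These flags combine freely; for example, a verbose dry run restricted to two tables of a PostgreSQL dump (file name hypothetical):

```bash
# Preview what splitting users and orders would produce, without writing files
sql-splitter split nightly.sql -o tables/ -d postgres -t users,orders --dry-run --verbose
```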
## Output Structure

```
tables/
├── users.sql      # CREATE TABLE + INSERT for users
├── orders.sql     # CREATE TABLE + INSERT for orders
├── products.sql   # CREATE TABLE + INSERT for products
└── _global.sql    # Header statements, SET commands
```

The `_global.sql` file contains:
- Session settings (`SET NAMES utf8mb4`, `SET FOREIGN_KEY_CHECKS=0`)
- Transaction markers (`BEGIN`, `COMMIT`)
- Header comments from the dump tool
- Any statements that couldn't be associated with a specific table
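When re-importing split files, `_global.sql` generally needs to run first, and in the same session, so settings like `SET FOREIGN_KEY_CHECKS=0` are active while table data loads. A minimal sketch, assuming a MySQL target database named `mydb`:

```bash
# One mysql session so settings such as FOREIGN_KEY_CHECKS persist across files
{
  cat tables/_global.sql
  for f in tables/*.sql; do
    [ "$f" = "tables/_global.sql" ] || cat "$f"
  done
} | mysql mydb
```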
## JSON Output

```bash
sql-splitter split dump.sql -o tables/ --json
```

```json
{
  "input": "dump.sql",
  "output_dir": "tables/",
  "tables": [
    { "name": "users", "file": "tables/users.sql", "rows": 150, "bytes": 12345 },
    { "name": "orders", "file": "tables/orders.sql", "rows": 500, "bytes": 45678 }
  ],
  "statistics": {
    "total_tables": 2,
    "total_rows": 650,
    "total_bytes": 58023,
    "elapsed_ms": 42,
    "throughput_mb_per_sec": 612.5
  }
}
```
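The report composes well with jq; for example, listing per-table row counts (assuming `jq` is installed):

```bash
# Print "table<TAB>N rows" for each table in the report
sql-splitter split dump.sql -o tables/ --json \
  | jq -r '.tables[] | "\(.name)\t\(.rows) rows"'
```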
## Exit Codes

| Code | Meaning |
|---|---|
| `0` | Success — all files written |
| `1` | Error — I/O failure, parse error, or invalid input |
| `2` | Invalid arguments — bad CLI flags |
See Exit Codes for scripting examples.
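As a quick inline example, a backup script might branch on the code like this (a sketch; the messages are invented):

```bash
sql-splitter split dump.sql -o tables/
case $? in
  0) echo "split succeeded" ;;
  2) echo "invalid arguments, check the flags" >&2; exit 2 ;;
  *) echo "split failed: I/O, parse, or input error" >&2; exit 1 ;;
esac
```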
## Composing with Other Tools

### Split → Compress Individual Tables

```bash
# Split, then compress each table file
sql-splitter split dump.sql -o tables/
for f in tables/*.sql; do gzip "$f"; done
```

### Split → Query Specific Tables

```bash
# Extract users table, then query it
sql-splitter split dump.sql --tables users -o tmp/
sql-splitter query tmp/users.sql "SELECT * FROM users WHERE created_at > '2024-01-01'"
```

### Split → Parallel Import

```bash
# Split once, import tables in parallel
# (sh -c makes the per-file redirection happen in each child shell)
sql-splitter split dump.sql -o tables/
ls tables/*.sql | xargs -P4 -I{} sh -c 'mysql mydb < "$1"' _ {}
```

### Analyze → Split Specific Tables
Section titled “Analyze → Split Specific Tables”# Find largest tables, then extract just thosesql-splitter analyze dump.sql --json | jq -r '.tables | sort_by(-.bytes) | .[0:3] | .[].name'# Output: orders, logs, events
sql-splitter split dump.sql --tables orders,logs,events -o large_tables/Split → Merge Subset
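The two steps can also be wired together so the table list never needs to be pasted by hand (a sketch, assuming `analyze`'s JSON exposes `.tables[].bytes` as above):

```bash
# Build a comma-separated list of the three largest tables, then split just those
tables=$(sql-splitter analyze dump.sql --json \
  | jq -r '[.tables | sort_by(-.bytes) | .[0:3] | .[].name] | join(",")')
sql-splitter split dump.sql --tables "$tables" -o large_tables/
```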
### Split → Merge Subset

```bash
# Extract all tables, merge back a subset in dependency order
sql-splitter split dump.sql -o all/
sql-splitter merge all/users.sql all/orders.sql all/order_items.sql -o subset.sql
```

## Troubleshooting
### Table not appearing in output

Symptom: You expect a table, but it's not in the output directory.
Causes:
- The dump only contains DDL (schema) or only DML (data), and you used the opposite `--schema-only` or `--data-only` flag
- The table name in the dump differs from what you expect (check case sensitivity)
- The table has no statements at all in the dump
Fix: Run `sql-splitter analyze dump.sql` to see exactly which tables exist and what statements they have.
### Empty output files

Symptom: Table files are created but contain no data.
Causes:
- Using `--data-only` on a dump that only has `CREATE TABLE` statements
- Using `--schema-only` on a data-only dump
Fix: Remove the `--schema-only` or `--data-only` flag, or confirm the dump contains what you expect with `analyze`.
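As an alternative quick check, count the data statements directly (plain `grep`; use `zgrep` for gzip-compressed dumps):

```bash
# How many INSERT/COPY lines does the dump actually contain?
grep -c -iE '^(INSERT INTO|COPY )' dump.sql
```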
### Dialect detection failed

Symptom: An error about being unable to detect the dialect, or wrong SQL syntax in the output files.
Causes:
- The dump lacks dialect-specific markers (backticks for MySQL, `COPY` for PostgreSQL)
- Mixed-dialect dump (manually edited)
Fix: Explicitly specify the dialect: `--dialect=postgres` or `--dialect=mysql`.
### How do I get compressed output?

Symptom: You want `.sql.gz` files instead of `.sql`.

`sql-splitter` writes uncompressed output. Compress the files afterward:

```bash
sql-splitter split dump.sql -o tables/
gzip tables/*.sql
```

Or, for a single compressed output, use `merge` with compression:

```bash
sql-splitter split dump.sql -o tables/
sql-splitter merge "tables/*.sql" -o combined.sql.gz
```

### Permission denied writing output
Symptom: A "permission denied" error when writing to the output directory.

Fix: Ensure the output directory exists and is writable:

```bash
mkdir -p tables/
sql-splitter split dump.sql -o tables/
```

## See Also
- `merge` — Combine split files back into one dump
- `analyze` — View statistics without splitting
- `order` — Reorder statements in topological order
- Compression — Supported input compression formats
- Dialects — Dialect-specific behavior
- JSON Output Schema — Schema for `--json` output