# split
Split a large SQL dump into separate files, one per table.
Alias: `sp` (e.g., `sql-splitter sp dump.sql -o tables/`)
## When to Use This

- Extract specific tables — Pull out just the tables you need without loading the entire dump
- Enable parallel processing — Split once, then process tables concurrently in separate jobs
- Debug specific tables — Isolate a problematic table's DDL and data for investigation
- Create partial backups — Extract only customer-facing tables while ignoring analytics/logging tables
- Separate schema from data — Use `--schema-only` for migrations, `--data-only` for data refreshes
## How It Works

The `split` command uses streaming to process dumps in a single pass with constant memory (~10MB). Here's what happens:
- Read — Statements are read using buffered I/O (64KB buffer for files under 1GB, 256KB for larger files)
- Parse — Each statement is identified and its target table extracted
- Route — Statements are written to the appropriate `<table>.sql` file via 256KB write buffers
- Flush — Buffers are flushed every 100 statements to balance memory and I/O
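As a concrete illustration of this read/parse/route flow, here is a tiny hypothetical dump and the layout `split` should produce for it (table names and contents are invented for the example):

```bash
# A minimal two-table dump (contents invented for illustration)
cat > demo.sql <<'SQL'
SET NAMES utf8mb4;
CREATE TABLE users (id INT PRIMARY KEY);
INSERT INTO users VALUES (1);
CREATE TABLE orders (id INT PRIMARY KEY, user_id INT);
SQL

sql-splitter split demo.sql -o out/
# Expected files, per the routing rules below:
#   out/users.sql    (CREATE TABLE users + its INSERT)
#   out/orders.sql   (CREATE TABLE orders)
#   out/_global.sql  (the SET NAMES session statement)
```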
## What Goes Where

| Statement Type | Destination |
|---|---|
| `CREATE TABLE users` | `users.sql` |
| `INSERT INTO users` | `users.sql` |
| `CREATE INDEX ON users` | `users.sql` |
| `ALTER TABLE users` | `users.sql` |
| `COPY users FROM stdin` (PostgreSQL) | `users.sql` |
| `SET`, `BEGIN`, `COMMIT`, header comments | `_global.sql` |
| Unrecognized statements | `_global.sql` |
Table names are sanitized for the filesystem: special characters become underscores, and names are lowercased. For example, a table named `Order-Items` would land in `order_items.sql`.
```bash
sql-splitter split <INPUT> [OPTIONS]
```

## Examples
### Basic Splitting

```bash
# Split all tables to tables/
sql-splitter split database.sql --output=tables/

# Preview what would be created (no files written)
sql-splitter split database.sql --dry-run --verbose
```

### Working with Specific Dialects
```bash
# PostgreSQL pg_dump output
sql-splitter split pg_dump.sql -o tables/ --dialect=postgres

# SQLite .dump output
sql-splitter split sqlite_dump.sql -o tables/ --dialect=sqlite
```

### Extracting Specific Tables
```bash
# Only extract users and orders
sql-splitter split dump.sql --tables users,orders -o subset/

# Extract schema for all tables, data for just users
sql-splitter split dump.sql -o schema/ --schema-only
sql-splitter split dump.sql -o data/ --data-only --tables users
```

### Compressed Input
```bash
# Automatically decompresses based on extension
sql-splitter split backup.sql.gz -o tables/
sql-splitter split backup.sql.zst -o tables/
sql-splitter split backup.sql.xz -o tables/
```

See Compression for all supported formats.
### Batch Processing

```bash
# Split multiple dumps
sql-splitter split "backups/*.sql" -o output/ --fail-fast

# Process all dumps recursively
sql-splitter split "dumps/**/*.sql.gz" -o tables/
```

## Options
| Flag | Short | Description | Default |
|---|---|---|---|
| `--output` | `-o` | Output directory | `output` |
| `--dialect` | `-d` | SQL dialect: `mysql`, `postgres`, `sqlite`, `mssql` | auto-detect |
| `--tables` | `-t` | Only split these tables (comma-separated) | all |
| `--schema-only` | | Only DDL (CREATE TABLE, indexes) | `false` |
| `--data-only` | | Only DML (INSERT, COPY) | `false` |
| `--verbose` | `-v` | Show verbose output | `false` |
| `--progress` | `-p` | Show progress bar | `false` |
| `--dry-run` | | Preview without writing files | `false` |
| `--fail-fast` | | Stop on first error (for glob patterns) | `false` |
| `--json` | | Output results as JSON | `false` |
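These flags combine freely; for example, a verbose dry run restricted to two tables of a PostgreSQL dump (file name hypothetical):

```bash
# Preview what splitting users and orders would produce, without writing files
sql-splitter split nightly.sql -o tables/ -d postgres -t users,orders --dry-run --verbose
```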
## Output Structure

```
tables/
├── users.sql      # CREATE TABLE + INSERT for users
├── orders.sql     # CREATE TABLE + INSERT for orders
├── products.sql   # CREATE TABLE + INSERT for products
└── _global.sql    # Header statements, SET commands
```

The `_global.sql` file contains:
- Session settings (`SET NAMES utf8mb4`, `SET FOREIGN_KEY_CHECKS=0`)
- Transaction markers (`BEGIN`, `COMMIT`)
- Header comments from the dump tool
- Any statements that couldn't be associated with a specific table
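When re-importing split files, `_global.sql` generally needs to run first, and in the same session, so settings like `SET FOREIGN_KEY_CHECKS=0` are active while table data loads. A minimal sketch, assuming a MySQL target database named `mydb`:

```bash
# One mysql session so settings such as FOREIGN_KEY_CHECKS persist across files
{
  cat tables/_global.sql
  for f in tables/*.sql; do
    [ "$f" = "tables/_global.sql" ] || cat "$f"
  done
} | mysql mydb
```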
## JSON Output

```bash
sql-splitter split dump.sql -o tables/ --json
```

```json
{
  "input": "dump.sql",
  "output_dir": "tables/",
  "tables": [
    { "name": "users", "file": "tables/users.sql", "rows": 150, "bytes": 12345 },
    { "name": "orders", "file": "tables/orders.sql", "rows": 500, "bytes": 45678 }
  ],
  "statistics": {
    "total_tables": 2,
    "total_rows": 650,
    "total_bytes": 58023,
    "elapsed_ms": 42,
    "throughput_mb_per_sec": 612.5
  }
}
```
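The report composes well with jq; for example, listing per-table row counts (assuming `jq` is installed):

```bash
# Print "table<TAB>N rows" for each table in the report
sql-splitter split dump.sql -o tables/ --json \
  | jq -r '.tables[] | "\(.name)\t\(.rows) rows"'
```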
## Exit Codes

| Code | Meaning |
|---|---|
| `0` | Success — all files written |
| `1` | Error — I/O failure, parse error, or invalid input |
| `2` | Invalid arguments — bad CLI flags |
See Exit Codes for scripting examples.
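As a quick inline example, a backup script might branch on the code like this (a sketch; the messages are invented):

```bash
sql-splitter split dump.sql -o tables/
case $? in
  0) echo "split succeeded" ;;
  2) echo "invalid arguments, check the flags" >&2; exit 2 ;;
  *) echo "split failed: I/O, parse, or input error" >&2; exit 1 ;;
esac
```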
## Composing with Other Tools

### Split → Compress Individual Tables

```bash
# Split, then compress each table file
sql-splitter split dump.sql -o tables/
for f in tables/*.sql; do gzip "$f"; done
```

### Split → Query Specific Tables

```bash
# Extract users table, then query it
sql-splitter split dump.sql --tables users -o tmp/
sql-splitter query tmp/users.sql "SELECT * FROM users WHERE created_at > '2024-01-01'"
```

### Split → Parallel Import

```bash
# Split once, import tables in parallel
# (sh -c makes the per-file redirection happen in each child shell)
sql-splitter split dump.sql -o tables/
ls tables/*.sql | xargs -P4 -I{} sh -c 'mysql mydb < "$1"' _ {}
```

### Analyze → Split Specific Tables
Section titled “Analyze → Split Specific Tables”# Find largest tables, then extract just thosesql-splitter analyze dump.sql --json | jq -r '.tables | sort_by(-.bytes) | .[0:3] | .[].name'# Output: orders, logs, events
sql-splitter split dump.sql --tables orders,logs,events -o large_tables/Split → Merge Subset
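The two steps can also be wired together so the table list never needs to be pasted by hand (a sketch, assuming `analyze`'s JSON exposes `.tables[].bytes` as above):

```bash
# Build a comma-separated list of the three largest tables, then split just those
tables=$(sql-splitter analyze dump.sql --json \
  | jq -r '[.tables | sort_by(-.bytes) | .[0:3] | .[].name] | join(",")')
sql-splitter split dump.sql --tables "$tables" -o large_tables/
```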
### Split → Merge Subset

```bash
# Extract all tables, merge back a subset in dependency order
sql-splitter split dump.sql -o all/
sql-splitter merge all/users.sql all/orders.sql all/order_items.sql -o subset.sql
```

## Troubleshooting
### Table not appearing in output

Symptom: You expect a table, but it's not in the output directory.
Causes:
- The dump only contains DDL (schema) or only DML (data), and you used the opposite `--schema-only` or `--data-only` flag
- The table name in the dump differs from what you expect (check case sensitivity)
- The table has no statements at all in the dump
Fix: Run `sql-splitter analyze dump.sql` to see exactly which tables exist and what statements they have.
### Empty output files

Symptom: Table files are created but contain no data.
Causes:
- Using `--data-only` on a dump that only has `CREATE TABLE` statements
- Using `--schema-only` on a data-only dump
Fix: Remove the `--schema-only` or `--data-only` flag, or confirm the dump contains what you expect with `analyze`.
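As an alternative quick check, count the data statements directly (plain `grep`; use `zgrep` for gzip-compressed dumps):

```bash
# How many INSERT/COPY lines does the dump actually contain?
grep -c -iE '^(INSERT INTO|COPY )' dump.sql
```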
### Dialect detection failed

Symptom: An error about being unable to detect the dialect, or wrong SQL syntax in the output files.
Causes:
- The dump lacks dialect-specific markers (backticks for MySQL, `COPY` for PostgreSQL)
- Mixed-dialect dump (manually edited)
Fix: Explicitly specify the dialect: `--dialect=postgres` or `--dialect=mysql`.
### How do I get compressed output?

Symptom: You want `.sql.gz` files instead of `.sql`.

`sql-splitter` writes uncompressed output. Compress the files afterward:

```bash
sql-splitter split dump.sql -o tables/
gzip tables/*.sql
```

Or, for a single compressed output, use `merge` with compression:

```bash
sql-splitter split dump.sql -o tables/
sql-splitter merge "tables/*.sql" -o combined.sql.gz
```

### Permission denied writing output
Symptom: A "permission denied" error when writing to the output directory.

Fix: Ensure the output directory exists and is writable:

```bash
mkdir -p tables/
sql-splitter split dump.sql -o tables/
```

## See Also
- `merge` — Combine split files back into one dump
- `analyze` — View statistics without splitting
- `order` — Reorder statements in topological order
- Compression — Supported input compression formats
- Dialects — Dialect-specific behavior
- JSON Output Schema — Schema for `--json` output