Skip to content

Core Concepts

This page covers the core concepts you need to understand when using sql-splitter. For detailed definitions of database terminology used throughout the documentation, see the Glossary.

sql-splitter supports four SQL dialects:

DialectToolFeatures
MySQLmysqldumpDefault dialect, backtick quoting, INSERT INTO
PostgreSQLpg_dumpCOPY FROM stdin, $$ strings, SET commands
SQLite.dumpINTEGER PRIMARY KEY, .schema commands
MSSQLsqlcmd / SSMSGO batch separators, IDENTITY, nvarchar

sql-splitter automatically detects the dialect by analyzing:

  • Keywords: AUTO_INCREMENT (MySQL), SERIAL (PostgreSQL), GO (MSSQL)
  • Quoting: Backticks (MySQL), double quotes (PostgreSQL/SQLite), brackets (MSSQL)
  • Session commands: SET @ (MySQL), SET client_encoding (PostgreSQL)

Override with --dialect:

Terminal window
sql-splitter split dump.sql --dialect=postgres

sql-splitter uses a streaming architecture that:

  1. Reads in chunks: Fixed-size buffer reads, not full file loads
  2. Processes line-by-line: Statements are parsed as they’re read
  3. Writes immediately: Output files are written in real-time

This means:

  • Constant memory: ~50MB regardless of file size
  • No file size limit: Process 10GB+ dumps
  • Fast startup: No waiting to load the entire file

sql-splitter automatically detects and decompresses:

ExtensionFormatNotes
.gzgzipMost common
.bz2bzip2High compression
.xzxz/LZMAVery high compression
.zstzstdFast compression

Pass compressed files directly:

Terminal window
sql-splitter split backup.sql.gz -o tables/
sql-splitter analyze dump.sql.zst
sql-splitter convert mysql.sql.bz2 --to postgres

Most commands accept glob patterns for batch operations:

Terminal window
# All SQL files in current directory
sql-splitter validate "*.sql"
# Recursive search
sql-splitter analyze "backups/**/*.sql"
# Specific patterns
sql-splitter split "dumps/*.sql.gz" -o output/

Use --fail-fast to stop on first error:

Terminal window
sql-splitter validate "*.sql" --fail-fast

All commands support --json for machine-readable output:

Terminal window
sql-splitter analyze dump.sql --json | jq '.tables[].name'
sql-splitter validate dump.sql --json | jq '.errors'

JSON output includes:

  • Input/output file paths
  • Processing statistics
  • Per-table breakdowns
  • Warnings and errors
CodeMeaning
0Success
1Error (syntax, I/O, validation failure)
2Invalid arguments

Use in CI scripts:

Terminal window
sql-splitter validate dump.sql --strict || exit 1

sql-splitter recognizes these SQL statement types:

DDL (Schema):

  • CREATE TABLE
  • CREATE INDEX
  • ALTER TABLE
  • DROP TABLE

DML (Data):

  • INSERT INTO
  • COPY ... FROM stdin (PostgreSQL)

Control:

  • Session settings (SET, USE)
  • Comments
  • Transaction markers