Configuration
By default Bloviate inspects the schema and picks a generator for every column based on its JDBC type. This guide shows how to take progressively more control — from overriding row counts, to overriding individual columns, to non-uniform distributions, reproducible seeds, and the parallel fill path.
Configuration is layered:
- DatabaseConfiguration: global defaults (batch size, default row count, database support) and an optional set of per-table overrides.
- TableConfiguration: overrides the row count for one table, and optionally carries per-column overrides.
- ColumnConfiguration: overrides how a single column is generated, via a ColumnGeneratorFactory (a Random -> DataGenerator<?> lambda). The engine hands the factory a column-seeded Random so output stays reproducible.
Per-table row counts
Generate different numbers of rows for specific tables while every other table uses the default:
    import io.bloviate.db.*;
    import io.bloviate.ext.PostgresSupport;
    import java.util.Set;

    // "users" gets 50 rows; all other tables fall back to the default (100)
    Set<TableConfiguration> tableConfigs = Set.of(
        new TableConfiguration("users", 50)
    );

    DatabaseConfiguration config = new DatabaseConfiguration(
        10,                     // batch size
        100,                    // default rows per table
        new PostgresSupport(),  // database support
        tableConfigs            // table-specific overrides
    );

    new DatabaseFiller.Builder(connection, config)
        .build()
        .fill();

Per-column generation overrides
Pin a specific column to a custom generator. Here the status_code column on orders is
constrained to integers in [1, 10) instead of the type’s default:
    import io.bloviate.db.*;
    import io.bloviate.ext.PostgresSupport;
    import io.bloviate.gen.IntegerGenerator;
    import java.util.Set;

    // ColumnGeneratorFactory is a `RandomGenerator -> DataGenerator<?>` lambda.
    // The engine supplies a column-seeded RandomGenerator for reproducible output.
    Set<ColumnConfiguration> columnConfigs = Set.of(
        new ColumnConfiguration("status_code",
            random -> new IntegerGenerator.Builder(random).start(1).end(10).build())
    );

    // 1,000 rows for "orders", with the column override applied
    Set<TableConfiguration> tableConfigs = Set.of(
        new TableConfiguration("orders", 1000, columnConfigs)
    );

    DatabaseConfiguration config = new DatabaseConfiguration(
        128, 100, new PostgresSupport(), tableConfigs);

    new DatabaseFiller.Builder(connection, config)
        .build()
        .fill();

Column names are matched case-insensitively. Any column without an override keeps its default, type-based generator.
Value distributions
Real columns are rarely uniform: a status is mostly ACTIVE, a rating clusters around its
mean, a referenced product_id follows a popularity curve, and a created_at bunches toward the
present. The Distributions helper returns ready-made ColumnGeneratorFactory values so a column
can opt into a non-uniform distribution without writing a factory:
    import io.bloviate.db.*;
    import io.bloviate.ext.PostgresSupport;
    import java.util.Map;
    import java.util.Set;

    Set<ColumnConfiguration> columnConfigs = Set.of(
        // 70% NEW, 25% SHIPPED, 5% CANCELLED (weights need not sum to 1)
        new ColumnConfiguration("status",
            Distributions.weighted(Map.of("NEW", 0.7, "SHIPPED", 0.25, "CANCELLED", 0.05))),
        // normal(mean=4, sd=1) rounded and clamped to [1, 5]
        new ColumnConfiguration("rating", Distributions.normalInt(4, 1, 1, 5)),
        // Zipfian (power-law) over [1, 10000]: a few hot ids, a long thin tail
        new ColumnConfiguration("product_id", Distributions.zipfian(10_000)),
        // timestamps skewed toward the recent end of the window
        new ColumnConfiguration("created_at", Distributions.recentTimestamps())
    );

    DatabaseConfiguration config = new DatabaseConfiguration(
        128, 100, new PostgresSupport(),
        Set.of(new TableConfiguration("orders", 100_000, columnConfigs)));

Available shapes: weighted(...) (categorical), normal(...) / normalInt(...) (bounded
Gaussian), zipfian(...) (power-law), and recentTimestamps(...) (recency-skewed). Each is built
from the engine’s column seed, so output stays reproducible and composes with foreign-key
reseeding and parallel fills like any other generator. These are specified distributions, not
distributions learned from real data.
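
To make the categorical case concrete, here is a minimal, self-contained sketch of weighted sampling driven by a seeded java.util.Random. It is not Bloviate's implementation of Distributions.weighted; the class WeightedSketch and the method pick are hypothetical names used only for illustration. It shows why weights need not sum to 1: each draw is scaled by the total weight, so 70/25/5 behaves the same as 0.7/0.25/0.05.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Random;

// Hypothetical illustration; not part of the Bloviate API.
public class WeightedSketch {

    // Returns one key, chosen with probability weight / (sum of all weights).
    static <T> T pick(Map<T, Double> weights, Random random) {
        if (weights.isEmpty()) {
            throw new IllegalArgumentException("weights must not be empty");
        }
        double total = 0.0;
        for (double w : weights.values()) {
            total += w;
        }

        // Scale a uniform draw by the total, then walk the cumulative weights.
        double target = random.nextDouble() * total;
        double cumulative = 0.0;
        T last = null;
        for (Map.Entry<T, Double> entry : weights.entrySet()) {
            cumulative += entry.getValue();
            last = entry.getKey();
            if (target < cumulative) {
                return last;
            }
        }
        return last; // reached only through floating-point rounding or all-zero weights
    }

    public static void main(String[] args) {
        Map<String, Double> weights = new LinkedHashMap<>();
        weights.put("NEW", 70.0);
        weights.put("SHIPPED", 25.0);
        weights.put("CANCELLED", 5.0);

        Random random = new Random(42L); // same seed, same sequence of picks
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (int i = 0; i < 10_000; i++) {
            counts.merge(pick(weights, random), 1, Integer::sum);
        }
        System.out.println(counts);
    }
}
```

Because the only source of randomness is the Random passed in, two runs with the same seed yield the same sequence, which is the property the column seed relies on.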
Constraint conformance
On PostgreSQL, Bloviate reads each table’s CHECK constraints and ENUM types and generates
values that satisfy them — automatically, no configuration. So given:
    CREATE TYPE order_status AS ENUM ('NEW', 'PAID', 'SHIPPED', 'CANCELLED');

    CREATE TABLE orders (
        status   order_status NOT NULL,
        rating   integer CHECK (rating BETWEEN 1 AND 5),
        priority integer CHECK (priority IN (1, 2, 3)),
        amount   numeric(8,2) CHECK (amount >= 0 AND amount <= 9999.99)
    );

status only gets one of its enum labels, rating lands in [1, 5], priority is one of
1/2/3, and amount stays in range — instead of random values an insert would reject. The common
forms are honored: IN (...), BETWEEN, and >=/<=/>/< comparisons, for integer, numeric,
floating, and text columns, plus enum/domain allowed values.
Notes:
- A CHECK form that can’t be safely satisfied (negation, OR, LIKE patterns, a one-sided bound) is skipped with a warning, and the column falls back to its type default.
- A per-column override or a registry rule always wins, so you can still take full control of a constrained column.
- Open the connection with stringtype=unspecified (already required for PostgreSQL’s extension types) so enum/IN values bind. Constraint reading is PostgreSQL-only today.
Reproducible data with seeds
DatabaseConfiguration takes a base seed. The same schema filled with the same seed always
produces identical data, so test fixtures are deterministic; change the seed for a different — but
still reproducible — dataset. Per-column seeds are derived from stable column identity, and foreign
keys are seeded from their referenced primary key, so referential fidelity holds for any seed.
    import io.bloviate.db.*;
    import io.bloviate.ext.PostgresSupport;

    // batch size, rows/table, support, table configs, seed
    DatabaseConfiguration config = new DatabaseConfiguration(
        128, 100, new PostgresSupport(), null, 42L);

    new DatabaseFiller.Builder(connection, config).build().fill();

The seed defaults to 0 when you use the four-argument constructor, so existing code keeps a
single, stable dataset without changes.
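
The idea of deriving a per-column seed from the base seed and a stable column identity can be sketched in plain Java. This is only an illustration under assumptions: the mixing formula, the class SeedSketch, and the methods columnSeed and sample are hypothetical, and Bloviate's real derivation may well differ.

```java
import java.util.Arrays;
import java.util.Locale;
import java.util.Random;

// Hypothetical illustration; Bloviate's actual seed derivation may differ.
public class SeedSketch {

    // Mixes the base seed with a stable, case-insensitive "table.column" identity.
    static long columnSeed(long baseSeed, String table, String column) {
        String identity = (table + "." + column).toLowerCase(Locale.ROOT);
        // String.hashCode is specified by the JDK, so it is stable across JVMs and runs.
        return 31L * baseSeed + identity.hashCode();
    }

    // The first n values a column would receive under this scheme.
    static int[] sample(long baseSeed, String table, String column, int n) {
        Random random = new Random(columnSeed(baseSeed, table, column));
        int[] values = new int[n];
        for (int i = 0; i < n; i++) {
            values[i] = random.nextInt(1000);
        }
        return values;
    }

    public static void main(String[] args) {
        // Same base seed and column: identical values on every run.
        System.out.println(Arrays.toString(sample(42L, "orders", "status_code", 5)));
        System.out.println(Arrays.toString(sample(42L, "orders", "status_code", 5)));
        // A different base seed gives a different, equally repeatable stream.
        System.out.println(Arrays.toString(sample(43L, "orders", "status_code", 5)));
    }
}
```

The point of keying on column identity rather than on generation order is that adding or reordering other columns does not disturb the values a given column receives.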
Parallel table fill
For large, wide schemas the fill can run in parallel. Construct the filler from a pooled
DataSource instead of a single Connection and ask for more than one worker thread:
    import io.bloviate.db.*;
    import io.bloviate.ext.PostgresSupport;
    import javax.sql.DataSource;

    DataSource dataSource = /* a pooled DataSource, e.g. HikariCP */;

    DatabaseConfiguration config = new DatabaseConfiguration(
        1000, 100_000, new PostgresSupport(), null, 42L);

    new DatabaseFiller.Builder(dataSource, config)
        .threads(8)   // fill independent tables concurrently
        .build()
        .fill();

Bloviate groups tables into topological levels by their foreign keys and fills the independent tables within each level concurrently, one connection per worker, with a barrier between levels so a child table is never filled before its parent. Each worker fills its table in a single transaction (one commit per table). The fill stays fully reproducible: a table’s data depends only on its own seed and row order, never on which tables fill alongside it, so the same config and seed produce byte-for-byte the same data as a sequential fill.
How much this helps depends on the schema. A wide schema of independent tables sees a large speedup (~3× with 8 workers on a 10-table, 1M-row fixture); a deep, narrow foreign-key chain (each table depending on the previous) has little to parallelize. See the benchmarks for numbers.
The single-Connection constructor is unchanged and remains the default sequential path — threads
only applies to the DataSource form.
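
The grouping of tables into topological levels can be sketched as follows. This is a minimal illustration, not Bloviate's scheduler: the class LevelSketch, the method levels, and the example table names are hypothetical, and the treatment of self-referencing tables is an assumption. Every table in one level depends only on tables in earlier levels, so the tables within a level can be filled concurrently.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical illustration; not Bloviate's scheduler.
public class LevelSketch {

    // parents maps each table to the tables its foreign keys reference.
    static List<List<String>> levels(Map<String, Set<String>> parents) {
        Map<String, Set<String>> pending = new TreeMap<>();
        for (Map.Entry<String, Set<String>> entry : parents.entrySet()) {
            Set<String> deps = new TreeSet<>(entry.getValue());
            deps.remove(entry.getKey()); // assumption: a self-reference does not block its own table
            pending.put(entry.getKey(), deps);
        }
        // Referenced tables that were not listed explicitly have no parents of their own.
        for (Set<String> deps : parents.values()) {
            for (String dep : deps) {
                pending.putIfAbsent(dep, new TreeSet<>());
            }
        }

        List<List<String>> levels = new ArrayList<>();
        while (!pending.isEmpty()) {
            // A table is ready once all of its parents sit in earlier levels.
            List<String> ready = new ArrayList<>();
            for (Map.Entry<String, Set<String>> entry : pending.entrySet()) {
                if (entry.getValue().isEmpty()) {
                    ready.add(entry.getKey());
                }
            }
            if (ready.isEmpty()) {
                throw new IllegalStateException("foreign-key cycle among " + pending.keySet());
            }
            for (String table : ready) {
                pending.remove(table);
            }
            for (Set<String> deps : pending.values()) {
                deps.removeAll(ready);
            }
            levels.add(ready);
        }
        return levels;
    }

    public static void main(String[] args) {
        Map<String, Set<String>> parents = Map.of(
            "users", Set.<String>of(),
            "products", Set.<String>of(),
            "orders", Set.of("users"),
            "order_items", Set.of("orders", "products"));
        // prints [[products, users], [orders], [order_items]]
        System.out.println(levels(parents));
    }
}
```

In this example only the first level offers any parallelism, which matches the remark above that a deep, narrow foreign-key chain has little to parallelize.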
Intra-table partitioning
When a single large table dominates the fill, between-table parallelism can’t help it: it sits
alone in its topological level. Set partitions on that table’s TableConfiguration to split its
rows into that many contiguous ranges filled concurrently, one connection per range, on the
parallel (DataSource + threads) path:
    // split the one big table into 8 row ranges; ignored on the single-Connection path
    Set<TableConfiguration> tables = Set.of(
        new TableConfiguration("events", 50_000_000L, 8 /* partitions */)
    );

    DatabaseConfiguration config = new DatabaseConfiguration(
        1000, 0, new PostgresSupport(), tables, 42L);

    new DatabaseFiller.Builder(dataSource, config).threads(8).build().fill();

Partitioning is reproducible for a given configuration, including the partition count: key columns and the columns correlated with them (foreign keys, sequences, permutations) are generated positionally, so they are byte-for-byte identical to a sequential fill and foreign-key validity always holds. Only plain non-key random columns take different (but still deterministic) values when you change the partition count. They carry no cross-row contract, so this is by design and keeps the default path free of any per-cell cost.
Size the connection pool for the total concurrent demand (threads, where a partitioned table
counts as partitions units). One case is unsupported: partitioning a parent table whose
primary key is a plain random generator referenced by a foreign key can orphan those references —
partition the child table instead, or use the positional key generators (as the bundled
TPC-C/TPC-H configurations do). A custom generator with internal positional state must implement
io.bloviate.gen.IndexedDataGenerator to stay aligned under partitioning.
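
A minimal sketch of splitting a table's rows into contiguous ranges, and of why positional keys survive any partition count, might look like this. The class PartitionSketch, the record RowRange, and the methods split and positionalKey are hypothetical names; the exact range boundaries Bloviate chooses are not documented here and may differ.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration; Bloviate's actual range boundaries may differ.
public class PartitionSketch {

    // A half-open row range: [startInclusive, endExclusive).
    record RowRange(long startInclusive, long endExclusive) {
        long size() {
            return endExclusive - startInclusive;
        }
    }

    // Splits rows into contiguous ranges whose sizes differ by at most one.
    static List<RowRange> split(long rows, int partitions) {
        if (rows < 0 || partitions < 1) {
            throw new IllegalArgumentException("rows must be >= 0 and partitions >= 1");
        }
        long base = rows / partitions;
        long remainder = rows % partitions;
        List<RowRange> ranges = new ArrayList<>();
        long start = 0;
        for (int i = 0; i < partitions; i++) {
            long size = base + (i < remainder ? 1 : 0); // spread the remainder over the first ranges
            ranges.add(new RowRange(start, start + size));
            start += size;
        }
        return ranges;
    }

    // A positional key depends only on the row index, never on the partition layout.
    static long positionalKey(long rowIndex) {
        return rowIndex + 1;
    }

    public static void main(String[] args) {
        for (RowRange range : split(50_000_000L, 8)) {
            System.out.println(range + " starts at key " + positionalKey(range.startInclusive()));
        }
    }
}
```

Since positionalKey is a pure function of the row index, row 12,500,000 receives the same key whether it is produced by a sequential fill or by the third of eight workers. A plain random generator has no such anchor, which is why partitioning a parent whose primary key is random can orphan references.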
Commit strategy
By default the engine leaves the connection’s autocommit untouched (a typical autocommit connection
commits per executeBatch()). Disabling autocommit and committing less often cuts overhead. Pass a
CommitStrategy to DatabaseConfiguration for the sequential path:
    import io.bloviate.db.*;
    import io.bloviate.ext.PostgresSupport;

    // commit once per table (autocommit off for the fill, restored afterward)
    DatabaseConfiguration perTable = new DatabaseConfiguration(
        1000, 100_000, new PostgresSupport(), null, 42L,
        CommitStrategy.perTable());

    // or bound the open transaction: commit every 50 JDBC batches
    DatabaseConfiguration everyN = new DatabaseConfiguration(
        1000, 100_000, new PostgresSupport(), null, 42L,
        CommitStrategy.everyNBatches(50));

The default, CommitStrategy.connectionDefault(), preserves today’s behavior (the engine never
touches autocommit). The parallel path already commits once per table; a configured strategy
applies there too.
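
As a rough worked example of what each strategy means for the configuration above (1,000-row batches, 100,000 rows per table), the commit counts can be computed directly. The class CommitSketch and its methods are hypothetical, and the sketch assumes a trailing partial group of batches is committed at the end of the table.

```java
// Hypothetical illustration of the commit arithmetic; not part of the Bloviate API.
public class CommitSketch {

    // Number of JDBC batches needed for a table (the last batch may be partial).
    static long batches(long rows, int batchSize) {
        return (rows + batchSize - 1) / batchSize;
    }

    // Commits per table when committing after every n batches.
    // Assumption: a trailing partial group of batches is committed too.
    static long commitsEveryN(long rows, int batchSize, int n) {
        long batchCount = batches(rows, batchSize);
        return (batchCount + n - 1) / n;
    }

    public static void main(String[] args) {
        long rows = 100_000;
        int batchSize = 1000;
        System.out.println("batches per table: " + batches(rows, batchSize));                  // 100
        System.out.println("commits with everyNBatches(50): " + commitsEveryN(rows, batchSize, 50)); // 2
        System.out.println("commits with perTable(): 1");
    }
}
```

On a typical autocommit connection, where each executeBatch() commits, the same table would see about 100 commits, one per batch.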
Tip: driver batch rewrite. Bloviate inserts in JDBC batches, but most drivers only collapse a batch into a single multi-row INSERT when you opt in via the JDBC URL: PostgreSQL reWriteBatchedInserts=true, MySQL rewriteBatchedStatements=true. Enabling it is often the single biggest fill speedup, sequential or parallel. Bloviate logs a warning at fill time when the parameter is missing, and io.bloviate.util.JdbcUrls.withBatchRewrite(url, support.batchRewriteUrlParameter()) builds a correctly-parameterized URL if you construct the DataSource yourself. CockroachDB ignores the parameter, so no warning is emitted there.
Configuration options reference
Database configuration options
- Batch Size: Number of records inserted in each batch operation
- Record Count: Default number of records to generate per table
- Database Support: Database-specific implementation for optimal compatibility
- Table Configurations: Override the row count for specific tables, and the intra-table partitions count for splitting a large table across workers on the parallel path
- Column Configurations: Override the generator for specific columns (case-insensitive, reproducible)
- Seed: Base seed for reproducible generation; the same schema and seed always produce the same data (defaults to 0)
- Commit Strategy: How the engine commits: leave autocommit alone (default), commit once per table, or commit every N batches
Parallelism (worker threads for concurrent table fill) is configured on the
DatabaseFiller.Builder via threads(n) with the DataSource constructor.
File generation options
- Output Format: CSV, TSV, or pipe-delimited
- Row Count: Number of rows to generate
- Custom Column Definitions: Full control over data generation