Migrating from PostgreSQL to MySQL: A Step-by-Step Guide

How to Convert PostgreSQL Schemas and Data to MySQL

Migrating a database from PostgreSQL to MySQL requires careful planning and execution because the two systems differ in SQL dialects, data types, constraints, indexes, and features (e.g., sequences, arrays, JSON behavior). This guide provides a clear, practical step-by-step process to convert PostgreSQL schemas and data to MySQL for a typical application.

1. Plan and prepare

  1. Inventory: List all databases, schemas, tables, views, stored procedures, triggers, indexes, constraints, and extensions in use.
  2. Assess compatibility: Note PostgreSQL-specific features (ARRAY, hstore, sequences, partial indexes, CHECK constraints, custom types, PL/pgSQL functions, GIST/GIN indexes) and plan replacements or workarounds.
  3. Set downtime strategy: Choose between full downtime, phased migration, or replication/dual-write. For large or high-availability systems, plan a replication or cutover window.
  4. Backup: Take consistent logical and physical backups of PostgreSQL (pg_dump / pg_basebackup) and snapshot configurations.

2. Prepare MySQL target

  1. Install MySQL: Choose MySQL Community Server (or compatible fork like MariaDB if acceptable). Ensure version supports needed features (e.g., JSON, generated columns).
  2. Configure server: Tune character set/collation (utf8mb4), timezone, max_allowed_packet, innodb_buffer_pool_size, and other performance settings.
  3. Create target database and users: Set appropriate privileges and secure credentials.
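As a rough illustration, a my.cnf fragment along these lines could cover the settings above (all values are placeholder assumptions to adjust for your hardware and workload):

```ini
[mysqld]
character-set-server = utf8mb4
collation-server    = utf8mb4_unicode_ci
default-time-zone   = '+00:00'   # store timestamps as UTC
max_allowed_packet  = 256M       # large rows / bulk inserts
innodb_buffer_pool_size = 4G     # commonly 50-75% of RAM on a dedicated server
local_infile        = ON         # required for LOAD DATA LOCAL INFILE
```

Restart the server after changing these values and verify them with SHOW VARIABLES.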

3. Convert schema

  1. Extract PostgreSQL schema: Use pg_dump --schema-only (or -s) to get SQL for tables, constraints, indexes, and sequences.
    Example:

    Code

    pg_dump -U pguser -s -f schema.sql dbname
  2. Transform DDL differences: Edit schema.sql or use tools to convert types and syntax. Key changes:
    • Data types:
      • SERIAL / BIGSERIAL -> use MySQL AUTO_INCREMENT on integer primary keys.
      • INTEGER, BIGINT: map similarly.
      • TEXT -> TEXT; VARCHAR(n) -> VARCHAR(n).
      • BOOLEAN -> TINYINT(1) or BOOLEAN (MySQL maps it to tinyint).
      • TIMESTAMP WITH TIME ZONE -> TIMESTAMP or DATETIME (MySQL lacks true TZ-aware types; store UTC or use separate TZ handling).
      • BYTEA -> BLOB.
      • JSON/JSONB -> JSON (supported in MySQL 5.7+ but behavior and indexing differ).
      • ARRAY, hstore, composite types -> convert to normalized tables or text/JSON columns.
    • Constraints and indexes:
      • CHECK constraints: MySQL before 8.0.16 parsed but silently ignored CHECK constraints; 8.0.16 and later enforce them. On older versions, implement the checks with triggers or application logic.
      • Partial indexes: MySQL has no direct equivalent; emulate via indexed generated columns or separate tables.
      • Unique constraints and primary keys: recreate in MySQL DDL.
    • Sequences:
      • Replace sequences with AUTO_INCREMENT or create equivalent tables and set AUTO_INCREMENT starting value using ALTER TABLE … AUTO_INCREMENT = n.
    • Functions and stored procedures:
      • Rewrite PL/pgSQL functions as MySQL stored procedures/functions, or move the logic into the application layer.
    • Views and materialized views:
      • Convert views; materialized views require creating tables and refreshing them on demand.
  3. Automate conversion (optional): Use tools like:
    • pgloader (can migrate schema and data with transformations)
    • AWS Schema Conversion Tool
    • Custom scripts (Python with SQLAlchemy, or sed/awk for simple replacements). These tools help, but manual review is still necessary.
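The mechanical part of the type conversion can be sketched as a small Python pass over the dumped DDL. The TYPE_MAP table and convert_ddl helper below are hypothetical names, and the mapping is deliberately incomplete; treat the output as a starting point for manual review, not a finished schema:

```python
import re

# Ordered (pattern, replacement) pairs for common PostgreSQL -> MySQL type renames.
# Deliberately incomplete: arrays, hstore, custom types, etc. need manual handling.
TYPE_MAP = [
    (r"\bBIGSERIAL\b", "BIGINT AUTO_INCREMENT"),
    (r"\bSERIAL\b", "INT AUTO_INCREMENT"),
    (r"\bBOOLEAN\b", "TINYINT(1)"),
    (r"\bBYTEA\b", "BLOB"),
    (r"\bTIMESTAMP WITH TIME ZONE\b", "DATETIME"),  # store UTC; MySQL lacks TZ-aware types
    (r"\bTIMESTAMPTZ\b", "DATETIME"),
    (r"\bJSONB\b", "JSON"),
]

def convert_ddl(ddl: str) -> str:
    """Apply the type renames to a PostgreSQL DDL string."""
    for pattern, replacement in TYPE_MAP:
        ddl = re.sub(pattern, replacement, ddl, flags=re.IGNORECASE)
    return ddl
```

Run the converted file against a scratch MySQL instance first; any statement it rejects needs hand editing.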

4. Migrate data

  1. Choose transfer method: Options include pg_dump --data-only with CSV export, pgloader, or direct ETL scripts. For moderate sizes, CSV import is reliable.
  2. Export data to CSV: For each table:

    Code

    COPY table_name TO '/tmp/table_name.csv' CSV HEADER;

    Or run queries to normalize complex types (e.g., arrays to one-to-many rows).

  3. Preprocess CSVs: Convert values that differ between DBMS:
    • Booleans: PostgreSQL “t”/“f” or “true”/“false” -> 1/0.
    • NULL/empty strings: ensure MySQL will interpret correctly.
    • Timestamps: convert timezone formats to UTC or MySQL-compatible format.
    • JSON: ensure valid JSON (MySQL’s JSON type requires strict formatting).
  4. Load into MySQL: Use LOAD DATA INFILE or mysqlimport for fast bulk loads:

    Code

    LOAD DATA LOCAL INFILE '/tmp/table_name.csv'
    INTO TABLE table_name
    FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"'
    LINES TERMINATED BY '\n'
    IGNORE 1 LINES;

    Adjust settings for encoding, NULL representation, and field separators.

  5. Preserve IDs and sequences: If using AUTO_INCREMENT, set an appropriate starting value:

    Code

    ALTER TABLE table_name AUTO_INCREMENT = n;  -- n = highest migrated id + 1
  6. Indexes and constraints: For faster loads, create tables without non-essential indexes and foreign keys, load data, then add indexes and constraints afterward.
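The value fixes from step 3 can be scripted. A minimal Python sketch, assuming empty CSV fields represent NULL and converting boolean-looking values wherever they appear (a real script should convert by column position and type, not by value; preprocess_csv and BOOL_MAP are hypothetical names):

```python
import csv

# PostgreSQL boolean spellings -> MySQL TINYINT(1) values.
BOOL_MAP = {"t": "1", "true": "1", "f": "0", "false": "0"}

def preprocess_row(row):
    """Normalize one data row exported from PostgreSQL for MySQL loading."""
    out = []
    for value in row:
        lowered = value.lower()
        if lowered in BOOL_MAP:
            out.append(BOOL_MAP[lowered])
        elif value == "":
            out.append(r"\N")  # LOAD DATA's NULL marker; assumes empty means NULL
        else:
            out.append(value)
    return out

def preprocess_csv(src_path, dst_path):
    """Rewrite src_path into dst_path with normalized values, keeping the header."""
    with open(src_path, newline="") as src, open(dst_path, "w", newline="") as dst:
        reader = csv.reader(src)
        writer = csv.writer(dst)
        writer.writerow(next(reader))  # copy header row unchanged
        for row in reader:
            writer.writerow(preprocess_row(row))
```

The value-based boolean match is a shortcut: a text column that happens to contain the string “true” would be mangled, which is why per-column conversion is safer in practice.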

5. Migrate additional objects and logic

  1. Triggers: Rewrite PostgreSQL triggers as MySQL triggers or move logic into application code. Test ordering and timing differences.
  2. Stored procedures/functions: Translate PL/pgSQL to MySQL’s procedural language or replace with application-level logic.
  3. Views: Recreate views in MySQL; for complex queries, validate execution plans and performance.
  4. Permissions: Recreate roles and grants as MySQL users with matching privileges.

6. Test thoroughly

  1. Data validation: Row counts, checksums, and spot-check important fields.
    • Row counts per table: compare source vs target.
    • Checksums: e.g., use md5(concat(…)) over key columns to compare content.
  2. Application tests: Run integration tests against MySQL target to catch SQL dialect issues, query performance regressions, and logic errors.
  3. Performance tuning: Analyze slow queries, add indexes, rewrite queries as needed. Use EXPLAIN in MySQL to inspect plans.
  4. Concurrency and transactions: Validate isolation levels and transaction semantics; MySQL default InnoDB behavior differs subtly from PostgreSQL.
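The row-count and checksum checks in step 1 can be scripted against unordered query results (or exported CSVs) from both databases. This is a sketch; table_fingerprint is a hypothetical helper, not a standard API, and md5 is used only for comparison, not security:

```python
import hashlib

def table_fingerprint(rows):
    """Order-insensitive fingerprint of an iterable of rows.

    Each row is hashed individually and the digests are XOR-combined,
    so the result does not depend on row order -- convenient when
    comparing unordered SELECT output from two databases. Returns
    (row_count, hex_fingerprint); the count catches duplicate rows
    that would otherwise cancel out under XOR.
    """
    combined = 0
    count = 0
    for row in rows:
        digest = hashlib.md5("|".join(map(str, row)).encode("utf-8")).hexdigest()
        combined ^= int(digest, 16)
        count += 1
    return count, format(combined, "032x")
```

Run it over the same table fetched from PostgreSQL and from MySQL; matching (count, fingerprint) pairs are strong evidence the data transferred intact, while a mismatch pinpoints which table to inspect.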

7. Cutover

  1. Choose cutover strategy: For minimal downtime, consider replication or dual-write:
    • Use logical replication tools (e.g., pglogical) or tools like Debezium to stream changes and apply to MySQL.
    • Alternately, freeze writes, do final sync, then switch application to MySQL.
  2. Final sync: Apply any remaining deltas, re-run validations, and ensure sequences/AUTO_INCREMENT values are correct.
  3. Switch application: Update DB connection strings and monitor closely. Keep PostgreSQL available as a rollback option until stable.

8. Post-migration checks and cleanup

  1. Monitor: Watch metrics (latency, errors, slow queries) and adjust MySQL configuration and indexes.
  2. Audit: Confirm all scheduled jobs, backups, and maintenance tasks are in place for MySQL.
  3. Decommission: Once confident, decommission PostgreSQL resources following backup retention policies.

Example quick checklist

  1. Inventory DB objects and extensions.
  2. Create MySQL instance and tune settings.
  3. Convert DDL (types, sequences, constraints).
  4. Export data (CSV/pgloader) and transform values.
  5. Bulk load into MySQL, recreate indexes.
  6. Migrate triggers, functions, views.
  7. Test data integrity and application behavior.
  8. Perform cutover and monitor.

Follow these steps, adjust for your application’s specific features (JSON-heavy schemas, geospatial types, or advanced indexing), and perform incremental tests to ensure a smooth migration.
