Sunday, May 19, 2013

Postgresql Replication part2

Further Testing
Quering recovery process
If you have a live database that processes a lot of data it may take some time for the replication server to “catch up”   There are a few ways to see where it is in the process.

From the slave database server log in as root and run the following command

        >    ps -AF | grep post

Here are two more ways to query the database to see what WAL file it is working on .

From the Master database you can run the following command

        >    psql -c "SELECT pg_current_xlog_location()"

Which returns something like this

So how do you read this?

If you look in the pg_xlog folder on the main server you would see files named like this


Now lets deconstruct the E8C/987975C0


That is how the numbers line up and the rest is a HEX offset within the file (Where within the file it is reading)

Now from the Slave database you can run the following command.

        >    psql -c "SELECT pg_last_xlog_replay_location()"

Which returns something like this

Primary stop/start
As some further testing I stopped the primary database and started it back up after a few minutes.   (I also stopped and restarted my python program)

        >    sudo /etc/init.d/postgresql stop
        >    sleep 120
        >    sudo /etc/init.d/postgresql start
        >    ./

The replication database had no problem with this

Replication stop/start

From the replication server I stopped and restarted the postgres server.

        >    sudo /etc/init.d/postgresql stop
        >    sleep 240
        >    sudo /etc/init.d/postgresql start

No problem with this either

Test Delete from Replication

The Replication server only has read access, let’s try and delete.
From the database on the replication server try and run this

        >    \c nand
        >    delete from data;

Change the Replication server to a primary server

Warning do not do this in a live system!!! I just did this to make sure it works…  
The recover.conf file defined /home/postgres/failover as the trigger_file.   Create a file here and see if it stops being a replication server.

Run the following commands

        >    sudo mkdir -p /home/postgres
        >    touch /home/postgres/failover

Now when I log into the database from the replication server and run the following commands.

        >    \c nand
        >    select count(*) from data;

I see that it is no longer getting fed from the primary database server.

And I can run the following command

        >    delete from data;

And it can now write to the database.

Also when this occurs recovery.conf was renamed to recovery.done.

Now that the Replication server has become a Primary server you can’t easily switch it back to a Replication server.  Which is it should be!