Sunday, May 19, 2013

Postgresql Replication part2



Further Testing
Quering recovery process
If you have a live database that processes a lot of data it may take some time for the replication server to “catch up”   There are a few ways to see where it is in the process.

From the slave database server log in as root and run the following command



        >    ps -AF | grep post



Here are two more ways to query the database to see what WAL file it is working on .

From the Master database you can run the following command


        >    psql -c "SELECT pg_current_xlog_location()"

Which returns something like this




So how do you read this?

If you look in the pg_xlog folder on the main server you would see files named like this

0000000100000E8C00000095
0000000100000E8C00000096
0000000100000E8C00000097
0000000100000E8C00000098
0000000100000E8C00000099
0000000100000E8C0000009A

Now lets deconstruct the E8C/987975C0

E8C/987975C0
0000000100000E8C00000098

That is how the numbers line up and the rest is a HEX offset within the file (Where within the file it is reading)

Now from the Slave database you can run the following command.


        >    psql -c "SELECT pg_last_xlog_replay_location()"



Which returns something like this



Primary stop/start
As some further testing I stopped the primary database and started it back up after a few minutes.   (I also stopped and restarted my python program)


        >    sudo /etc/init.d/postgresql stop
        >    sleep 120
        >    sudo /etc/init.d/postgresql start
        >    ./insertDB.py


The replication database had no problem with this

Replication stop/start

From the replication server I stopped and restarted the postgres server.


        >    sudo /etc/init.d/postgresql stop
        >    sleep 240
        >    sudo /etc/init.d/postgresql start


No problem with this either

Test Delete from Replication

The Replication server only has read access, let’s try and delete.
From the database on the replication server try and run this


        >    \c nand
        >    delete from data;


Change the Replication server to a primary server

Warning do not do this in a live system!!! I just did this to make sure it works…  
The recover.conf file defined /home/postgres/failover as the trigger_file.   Create a file here and see if it stops being a replication server.

Run the following commands


        >    sudo mkdir -p /home/postgres
        >    touch /home/postgres/failover



Now when I log into the database from the replication server and run the following commands.



        >    \c nand
        >    select count(*) from data;

I see that it is no longer getting fed from the primary database server.





And I can run the following command

        >    delete from data;


And it can now write to the database.

Also when this occurs recovery.conf was renamed to recovery.done.

Now that the Replication server has become a Primary server you can’t easily switch it back to a Replication server.  Which is it should be!