Further Testing
Quering recovery process
If you have a live database that processes a lot of data it may
take some time for the replication server to “catch up”   There are a few ways to see where it is in the process.
From the slave database server log in as root and run the
following command
| 
 
        >    ps -AF | grep post | 
Here are two more ways to query the database to see what WAL
file it is working on .
From the Master database you can run the following command
| 
 
        >    psql -c "SELECT
  pg_current_xlog_location()" | 
Which returns something like this
So how do you read this?
If you look in the pg_xlog folder on the main server you would
see files named like this
0000000100000E8C00000095
0000000100000E8C00000096
0000000100000E8C00000097
0000000100000E8C00000098
0000000100000E8C00000099
0000000100000E8C0000009A
Now lets deconstruct the E8C/987975C0
E8C/987975C0
0000000100000E8C00000098
That is how the numbers line up and the rest is a HEX offset
within the file (Where within the file it is reading)
Now from the Slave database you can run the following command. 
| 
 
        >    psql -c "SELECT
  pg_last_xlog_replay_location()" | 
Which returns something like this
Primary stop/start
As some further testing I stopped the primary database and
started it back up after a few minutes.   (I also stopped and
restarted my python program)
| 
        > 
    sudo /etc/init.d/postgresql stop 
        > 
    sleep 120 
        > 
    sudo /etc/init.d/postgresql start 
        > 
    ./insertDB.py | 
The replication database had no problem with this
Replication stop/start
From the replication server I stopped and restarted the postgres
server.
| 
        > 
    sudo /etc/init.d/postgresql stop 
        > 
    sleep 240 
        > 
    sudo /etc/init.d/postgresql start | 
No problem with this either
Test Delete from Replication
The Replication server only has read access, let’s try and
delete.
From the database on the replication server try and run this
| 
        > 
    \c nand 
        > 
    delete from data; | 
Change the Replication server
to a primary server
Warning do not do this in a live system!!! I just did this to
make sure it works…  
The recover.conf file defined /home/postgres/failover as the
trigger_file.   Create a file here and see if it stops being a
replication server.
Run the following commands
| 
        > 
    sudo mkdir -p /home/postgres 
        > 
    touch /home/postgres/failover | 
Now when I log into the database from the replication server and
run the following commands.
| 
        > 
    \c nand 
        > 
    select count(*) from data; | 
I see that it is no longer getting fed from the primary database
server.
And I can run the following command
| 
        > 
    delete from data; | 
And it can now write to the database.
Also when this occurs recovery.conf was renamed to
recovery.done.
Now that the Replication server has become a Primary server you
can’t easily switch it back to a Replication server.  Which is it should
be!
 
No comments:
Post a Comment