There has been a lot of talk about, and interest in, Disaster Recovery (DR) and Cloud
Computing. Specifically, is it possible to use a public cloud as a DR site?
The answer is not that simple. First, an on-premises production
system will be behind at least one firewall, and often more. Second,
traffic to and from the public cloud is exposed to every network in between (and possibly some untrustworthy people!). Encryption
is an absolute must for any public-cloud-based DR. DB2’s HADR is an industry-leading High Availability and
Disaster Recovery technology, but how well will it work under these conditions
and restrictions? Let's find out...
HADR requires two direct TCP connections: one from the primary to the standby and the other from the standby to the primary. The two connections are required because HADR is a symmetrical configuration in that the primary and standby can switch roles. However, can you imagine a connection
from the raw Internet directly through the corporate firewall into your production
system? Yeah right! A production environment may be able to
connect out to the Internet but not vice versa. We need a secure tunnel through the Internet
in order to use HADR... fortunately ssh provides what we need for this experiment.
SSH provides two options for tunneling: -L and -R. The -L option sets up a
local port that is tunneled to a remote system/port. The -R option sets up a remote
port that is tunneled back to a local system/port. Together, these can facilitate a direct
TCP/IP connection to and from our DR site and production server.
There were two systems involved in the DR to cloud experiment:
On-premise "production" system: production1.ibm.com (system name not important)
Remote cloud DR system: ec2-174-129-117-64.compute-1.amazonaws.com
Since the only connection we can make is from our on-prem production
system to the cloud based DR system, we need to configure the ssh tunnels on the
production system. I used the following two commands:
db2inst1@production1:~> ssh -i db2cloudkey.pem root@ec2-72-44-54-192.compute-1.amazonaws.com -R 60002:127.0.0.1:60001 -N -f
db2inst1@production1:~> ssh -i db2cloudkey.pem root@ec2-72-44-54-192.compute-1.amazonaws.com -L 60002:ec2-72-44-54-192.compute-1.amazonaws.com:60001 -N -f
The first command sets up a remote port "60002" on
the cloud DR system that is forwarded to a local port 60001 on the production
system (from cloud back to production). The second command sets up a local port
"60002" that is forwarded to remote port 60001 on the cloud system
(from production to the cloud). These tunnels allow HADR to see a semi-normal
TCP/IP connection while shielding it from the fact that ssh, tunneling,
encryption, etc. are involved.
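As a sketch of how the two tunnels could be combined, a single ssh invocation can carry both the -R and the -L forward. The helper below only builds and prints the command line so it can be reviewed first. The function name `hadr_tunnel_cmd` is hypothetical, and the `ExitOnForwardFailure` and `ServerAliveInterval`/`ServerAliveCountMax` settings are my additions (standard OpenSSH client options), not something used in the original experiment.

```shell
#!/bin/sh
# Sketch: print one ssh command line that sets up both HADR tunnels.
# hadr_tunnel_cmd is a hypothetical helper; the key file, host and ports
# are the ones used in this post. The -o options are assumptions.
hadr_tunnel_cmd() {
    key=$1; cloud=$2; hadr_port=$3; tunnel_port=$4
    printf 'ssh -i %s -N -f' "$key"
    # Exit right away if either port forward cannot be established.
    printf ' -o ExitOnForwardFailure=yes'
    # Give up on a dead connection after roughly 3 x 30 seconds.
    printf ' -o ServerAliveInterval=30 -o ServerAliveCountMax=3'
    # Cloud port -> production HADR port (standby back to primary).
    printf ' -R %s:127.0.0.1:%s' "$tunnel_port" "$hadr_port"
    # Production port -> cloud HADR port (primary to standby).
    printf ' -L %s:%s:%s' "$tunnel_port" "$cloud" "$hadr_port"
    printf ' root@%s\n' "$cloud"
}

hadr_tunnel_cmd db2cloudkey.pem ec2-72-44-54-192.compute-1.amazonaws.com 60001 60002
```

Running the printed command on the production system should give the same two forwards as the separate commands, over one ssh connection instead of two.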
The diagram below gives a bit more information on how this works under the covers. For the ssh -L command, the ssh connection goes from our production system to the cloud (black line). The ssh executable itself listens on port 60002 and forwards traffic to the cloud (blue line). For ssh -R, the ssh connection still goes from the production system to the cloud (black line) but in this case, the ssh daemon (sshd) listens on port 60002 and forwards traffic back to the production system (blue line).
In any case, with the tunnels in place, I tried the HADR simulator to test
the waters. On the production system, I ran the HADR simulator as follows:
db2inst1@production1:~> ./simhadr.Linux -role P -lhost 127.0.0.1 -lport 60001 -rhost 127.0.0.1 -rport 60002 -n 100 -syncmode ASYNC
From an HADR perspective, it is connecting to the same
system (127.0.0.1), but in fact these connections are the ssh tunnels that we
set up to and from the cloud DR system. The ASYNC option is one example, but I
benchmarked all three: SYNC, NEARSYNC and ASYNC (see the table of results below).
On the cloud DR system, I ran the HADR simulator as follows:
db2inst1@domU-12-31-39-07-B0-61:~> ./simhadr.Linux -role S -rhost 127.0.0.1 -rport 60002 -lhost 127.0.0.1 -lport 60001
Once again, the two IP addresses are the same (127.0.0.1) using our secure tunnel.
With this ssh tunneling method, there is a bit of a quirk with the HADR simulator. If I start the standby simulator first, it
connects to one of the tunneled ports, receives the following error message
and then exits:
Zero byte received. Remote end closed connection.
This is because the ssh daemon is listening on the tunneled port but has no
connection on the other side of the socket (an artifact of how the tunneling
works). This is why I said "semi-normal TCP/IP connection" before. So instead of getting a "connection refused" (what the HADR simulator expects), the HADR
simulator gets a successful connection and then receives zero bytes. It would be
interesting to see how HADR itself reacts to this error (hopefully it is more
forgiving than the simulator!). In any case, we can still test the simulator by starting the primary first.
And it works... The simulation started right away and was completely unaware of the secure tunnel. Here is the table of average results from a series of 10 benchmarks:
| Sync mode | Average throughput |
| --- | --- |
| SYNC | 1.83 MB/s |
| NEARSYNC | 1.84 MB/s |
| ASYNC | 4.84 MB/s |
So... ASYNC is about 2.6x faster than SYNC. This makes sense: the latency to the cloud will hurt a synchronous workload. The difference between SYNC and NEARSYNC is negligible, also because the latency to the cloud dominates. Changes to the network options in the
HADR simulator had little effect on throughput (~4%) for this
configuration, although they could be more important for other
configurations and network speeds.
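For reference, the ratios can be worked directly from the averages in the table. This is just arithmetic on the reported numbers; the `speedup` helper name is mine.

```shell
#!/bin/sh
# Average throughput in MB/s, copied from the results table.
sync=1.83
nearsync=1.84
async=4.84

# speedup A B: print A divided by B, rounded to two decimals.
speedup() {
    awk -v a="$1" -v b="$2" 'BEGIN { printf "%.2f\n", a / b }'
}

echo "ASYNC vs SYNC:    $(speedup "$async" "$sync")x"      # 2.64x
echo "NEARSYNC vs SYNC: $(speedup "$nearsync" "$sync")x"   # 1.01x
```

The second ratio shows why the SYNC and NEARSYNC results are effectively the same here: the gap between them is well under 1%.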
It is also worth noting that the connection speed to the DR site will depend heavily on the networks in between. Out of curiosity, I also tested point-to-point speeds between different cities around North America and saw everything from very poor performance (80 KB/s) to very good performance (34 MB/s) when measured from city to city over the Internet.
Summary
The ssh tunnels and the HADR simulator worked well throughout all of the tests and matched the raw speed of the network connection. Although this isn't a thoroughly tested configuration, it was quick and easy to configure DR to the public cloud, and it worked well, at least for the HADR simulator. DR to a public cloud certainly
seems to be a plausible option, at least from a technical point of view.
On the downside, this configuration is far from perfect. If the ssh tunnels were to go down, the HADR setup would obviously go down with them. In addition, nothing special was done to protect the data once it was on the EC2 instance. We could have encrypted the data in DB2, or we could have tested this with Amazon's newly announced Virtual Private Cloud (which has better network isolation than the host firewall). Lastly, this experiment was done with the HADR simulator and not the real HADR.
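One way to soften the tunnel-failure problem would be to run ssh in the foreground (without -f) under a small restart loop. The following is only a minimal sketch and was not part of the experiment: `run_with_restart` is a hypothetical helper, and the retry count and delay in the example are arbitrary.

```shell
#!/bin/sh
# Sketch: rerun a foreground command each time it exits, up to a limit.
# run_with_restart MAX DELAY_SECONDS command [args...]
# This is a hypothetical helper, not part of ssh or DB2.
run_with_restart() {
    max=$1; delay=$2; shift 2
    attempt=1
    while [ "$attempt" -le "$max" ]; do
        status=0
        "$@" || status=$?
        echo "attempt $attempt of $max exited with status $status"
        attempt=$((attempt + 1))
        sleep "$delay"
    done
    # Every attempt has been used up; report failure to the caller.
    return 1
}

# Example (not run here): without -f, ssh stays in the foreground, so the
# loop notices when the tunnel drops and starts it again.
# run_with_restart 100 5 ssh -i db2cloudkey.pem -N \
#     -R 60002:127.0.0.1:60001 root@ec2-72-44-54-192.compute-1.amazonaws.com
```

This only restarts the tunnel. How HADR itself behaves while the tunnel is down and after it comes back is a separate question.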
In part #2, we'll test HADR itself...