Redundant Cloud Hosting Guide

The purpose of this guide is to answer one of the questions I receive most often:

How do I host my web applications in the cloud in a way that is redundant but also inexpensive?

Before you begin reading the guide, try to keep the following things in mind:

  • Try to understand what an application is doing before blindly configuring it as the guide states. This helps in two ways: it allows you to begin thinking about ways you can improve your configuration for your specific needs and it will give you tools to fix things when they break later.
  • Stay lean. Some portions of the guide may not apply to your application's needs. Instead of wasting your time on daemons that you don't need, skip over any parts of the guide that don't apply to your application. On the other hand, if you find that your application needs more functionality than this guide provides, be sure to add it carefully. See the previous bullet point to understand what I'm talking about.
  • This is not the only way to configure a redundant cloud environment. This guide covers the configuration that I like best. If you don't like a particular daemon or Linux distribution mentioned in the guide, use what you're most comfortable with or what you prefer.
  • Cloud is what you make of it. Don't be afraid to forge your own path.
  • Give me feedback. If you spot something that's incorrect, or if you find a more efficient way to handle a particular problem, let me know! I'll be glad to consider it for the guide and you'll receive proper attribution.

With that out of the way, let's begin the guide.


The high-level overview

To get an idea of the end result, review the diagram shown below:

[Diagram: Redundant cloud hosting configuration]

My redundant cloud hosting configuration includes two load balancers, two web nodes, and two database/caching nodes. They all have interfaces on public and private networks.


There are three main service groups that I need to host my applications:

  • Load balancing layer: Two needs are fulfilled at this layer - the distribution of load as well as redirection of traffic away from problematic web nodes.
  • Web service layer: As you might imagine, this layer is the workhorse of the entire configuration. This is where web content is served and where web content is stored in a clustered filesystem.
  • Database/caching layer: Without this layer, the configuration would grind to a halt. The applications running on the web service layer depend on this layer for rapid storage and retrieval of information.
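
To make the load balancing layer's two jobs concrete, here is a minimal Python sketch of the idea: hand out requests in rotation and skip any web node that fails a health check. It is only a conceptual illustration and not how LVS itself is implemented. The node names and the `is_healthy` callback are placeholders I've assumed for the example.

```python
from itertools import cycle

# Placeholder node names, assumed for illustration only.
WEB_NODES = ["web1", "web2"]


def make_picker(nodes, is_healthy):
    """Return a function that picks the next healthy node in round-robin order."""
    rotation = cycle(nodes)

    def pick():
        # Try each node at most once per call so we never loop forever.
        for _ in range(len(nodes)):
            node = next(rotation)
            if is_healthy(node):
                return node
        raise RuntimeError("no healthy web nodes available")

    return pick


# Simulate web2 failing its health check.
down = {"web2"}
pick = make_picker(WEB_NODES, lambda node: node not in down)
print([pick() for _ in range(3)])  # every request lands on web1
```

With both nodes healthy the picker alternates between them. Once a node is marked down, traffic simply flows to the survivor until the node returns.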

Platform requirements

In order to follow this guide, you'll need the following:

  • Stable Linux distribution - pick whichever one you prefer, but I'll be using Fedora
  • Six virtual machines - anything fewer than six gets a bit tricky and reduces your redundancy
  • Public and private network interfaces on each virtual machine - not required, but it's highly recommended
  • One extra IP address - this will be your virtual IP address for load balancing (you will need more if you're hosting multiple sites with SSL, unless you want to use SNI)
  • Ability to share an IP between multiple virtual machines - this will be a requirement for LVS-TUN (if you can't share IPs, you can try using LVS-NAT, but I wouldn't recommend it)
  • Kernel modules - you'll need a few kernel modules, or the ability to compile and use them with your running kernel
  • Linux kernel 2.6.27 or later - there are some great performance improvements for virtual machines and the fuse module in these kernels (not a strict requirement, but highly recommended)
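
If you want a quick way to check the kernel recommendation on each virtual machine, something like the following Python sketch could work. It assumes the release string starts with a dotted numeric version (as in `2.6.32-71.el6.x86_64`). The function names are my own and not part of any tool mentioned in the guide.

```python
import platform
import re

# Minimum kernel version recommended in the requirements list.
MINIMUM = (2, 6, 27)


def parse_kernel_release(release):
    """Extract the leading numeric version from a kernel release string."""
    match = re.match(r"(\d+)(?:\.(\d+))?(?:\.(\d+))?", release)
    if match is None:
        raise ValueError(f"unrecognized kernel release: {release!r}")
    # Missing components (e.g. "3.10") are treated as zero.
    return tuple(int(part or 0) for part in match.groups())


def meets_minimum(release, minimum=MINIMUM):
    """Return True if the release is at or above the minimum version."""
    return parse_kernel_release(release) >= minimum


if __name__ == "__main__":
    release = platform.release()  # the running kernel's release string
    try:
        verdict = "ok" if meets_minimum(release) else "older than 2.6.27"
    except ValueError:
        verdict = "could not be parsed"
    print(f"kernel {release}: {verdict}")
```

Remember that the kernel version is a recommendation and not a strict requirement, so treat a failed check as a prompt to investigate, not as a blocker.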

Step by step

I've broken the guide up into functional pieces to allow you to build your configuration and test it along the way. Click on the title of each step to see detailed instructions, diagrams and explanations:


What's the total cost?

Right now, I'm hosting this configuration with Slicehost with the following setup:

  • load balancers: two 256MB instances (2 x $20/month)
  • web nodes: two 1024MB instances (2 x $70/month)
  • database nodes: two 512MB instances (2 x $38/month)

That adds up to $256 per month for the entire configuration at Slicehost. That price also includes 2.1TB of public bandwidth (since the bandwidth is pooled between instances). The only large consumers of bandwidth are the web nodes since they send out a lot of traffic. The load balancers simply receive requests on the public interface and shuttle them to the web nodes over the private network. The database servers would only talk to the public network for package updates.
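
The $256 figure can be checked with a few lines of Python. The counts and prices are copied from the list above. The dictionary layout and the function name are just my choices for the sketch.

```python
# (instance count, price per instance per month in USD), from the list above.
TIERS = {
    "load balancers": (2, 20),
    "web nodes": (2, 70),
    "database nodes": (2, 38),
}


def monthly_total(tiers):
    """Sum count * price across every tier."""
    return sum(count * price for count, price in tiers.values())


for name, (count, price) in TIERS.items():
    print(f"{name}: {count} x ${price} = ${count * price}")
print(f"total: ${monthly_total(TIERS)} per month")  # total: $256 per month
```

Swapping in another provider's instance prices gives a quick side-by-side comparison, as long as you account for bandwidth separately when it isn't pooled.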

If you wanted to host the same configuration with Rackspace's Cloud Servers, you could do it for as little as $153.30 per month, but your bandwidth would be billed at the utility rates. For low traffic sites, this may be the better priced option.

Printed from: http://rackerhacker.com/redundant-cloud-hosting-configuration-guide/ .
© Major Hayden 2012.

18 Comments

  • What kinds of failures did you get through with this? I'm wondering whether
    you're leaving yourself open to slicehost system bugs. Stephan

  • Major Hayden says:

    Stephan -

    Could you elaborate more about what you're after? I'm not sure how to answer your question.

  • morphium says:

    So, you're lucky and your lb or web or db nodes are both on one real host. This host now goes down...so how useful was your load balancing, now?

    morphium

  • Tim Galyean says:

    morphium - That is a possible scenario, however as the guide states "This is not the only way to configure a redundant cloud environment". You can do lots of things with HA including geographical failover. Host issues aside you could just as easily have some sort of DNS failure. The site would go down but your servers would remain online.

    With that in mind you need to design your infrastructure to fit your requirements. If that means ensuring that your site remains online in the event of multiple host failures then that is something you need to design for which may require more servers put online, or across multiple providers. This guide is a great "how to" for people looking to build in the cloud, but is not the solution for everyone.

  • Major Hayden says:

    morphium -

    That's a good point that most folks forget. You'd also have to be concerned if all of your instances are in the same datacenter or even the same geographical region.

    When it really comes down to it, you're always going to have failure points. You just have to decide how much money and time you want to spend to eliminate each one. ;-)

  • errr says:

    Along with what morphium said..
    If I ask support will they allow me to have this setup in a fashion that would allow for it to be more redundant? Could lb1,web1, and db1 be in dfw and lb2,web2,db2 be in ord? Would this setup still work if that were the case? Thanks for this guide it is very handy.

  • Major Hayden says:

    Errr -

    You would need a provider which allows you to have the same IP addresses in more than one datacenter. That's a bit tricky to configure network-wise and I haven't seen a cloud hosting provider yet that offers it. :-(

  • Turiel says:

    Hey,

    Thanks for the guide, lots of useful information. Something I hadn't thought about doing before was loadbalancing memcached... interesting! Would save hassle of specifying all IPs in app config (and changing that with scaling).

    I have a question though; I normally use HAProxy for web backend LB. I see you use it for MySQL and memcached but then use LVS for your web lb. What's the general reasoning behind this choice?

  • Matti says:

    @Major Hayden: there's a Dutch hosting provider (http://www.nucleus.be/en/) that can do that. Physically located servers in different datacenters, but because of their network set-up you can share IPs amongst them.

    If you're stretching the limits for HA, you should also keep in mind redundancy in:
    - DNS layer
    - Network layer (no single cable or single switch to act as SPOF)
    - Power redundancy (usually not an issue in datacenters)
    - Knowledge redundancy (if only one person knows how your HA set-up works, and he drops dead, you're screwed.)
    - IP layer (consider using IPs in different subnets to avoid BGP/ISP failures when one of their /24 announces is failing)
    - Storage layer (since storage will become the heart and soul of your operation)

  • Major Hayden says:

    Matti -

    That's good to know! I'll have to check out their offerings soon.

  • Major Hayden, here a bit more:

    Your redundant system may fail if slicehost has a problem that affects all
    of your virtual machines. Such problems may arise in the slicehost network or the
    slicehost software. Presumably these are common elements of your redundant system.

    Extreme example: the slicehost billing system has a bug and your account is
    marked as overdue. All your VM's are deactivated.

    My first question: did your system get your application smoothly through any problems, and if yes, which problems?

    Second question, what do you think about the slicehost exposure?

    Stephan

  • Major Hayden says:

    Stephan -

    You're totally correct - hosting with one provider does leave you open to risk. Even if the datacenter(s) is/are running up to speed and your configuration is perfect, an error on the part of the hosting provider could bring it all down. It really comes down to how much time and money you want to spend to make your application redundant.

    Is that what you were asking?

  • Yes, good thanks!

    Did your setup handle any outages? What kind?

    Stephan

  • Major Hayden says:

    Stephan -

    I've had a couple of failures so far. The hardware node that ran my web2 slice failed and had to be replaced. I was able to run on only the web1 server until web2 was back in the rotation. I also ran into a peculiar problem where connections were being held open between the web nodes and the database nodes, but I was able to correct that pretty quickly.

  • Alex says:

    Following your articles on how to build redundant cloud hosting with Rackspace Cloud Servers, I ran some tests on GlusterFS performance. The bandwidth on the internal network (ServiceNet) between cloud servers is limited.

    http://cloudservers.rackspacecloud.com/index.php/Frequently_Asked_Questions#Is_there_a_throughput_limit_on_my_server.27s_network_interface_card.3F

    This is a serious limit for technologies like DRBD, GlusterFS, etc. Have you found a solution to this problem?

  • Serg says:

    DRBD with virtual machines..

    Rackspace does not officially support it because they don't know how separate partitions will play with resizing..

    Any thoughts on this would be highly appreciated.

    Serg

  • Serg says:

    morphium -

    For example, Rackspace places your instances on different physical servers by default. Either way, you need to check that with your hosting company.

    Serg

  • simon says:

    Opening up a networked file share over Rackspace public cloud - does it not need encryption?

