
Popping Clouds

A blog on virtualization and cloud computing


Posted on Sep 21, 2013 in Networking | 5 comments

Multi-site considerations with OTV and NSX

I am in the early design phase of a very large-scale (global) private cloud deployment. One of my design challenges is centered around multi-site availability. I need to be able to extend a layer-2 domain across multiple sites.

Eventually I want to get to the point where I can orchestrate a “follow-the-sun” model, where a user or business unit can instantiate a service or application (whatever) and have the option to let that application migrate globally to different data centers as regions come online and source traffic shifts throughout the day.

After VMworld this year, I was very excited about the advancements made in 5.5, particularly with NSX. It seems like it will be a key enabler for my cloud solution.

Also, since I am already working with a considerably robust global infrastructure built on Cisco gear (Nexus 7k, 5k, UCS, etc.), it seems feasible that OTV and LISP can play a key role here as well. I want to document what I am currently kicking around in my head as a potential solution.

First, I have to point out that I was quite pleased to read this article by Brad Hedlund. In it, he explicitly documents the exact scenario I have been considering. His Visio design (below) looks almost exactly like the conceptual model I created for my team several weeks ago. I feel confident that if someone like Brad has signed off on this as a potentially viable solution, then I am likely on the right track. :)

Anyway, instead of sanitizing my internal Visios for public viewing, I am going to borrow Brad’s for this article. Here is his design:

[Figure: Brad Hedlund’s multi-site design diagram]


So you can see here that OTV and LISP are certainly doing the “heavy lifting” with regard to extending the layer-2 domain and providing the intelligent ingress routing. With LISP, the source traffic will automatically be driven to the DC that is currently hosting the application.
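To make the ingress-routing idea concrete, here is a toy model of what LISP does conceptually: it keeps a mapping from an endpoint identifier (EID, the application's IP address) to a routing locator (RLOC, the site currently hosting it). This is only an illustrative sketch, not a LISP implementation; the class name, method names, and addresses are all hypothetical.

```python
class MappingSystem:
    """Toy EID-to-RLOC mapping table (hypothetical, for illustration only)."""

    def __init__(self):
        self._eid_to_rloc = {}

    def register(self, eid, rloc):
        # Called when a site announces that it now hosts this endpoint.
        # A later registration for the same EID replaces the earlier one.
        self._eid_to_rloc[eid] = rloc

    def resolve(self, eid):
        # Called by an ingress router to find where to send traffic.
        # Returns None if the endpoint is unknown.
        return self._eid_to_rloc.get(eid)


if __name__ == "__main__":
    ms = MappingSystem()
    app_ip = "10.1.1.10"                 # hypothetical application address (EID)
    ms.register(app_ip, "192.0.2.1")     # hypothetical locator for the first DC
    print(ms.resolve(app_ip))
    ms.register(app_ip, "198.51.100.1")  # the VM migrates to a second DC
    print(ms.resolve(app_ip))
```

The point of the sketch is that the application's address never changes during a migration; only the locator it maps to does, which is why source traffic follows the workload.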

OTV is extending my “VM Network” VLAN, illustrated here as the “DMZ VLAN”. (I call this the “transport VLAN” in VXLAN deployments: it is the single logical VLAN that all of the VXLAN virtual networks traverse.) By extending this single VLAN to the other sites using OTV, I am providing NSX with a common egress gateway network at each site. The other benefit is that I only have to extend that one network with OTV, and all of my virtual networks encapsulated on that VLAN gain multi-site capability. This of course means that as I instantiate new applications in new virtual networks on that same VLAN, they also inherit the multi-site capability without me having to ask the networking team to create a new OTV session or reconfigure anything at all.

I still have many things to consider here. The glaringly obvious one is that a live vMotion will require a sub-20-millisecond round-trip latency. That is not really an issue if I have many DCs chained around the globe. In certain cases, however, we may have to get creative with WAN optimizers if we wish to handle the live migration.
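The latency constraint can be sanity-checked with simple arithmetic. The sketch below gives only a rough lower bound, assuming light travels through fiber at about 200 km per millisecond and using the 20 ms figure quoted above. The function names and the path_stretch factor (how much longer the fiber route is than the straight-line distance) are illustrative assumptions; real round-trip times will be higher once routing, queuing, and equipment delay are added.

```python
FIBER_KM_PER_MS = 200.0        # approximate speed of light in fiber (assumption)
VMOTION_RTT_BUDGET_MS = 20.0   # round-trip budget quoted in this post


def min_rtt_ms(distance_km, path_stretch=1.0):
    """Best-case round-trip time over fiber for a given site distance."""
    return 2 * distance_km * path_stretch / FIBER_KM_PER_MS


def max_distance_km(budget_ms=VMOTION_RTT_BUDGET_MS, path_stretch=1.0):
    """Largest site separation whose best-case RTT equals the budget."""
    return budget_ms * FIBER_KM_PER_MS / (2 * path_stretch)


def within_budget(distance_km, path_stretch=1.5, budget_ms=VMOTION_RTT_BUDGET_MS):
    """True if the best-case RTT is strictly under the latency budget."""
    return min_rtt_ms(distance_km, path_stretch) < budget_ms


if __name__ == "__main__":
    for km in (500, 1000, 2000, 6000):
        print(km, "km:", min_rtt_ms(km, 1.5), "ms, ok =", within_budget(km))
```

Under these assumptions, even a perfectly straight fiber run caps out at about 2,000 km between sites, which is why a chain of DCs works while a single transoceanic hop does not.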

Of course there is also storage replication to consider. In my case, this piece is actually already further down the design path, and I am very confident in our ability to handle that piece of the puzzle. Without going into too much detail (that will require a longer discussion), I am looking to have a storage topology that almost resembles a CDN, where each piece of replicated data is stored in multiple locations at each site and is asynchronously replicated in a full-mesh topology.

Lastly, there is the IP fragmentation issue. I need to ensure that all pieces of an application are migrated successfully each time to avoid east-west traffic traversing the WAN. This is not just because of the substantial performance hit from the latency between parts of the application (due to distance), but also because the 1600-byte VXLAN frame is going to get chopped up as soon as it exits its local DC if the WAN path MTU is smaller, causing a potentially massive amount of retransmission.
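The size problem is easy to quantify. VXLAN wraps the original Ethernet frame in a VXLAN header, a UDP header, and an outer IP header, which is why transport networks are typically configured with an MTU of around 1600. The sketch below assumes an IPv4 outer header with no options and an untagged inner frame; the function names are hypothetical. It also assumes the oversized packet is fragmented at all: depending on how the Don't Fragment bit is handled, it may simply be dropped instead.

```python
import math

INNER_ETHERNET = 14  # inner Ethernet header (no 802.1Q tag)
VXLAN_HEADER = 8
UDP_HEADER = 8
OUTER_IPV4 = 20      # outer IPv4 header without options


def outer_ip_packet_size(inner_ip_mtu):
    """Size of the outer IP packet for a full-size inner IP packet."""
    return inner_ip_mtu + INNER_ETHERNET + VXLAN_HEADER + UDP_HEADER + OUTER_IPV4


def max_inner_ip_mtu(path_mtu):
    """Largest inner IP packet that fits in the path MTU unfragmented."""
    return path_mtu - (INNER_ETHERNET + VXLAN_HEADER + UDP_HEADER + OUTER_IPV4)


def fragments_needed(packet_size, path_mtu):
    """Number of IPv4 fragments needed to carry the packet over the path."""
    if packet_size <= path_mtu:
        return 1
    payload = packet_size - OUTER_IPV4
    # Every fragment except the last carries a multiple of 8 payload bytes.
    per_fragment = (path_mtu - OUTER_IPV4) // 8 * 8
    return math.ceil(payload / per_fragment)


if __name__ == "__main__":
    size = outer_ip_packet_size(1500)
    print("outer packet:", size, "bytes")
    print("fragments on a 1500-byte WAN path:", fragments_needed(size, 1500))
    print("fragments on a 1600-byte path:", fragments_needed(size, 1600))
```

In other words, with 50 bytes of encapsulation overhead, either the WAN path must carry at least 1550-byte IP packets or the guests must run with a reduced MTU (1450 under these assumptions) to avoid fragmentation entirely.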

I just wanted to take the time to put that out there as something I am considering for this solution. I thought it was a pretty interesting piece of this design, and might spark some public interest. Please feel free to chime in!


** Update **

In Brad’s design, he is illustrating that the DMZ VLAN is strictly an egress point and is not carrying actual VXLAN frames, whereas I was envisioning extending the layer-2 domain for the “transport” VLAN, i.e., the actual VLAN that carries the VXLAN traffic (in a “normal” SDN/VXLAN design, this VLAN would only exist local to each cluster).

I still need to do some brainstorming on this part of the design to determine which method would make the most sense for our use case(s). I am still working to wrap my head around how SDN should best be implemented here. (This article is simply meant to help me with this brainstorming process.)

I’ll update here when my team and I have a better idea of how this piece will be designed.



  1. I’m wondering about relative costs. You need a hefty amount of bandwidth to do vMotion for vApp groups of apps. Even with heavy user traffic, I’d suspect it would require far less peak bandwidth. Yet I presume you’ve done the math. Care to share the numbers?

  2. Hi Pete,
    Do you mean numbers in terms of bandwidth, or an actual dollar figure on cost? We currently maintain a substantial pipe between all of our Data Centers for storage replication traffic. That piece is already occurring. vMotion on top of this would add the transfer of active memory pages, but that will be a minimal increase in relation to the storage replication.

    The amount of bandwidth varies by region and endpoint (i.e., there is less of a pipe between France and our NE data center than there is between the NE data center and a DC in, say, Chicago).

    Because of that, we have to get creative with how we design the replication topology, but currently we have a sufficient working model. Now, one thing that we need to be careful of (as you point out) is that in a true IaaS/PaaS environment, where users can spin up apps independently of manual IT controls, we can certainly run into a situation where all of a sudden we have a bandwidth bottleneck because users have spun up a large number of applications in a specific region.

    The network capacity planning piece of this design will be critical for that exact reason. I don’t yet have a sufficient answer for how to handle that part.

  3. As far as I know VXLAN is not compatible with OTV.

    • Hi Nic,

      You are correct. In this design, our “VXLAN-land” ends behind the virtual edge device (an NSX Edge Services Gateway, for instance). OTV comes in on the VLAN that the edge device connects to on its external side. By making that VLAN consistent at both sites, the mobility of that vApp becomes much easier.

      • OK, so this is a little misleading, because when you say “the VLAN carrying the VXLAN” or the “transport VLAN”, it sounds like you’re talking about the outer tag of the VXLAN frame, which wouldn’t work. Maybe you should just say the “VXLAN-mapped VLAN”? It would make more sense (at least to me :), especially in a Cisco environment (cf. Nexus 9000 terminology).
