Multi-site considerations with OTV and NSX
I am in the early design phase of a very large-scale (global) private cloud deployment. One of my design challenges centers on multi-site availability: I need to be able to extend a layer-2 domain across multiple sites.
Eventually I want to get to the point where I can orchestrate a “follow-the-sun” model in which a user or business unit can instantiate a service, application, (whatever), and have the option to let that application migrate globally to different data centers as regions come online and source traffic begins to shift throughout the day.
After VMworld this year, I was very excited about the advancements made in 5.5, particularly with NSX. It seems like it will be a key enabler for my cloud solution.
Also, since I am already working with a considerably robust global infrastructure leveraging Cisco gear (Nexus 7K, 5K, UCS, etc.), it seems feasible that OTV and LISP can play a key role here as well. I want to document what I am currently kicking around in my head as a potential solution.
First, I have to point out that I was quite pleased to read this article by Brad Hedlund. In fact, in it he explicitly documents the exact scenario I have been considering. His Visio design (below) looks almost identical to the conceptual model I created for my team several weeks ago. I feel confident that if someone like Brad has signed off on this as a potentially viable solution, then I am likely on the right track.
Anyway, instead of sanitizing my internal Visios for public viewing, I am going to borrow Brad’s for this article. Here is his design:
So you can see here that OTV and LISP are doing the “heavy lifting” with regard to extending layer 2 and providing intelligent ingress routing. With LISP, source traffic will automatically be steered to the DC that is currently hosting the application.
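To make that LISP behavior concrete, here is a toy Python sketch of the EID-to-RLOC mapping idea: the mapping system tracks which site locator (RLOC) currently hosts an application prefix (EID), and a migration simply updates that mapping so new ingress traffic is steered to the new site. All prefixes and site names below are made up for illustration.

```python
# Toy model of LISP ingress steering: the mapping system resolves an
# application's EID prefix to the RLOC of whichever DC hosts it now.
# Prefixes and RLOC names are hypothetical, not from a real deployment.

class LispMapSystem:
    def __init__(self):
        self._map = {}  # EID prefix -> RLOC (site ingress locator)

    def register(self, eid_prefix, rloc):
        """Called when an app comes up (or lands, post-migration) at a site."""
        self._map[eid_prefix] = rloc

    def resolve(self, eid_prefix):
        """What an ingress tunnel router asks: where does this EID live now?"""
        return self._map[eid_prefix]

maps = LispMapSystem()
maps.register("10.1.1.0/24", "us-east-rloc")   # app starts in US East
print(maps.resolve("10.1.1.0/24"))             # ingress lands in US East

maps.register("10.1.1.0/24", "eu-west-rloc")   # app migrates to EU West
print(maps.resolve("10.1.1.0/24"))             # ingress follows the app
```

The point of the sketch is simply that the application's IP identity (the EID) never changes during a move; only the locator does.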
OTV is extending my “VM Network” VLAN, illustrated here as the “DMZ VLAN” (I call this the “transport VLAN” in VXLAN deployments; basically, it is the single logical VLAN that all of the virtual VXLANs traverse). By extending this single VLAN to the other sites with OTV, I am providing NSX with a common egress gateway network at each site. The other benefit is that I only have to extend that one network over OTV, and suddenly I have multi-site capability for all of my virtual networks encapsulated on that VLAN. This of course means that as I instantiate new applications in new virtual networks on that same VLAN, they also inherit the multi-site capability without me having to go to the networking team and have them create a new OTV session, or really reconfigure anything at all.
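That “extend once, inherit everywhere” property can be sketched as a small model: only the transport VLAN appears in the OTV extended-VLAN list, and any VXLAN segment riding it is multi-site-capable by construction. The VLAN and VNI numbers below are hypothetical placeholders, not real configuration.

```python
# Sketch: multi-site reach is a property of the transport VLAN, not of
# individual VXLAN segments. VLAN/VNI numbers are hypothetical.

otv_extended_vlans = {100}         # the single "transport" VLAN OTV extends
vni_to_transport_vlan = {
    5001: 100,   # existing app network
    5002: 100,   # another app network
}

def is_multisite(vni):
    """A VXLAN segment inherits multi-site reach from its transport VLAN."""
    return vni_to_transport_vlan[vni] in otv_extended_vlans

# A brand-new virtual network on the same transport VLAN needs no new
# OTV session -- it is multi-site the moment it is instantiated:
vni_to_transport_vlan[5003] = 100
print(is_multisite(5003))  # True
```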
I still have many things to consider here. The glaringly obvious one is that a live vMotion will require sub-20-millisecond round-trip latency. That is not really an issue if I have many DCs chained around the globe; however, in certain cases we may have to get creative with WAN optimizers if we wish to handle the live migration.
Of course, there is also storage replication to consider. In my case, this piece is actually already further down the design path, and I am very confident in our ability to handle that part of the puzzle. Without going into too much detail (that will require a longer discussion), I am looking to build a storage topology that almost resembles a CDN, where each piece of replicated data is stored in multiple locations at each site and is asynchronously replicated in a full-mesh topology.
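One planning detail worth keeping in mind with a full mesh is that the number of replication relationships grows quadratically with site count: n sites mean n*(n-1)/2 bidirectional pairings (or n*(n-1) one-way sessions). A quick sketch, with hypothetical site names:

```python
from itertools import combinations

# In a full-mesh replication topology, every site pair replicates with
# every other, so pairings grow as n*(n-1)/2. Site names are hypothetical.
def replication_pairs(sites):
    """All bidirectional site pairings in a full mesh."""
    return list(combinations(sites, 2))

sites = ["us-east", "us-west", "eu-west", "ap-south"]
pairs = replication_pairs(sites)
print(len(pairs))  # 6 pairings for 4 sites; 10 for 5; 45 for 10
```

Quadratic growth is fine at a handful of sites, but it is the number to watch as the global footprint expands.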
Lastly is the fragmentation issue. I need to ensure that all pieces of an application are migrated successfully each time, to avoid east-west traffic traversing the WAN. Not just because of the substantial performance hit from the latency between parts of the application (due to distance), but also because the 1600-byte VXLAN frame is going to get chopped up as soon as it exits its local DC, causing a potentially massive amount of retransmission.
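The arithmetic behind that ~1600-byte figure is worth spelling out. VXLAN adds 50 bytes of encapsulation to the original frame (outer Ethernet 14 + outer IPv4 20 + UDP 8 + VXLAN header 8), so a full-size 1514-byte inner frame becomes 1564 bytes on the wire, which is why roughly 1600 bytes is the usual transport-MTU recommendation. A quick sketch of the math:

```python
# VXLAN encapsulation overhead (IPv4 outer header, no extra VLAN tags):
OUTER_ETH, OUTER_IP, OUTER_UDP, VXLAN_HDR = 14, 20, 8, 8
OVERHEAD = OUTER_ETH + OUTER_IP + OUTER_UDP + VXLAN_HDR  # 50 bytes

def encapsulated_frame_size(inner_frame_bytes):
    """Wire size of an inner Ethernet frame after VXLAN encapsulation."""
    return inner_frame_bytes + OVERHEAD

def fragments_on_path(inner_frame_bytes, path_mtu=1500):
    """The path MTU constrains the outer IP packet (frame minus outer Ethernet)."""
    outer_ip_packet = encapsulated_frame_size(inner_frame_bytes) - OUTER_ETH
    return outer_ip_packet > path_mtu

print(encapsulated_frame_size(1514))  # 1564 -- hence the ~1600-byte MTU advice
print(fragments_on_path(1514))        # True on a standard 1500-byte WAN path
```

Inside the DC a jumbo-capable underlay hides this, but the moment the encapsulated frame hits a standard 1500-byte WAN path it gets fragmented, which is exactly the scenario to avoid.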
I just wanted to take the time to put that out there as something I am considering for this solution. I thought it was a pretty interesting piece of this design, and might spark some public interest. Please feel free to chime in!
** Update **
In Brad’s design, he is illustrating that the DMZ VLAN is strictly an egress point and is not carrying actual VXLAN frames, whereas I was envisioning extending layer 2 for the “transport” VLAN, i.e., the actual VLAN that carries the VXLAN traffic (in a “normal” SDN/VXLAN design, this VLAN would only exist local to each cluster).
I still need to do some brainstorming on this part of the design to determine which method would make the most sense for our use case(s). I am still working to wrap my head around how SDN should best be implemented here. (This article is meant to simply help me with this brainstorming process).
I’ll update here when my team and I have a better idea of how this piece will be designed.