NSX-T 3.1 – Backup & Restore: Production DR Experience – Part 1

Hello Techies! This post focuses on an NSX-T Disaster Recovery exercise that I recently performed for one of our customers in a production environment. It describes my own experience; the procedure may differ depending on your NSX-T design.

Here is the official VMware documentation that was referenced while performing the activity.

https://docs.vmware.com/en/VMware-NSX-T-Data-Center/3.1/administration/GUID-A0B3667C-FB7D-413F-816D-019BFAD81AC5.html

Additionally, the following document is a MUST-read before you plan your DR.

https://communities.vmware.com/t5/VMware-NSX-Documents/NSX-T-Multisite/ta-p/2771370

For the screenshots in this post, I recreated the environment in my lab. All captures in this post are from the lab that I built for testing purposes.

To set the right expectations: this DR was performed to back up and restore the NSX-T Management Plane, not the Data Plane. Let me explain the existing environment so you can understand why only Management Plane recovery was needed.

  • NSX-T Multisite environment.
  • Both sites are active, each configured with BGP routing to its local Top of Rack (TOR) switches.
  • The primary site hosts the NSX-T Manager cluster.
  • Backup of the NSX-T Manager is configured to an SFTP server that sits at the DR site.
  • Each site has its own vCenter, Edge VMs, and ESXi nodes.
  • The inter-site link has jumbo frames enabled.
  • Both sites host active workloads. Load Balancer, VPN, and micro-segmentation are also in place.
  • A 3rd-party solution is already configured to migrate/restart the VMs on the DR site in case of disaster.

Since both sites are independent, each with its own Edge VMs and routing in place, only the Management Plane needs to be restored. The 3rd-party solution will restore the VMs on the DR site in case of disaster.

Important Note: The Data Plane (i.e., host transport nodes, edge transport nodes, etc.) is not affected even if you lose the NSX-T Manager cluster for any reason. Routing and connectivity to all workload VMs continue to work perfectly fine. In short, while the Management Plane is down, the Data Plane keeps running as long as you do not add any new workloads. Also, keep in mind that vMotion of any VM connected to an NSX-T overlay network will cause that VM to lose connectivity. So, it is a good idea to disable DRS until you bring the NSX-T Manager cluster back up on the DR site.
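If you want to script that precaution, here is a minimal pyVmomi sketch that disables DRS on a cluster. The vCenter address, credentials, and cluster name below are placeholders from my lab, not the customer environment.

```python
# Minimal sketch: disable DRS on a cluster while the NSX-T Management
# Plane is down. Host, credentials, and cluster name are placeholders.
import ssl
from pyVim.connect import SmartConnect, Disconnect
from pyVmomi import vim

ctx = ssl._create_unverified_context()  # lab only; validate certs in production
si = SmartConnect(host="vc-dubai.lab.local", user="administrator@vsphere.local",
                  pwd="VMware1!", sslContext=ctx)
try:
    content = si.RetrieveContent()
    view = content.viewManager.CreateContainerView(
        content.rootFolder, [vim.ClusterComputeResource], True)
    for cluster in view.view:
        if cluster.name == "Dubai-Cluster":  # hypothetical cluster name
            spec = vim.cluster.ConfigSpecEx(
                drsConfig=vim.cluster.DrsConfigInfo(enabled=False))
            cluster.ReconfigureComputeResource_Task(spec, modify=True)
            print(f"Disabling DRS on {cluster.name}")
    view.Destroy()
finally:
    Disconnect(si)
```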

The other disadvantage is that you cannot make any configuration changes in NSX-T, since the UI (and the API) itself is not available.

Here are some additional bullet points…

  • You must restore to new appliances running the same version of NSX-T Data Center as the appliances that were backed up.
  • If you are using an NSX Manager or Global Manager IP address to restore, you must use the same IP address as in the backup.
  • If you are using an NSX Manager or Global Manager FQDN to restore, you must use the same FQDN as in the backup. Note that only lowercase FQDN is supported for backup and restore.

In most cases, FQDN is configured in the environment, which involves additional steps while restoring the backup. We will discuss this in more detail later. For now, let's focus on configuring the backup.
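One of those FQDN-related steps is making sure the managers actually publish their FQDNs. Here is a quick sketch that checks, and if needed enables, the publish_fqdns flag via the API; the manager address and credentials are placeholders.

```python
# Sketch: check/enable FQDN publishing, which FQDN-based backup and
# restore relies on. Manager address and credentials are placeholders.
import requests

NSX = "https://nsxmgr.lab.local"
AUTH = ("admin", "VMware1!VMware1!")

resp = requests.get(f"{NSX}/api/v1/configs/management", auth=AUTH, verify=False)
resp.raise_for_status()
cfg = resp.json()
print("publish_fqdns:", cfg.get("publish_fqdns"))

# To enable it, PUT the config back with publish_fqdns set to true,
# keeping the current _revision in the body.
if not cfg.get("publish_fqdns"):
    cfg["publish_fqdns"] = True
    requests.put(f"{NSX}/api/v1/configs/management", json=cfg,
                 auth=AUTH, verify=False).raise_for_status()
```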

Check my following post for configuring the backup for an NSX-T environment.

NSX-T Backup Configuration on VMware Photon OS
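For reference, the same backup configuration can also be pushed through the REST API. Below is a hedged sketch; the SFTP server name, SSH fingerprint, credentials, and passphrase are placeholders you would replace with your own values.

```python
# Sketch: configure a scheduled NSX-T backup to an SFTP server via the
# REST API. All server details and secrets below are placeholders.
import requests

NSX = "https://nsxmgr.lab.local"
AUTH = ("admin", "VMware1!VMware1!")

backup_config = {
    "backup_enabled": True,
    "backup_schedule": {
        "resource_type": "IntervalBackupSchedule",
        "seconds_between_backups": 3600,  # hourly
    },
    "remote_file_server": {
        "server": "sftp.dr-site.lab.local",  # SFTP server at the DR site
        "port": 22,
        "directory_path": "/nsx-backups",
        "protocol": {
            "protocol_name": "sftp",
            "ssh_fingerprint": "SHA256:REPLACE_WITH_REAL_FINGERPRINT",
            "authentication_scheme": {
                "scheme_name": "PASSWORD",
                "username": "backupuser",
                "password": "REPLACE_ME",
            },
        },
    },
    "passphrase": "REPLACE_ME",  # needed again at restore time
}

resp = requests.put(f"{NSX}/api/v1/cluster/backups/config",
                    json=backup_config, auth=AUTH, verify=False)
resp.raise_for_status()
print(resp.json())
```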

To begin, let's have a look at the existing environment architecture…

List of servers in the environment with their IPs.

Here are the screen captures from the environment…

Site A vCenter – Dubai

Site B vCenter – Singapore

As I said earlier, we are going to perform Management Plane recovery, not Data Plane recovery, hence I did not configure an Edge, Tier-0, etc. on Site-B in the lab. However, the customer environment had a separate Edge cluster and Tier-0 for Site-B (as shown in the above diagram).

A stable NSX-T Manager cluster, with the VIP assigned to 172.16.31.78.
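If you want to verify the same thing without the UI, a small sketch like the one below (credentials are placeholders) reads the cluster status and the configured VIP straight from the API.

```python
# Sketch: confirm the manager cluster is stable and read the cluster VIP.
import requests

NSX = "https://172.16.31.78"
AUTH = ("admin", "VMware1!VMware1!")

status = requests.get(f"{NSX}/api/v1/cluster/status",
                      auth=AUTH, verify=False).json()
print("Cluster status:", status["mgmt_cluster_status"]["status"])  # expect STABLE

vip = requests.get(f"{NSX}/api/v1/cluster/api-virtual-ip",
                   auth=AUTH, verify=False).json()
print("Cluster VIP:", vip.get("ip_address"))  # expect 172.16.31.78
```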

Dubai vCenter host transport nodes

Singapore vCenter host transport nodes

Just a single Edge Transport Node is deployed at the primary site.
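Before the DR test, it is worth baselining the data plane. The following sketch (lab credentials are placeholders) lists each transport node, host and edge alike, along with its realized state.

```python
# Sketch: list every transport node with its realized state, as a
# pre-DR baseline of the data plane.
import requests

NSX = "https://172.16.31.78"
AUTH = ("admin", "VMware1!VMware1!")

nodes = requests.get(f"{NSX}/api/v1/transport-nodes",
                     auth=AUTH, verify=False).json()
for tn in nodes.get("results", []):
    state = requests.get(f"{NSX}/api/v1/transport-nodes/{tn['id']}/state",
                         auth=AUTH, verify=False).json()
    print(f"{tn['display_name']}: {state.get('state')}")  # expect 'success'
```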

BGP Neighbors Configuration…

Note the source addresses. We should see them as neighbors on the TOR.

Let’s have a look at the TOR…

Neighbors 172.27.11.2 & 172.27.12.2 are in the Established state.

BGP routes on the TOR.
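The same neighbor states can be pulled from the NSX-T side through the policy API, which is handy for cross-checking what the TOR shows. In the sketch below, T0-GW and default are my lab's Tier-0 and locale-services IDs; substitute your own.

```python
# Sketch: pull BGP neighbor status for the Tier-0 from the policy API.
# 'T0-GW' and 'default' are placeholder IDs from my lab topology.
import requests

NSX = "https://172.16.31.78"
AUTH = ("admin", "VMware1!VMware1!")

url = (f"{NSX}/policy/api/v1/infra/tier-0s/T0-GW"
       "/locale-services/default/bgp/neighbors/status")
status = requests.get(url, auth=AUTH, verify=False).json()
for nbr in status.get("results", []):
    # expect ESTABLISHED for both TOR neighbors
    print(nbr.get("neighbor_address"), nbr.get("connection_state"))
```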

Let’s create a new segment and see if the new route appears on the TOR.

We should see the 10.2.98.X BGP route on the TOR.
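For completeness, here is how the test segment could be created through the policy API instead of the UI. The segment ID, Tier-1 path, transport zone path, and gateway address are lab placeholders; adjust them to your topology.

```python
# Sketch: create the test segment via the policy API. All IDs and the
# gateway address below are placeholders for illustration.
import requests

NSX = "https://172.16.31.78"
AUTH = ("admin", "VMware1!VMware1!")

segment = {
    "display_name": "dr-test-segment",
    "connectivity_path": "/infra/tier-1s/T1-GW",
    "transport_zone_path": ("/infra/sites/default/enforcement-points/default"
                            "/transport-zones/OVERLAY-TZ"),
    "subnets": [{"gateway_address": "10.2.98.1/24"}],
}

resp = requests.patch(f"{NSX}/policy/api/v1/infra/segments/dr-test-segment",
                      json=segment, auth=AUTH, verify=False)
resp.raise_for_status()  # HTTP 200 means the segment was created/updated
```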

Perfect. We have everything in place to perform the DR test and verify connectivity once we bring the NSX-T Manager cluster up at the DR site.

That’s it for this post. We will discuss the rest of the process in the next part of this blog series.

Are you looking for a lab to practice VMware products? If yes, then click here to know more about our Lab-as-a-Service (LaaS).

Leave your email address in the box below to receive notifications about my new blogs.