Hello Techies, this post focuses on an NSX-T Disaster Recovery of a production env that I recently performed for one of my customers. It describes my own experience, and the procedure may differ depending on your NSX-T design.
Here is the official VMware documentation which was referred while doing the activity.
To put the screenshots in this post, I have recreated the env in my lab. All captures in this post are from the lab that I created for testing purpose.
To set the right expectations, this DR was performed to back up and restore the Management Plane of NSX-T and not the Data Plane. Let me explain the existing env so you can understand the reason for doing a Management Plane recovery only.
NSX-T Multisite Env
Both sites are active and configured with respective BGP routing to local Top of the Rack (TOR) switches.
Primary Site hosts the NSX-T Manager cluster
Backup of the NSX-T manager is configured on an SFTP server which sits at the DR site.
Both sites have a vCenter, Edge VM’s and ESXi nodes.
Inter-Site link has jumbo frames enabled.
Both sites host active workloads. Load Balancer, VPN, and micro-segmentation are also in place.
A 3rd party solution is already configured to migrate / restart the VMs on the DR site in case of disaster.
Since both sites are independent and have separate Edge VMs and routing in place, only the Management Plane needs to be restored. The 3rd party backup solution will restore the VMs on the DR site in case of disaster.
Important Note: The Data Plane (i.e. host transport nodes, edge transport nodes…) is not affected even if you lose the NSX-T manager cluster for any reason. Routing and connectivity to all workload VMs keep working. In short, while the Management Plane is down, the Data Plane keeps running as long as you do not add any new workload. Also, keep in mind that a vMotion of any VM connected to an NSX-T overlay network will cause that VM to lose connectivity. So, it is a good idea to disable DRS until you bring the NSX-T manager cluster back up at the DR site.
The other disadvantage is you cannot make any configuration changes in NSX-T since the UI itself is not available.
Here are some additional bullet points…
You must restore to new appliances running the same version of NSX-T Data Center as the appliances that were backed up.
If you are using an NSX Manager or Global Manager IP address to restore, you must use the same IP address as in the backup.
If you are using an NSX Manager or Global Manager FQDN to restore, you must use the same FQDN as in the backup. Note that only lowercase FQDN is supported for backup and restore.
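Since only lowercase FQDNs are supported for backup and restore, it can help to check the name before configuring anything. Here is a minimal sketch of such a check; the function name `is_lowercase_fqdn` and the sample hostname are my own placeholders and not part of any NSX-T tooling.

```shell
#!/bin/sh
# Hypothetical helper: succeeds only when the given FQDN is non-empty
# and contains no uppercase characters.
is_lowercase_fqdn() {
    fqdn="$1"
    # An empty name is never valid.
    [ -n "$fqdn" ] || return 1
    # Lowercase the name and compare it with the original.
    lower="$(printf '%s' "$fqdn" | tr '[:upper:]' '[:lower:]')"
    [ "$fqdn" = "$lower" ]
}

# Example call (the hostname below is a placeholder):
# is_lowercase_fqdn "nsxmgr.example.local" && echo "FQDN is safe to use"
```

This only checks the letter case. It does not verify that the name resolves or that it matches the FQDN stored in the backup.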
In most cases an FQDN is configured in the env, which involves additional steps while restoring the backup. We will discuss this in more detail later. For now, let’s focus on configuring the backup.
Check my following post for configuring the backup for NSX-T env.
To begin this post, let’s have a look at the existing env architecture…
List of servers in the env with IP’s.
Here is the screen capture from the env…
Site A vCenter – Dubai
Site B vCenter – Singapore
As I said earlier, we are going to perform a Management Plane recovery and not a Data Plane recovery, hence I did not configure an edge, Tier-0 etc. on Site-B. However, the customer env had another edge cluster for Site B, and a Tier-0 as well (as shown in the above diagram).
Stable NSX-T manager cluster, VIP assigned to 172.16.31.78
Dubai vCenter host transport nodes
Singapore vCenter host transport nodes
Just a single Edge Transport node deployed at primary site.
BGP Neighbors Configuration…
Note the source addresses. We should see them on TOR as neighbors.
Let’s have a look at the TOR…
Established 172.27.11.2 & 172.27.12.2 neighbors.
BGP routes on the TOR.
Let’s create a new segment and see if the new route appears on the TOR.
We should see 10.2.98.X BGP route on the TOR.
Perfect. We have everything in place to perform the DR test and check the connectivity once we bring the NSX-T manager cluster UP in the DR site.
That’s it for this post. We will discuss further process in the next part of this blog series.
For NSX-T 3.1, the following operating systems are supported as per the VMware documentation; however, it also says that other software versions might work. SFTP is the only supported protocol for now.
I remember discussing this with someone who was using VMware Photon OS for NSX-T backups. It is Linux-based and lightweight, so it does not consume many resources. It is available for download at the following location…
Add a group…
root@VirtualRove [ ~ ]# groupadd bkpadmin
Add a user in the group…
root@VirtualRove [ ~ ]# groupmems -g bkpadmin -a siteA
Set the password for user
root@VirtualRove [ ~ ]# passwd siteA
New password:
Retype new password:
passwd: password updated successfully
The chown command changes user ownership of a file, directory, or link in Linux.
chown USER:[GROUP NAME] [Directory Path]
root@VirtualRove [ ~ ]# chown siteB:bkpadmin /home/nsxbkp-siteB/
And that completes the configuration on Photon OS. We are good to configure this directory as backup directory in NSX-T.
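For reference, the Photon OS preparation can be wrapped in a small script. This is only a sketch under a few assumptions: the `useradd` and `mkdir` steps are my additions for the case where the user and directory do not exist yet, and the `run` / `prepare_backup_dir` names are made up for illustration. By default it only prints the commands (dry run), so nothing changes until you set `DRY_RUN=0` and run it as root.

```shell
#!/bin/sh
# Sketch only: prints the commands by default. Set DRY_RUN=0 and run as root to apply.
DRY_RUN="${DRY_RUN:-1}"

# Either echo the command (dry run) or execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Hypothetical wrapper around the manual steps: user, group, backup directory.
prepare_backup_dir() {
    user="$1"
    group="$2"
    dir="$3"
    run groupadd "$group"
    run useradd -m "$user"                # assumption: the user does not exist yet
    run groupmems -g "$group" -a "$user"  # add the user to the backup group
    run mkdir -p "$dir"                   # assumption: the directory does not exist yet
    run chown "$user:$group" "$dir"       # hand the directory over to the backup user
}

# Example with the names used in this post:
# prepare_backup_dir siteB bkpadmin /home/nsxbkp-siteB
```

The password is still set interactively with `passwd`, as shown earlier, because it should not be stored in a script.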
A couple of things… Photon OS does not respond to ICMP ping by default. You must run the following commands on the console to enable ping.
iptables -A OUTPUT -p icmp -j ACCEPT
iptables -A INPUT -p icmp -j ACCEPT
Also, the root account is not permitted to log in over SSH by default. You need to edit the ‘sshd_config’ file located at ‘/etc/ssh/sshd_config’. You can use any editor to edit this file…
Scroll to the end of the file and change the following value to enable SSH for the root account…
Change the value from ‘no’ to ‘yes’ and save the file. You should now be able to SSH to Photon OS.
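If you prefer to script this change, here is a minimal sketch. It assumes the value in question is the standard OpenSSH `PermitRootLogin` directive; the function name `enable_root_ssh` is made up for illustration. Try it on a copy of the file first.

```shell
#!/bin/sh
# Hypothetical helper: sets "PermitRootLogin yes" in the given sshd_config file.
# Assumption: PermitRootLogin is the directive that blocks root logins here.
enable_root_ssh() {
    cfg="$1"
    tmp="$(mktemp)" || return 1
    # Rewrite an existing (possibly commented-out) PermitRootLogin line.
    sed -E 's/^[[:space:]]*#?[[:space:]]*PermitRootLogin[[:space:]].*$/PermitRootLogin yes/' \
        "$cfg" > "$tmp" || return 1
    # Append the directive if the file did not contain one at all.
    grep -q '^PermitRootLogin yes$' "$tmp" || echo 'PermitRootLogin yes' >> "$tmp"
    # Write back in place so the file keeps its owner and permissions.
    cat "$tmp" > "$cfg" && rm -f "$tmp"
}

# Example (run as root):
# cp /etc/ssh/sshd_config /etc/ssh/sshd_config.bak
# enable_root_ssh /etc/ssh/sshd_config
# sshd -t && systemctl restart sshd
```

Running `sshd -t` before the restart checks the file for syntax errors, so a typo does not lock you out of the server.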
Let’s move to NSX-T side configuration. Login to NSX-T VIP and navigate to System> Backup & Restore…
Click on Edit for the SFTP server and fill in all the required information.
FQDN/IP : your SFTP server
Port : 22
Path : the directory we created in the above steps
Username, Password & Passphrase
It will then prompt you to add the server’s fingerprint.
Click on ‘Start Backup’ once you save it.
You should see successful backup listed in the UI.
Additionally, you can use WinSCP to login to photon and check for backup directory. You should see recent backup folders.
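The same check can be done from the Photon OS shell instead of WinSCP. The two helpers below are a rough sketch and their names are made up; they only look at modification times in the backup directory and know nothing about the NSX-T backup layout.

```shell
#!/bin/sh
# Hypothetical helper: print the most recently modified entry in the backup directory.
latest_backup() {
    dir="$1"
    [ -d "$dir" ] || return 1
    ls -1t "$dir" | head -n 1
}

# Hypothetical helper: succeed if anything below the directory was modified
# within the last N minutes (for example 1440 for a 24-hour interval).
backup_is_fresh() {
    dir="$1"
    max_minutes="$2"
    [ -d "$dir" ] || return 1
    [ -n "$(find "$dir" -mindepth 1 -mmin "-$max_minutes" -print 2>/dev/null | head -n 1)" ]
}

# Example with the directory used in this post:
# latest_backup /home/nsxbkp-siteB
# backup_is_fresh /home/nsxbkp-siteB 1440 || echo "No backup in the last 24 hours"
```

A check like this could be run from cron on the SFTP server to spot a backup schedule that has silently stopped.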
You will also want to set an interval so that the NSX-T configuration is backed up on a schedule. Click on ‘Edit’ on the NSX-T UI backup page and set an interval.
I preferred everyday backup, so I set it up to 24 hrs interval.
Check your manager cluster to make sure it’s stable.
And take a backup again manually.
That’s it for this post.
We have successfully configured SFTP server for our NSX-T environment. We will use this backup to restore it at DR site in case of site failure or in case of NSX-T manager failure for any reason.