Hello Techies, this post focuses on an NSX-T disaster recovery of a production env that I recently performed for one of my customers. It describes my own experience, and the procedure may differ depending on your NSX-T design.
Here is the official VMware documentation that I referred to during the activity.
To capture the screenshots for this post, I recreated the env in my lab. All captures in this post are from that test lab.
To set the right expectations: this DR was performed to back up and restore the Management Plane of NSX-T, not the Data Plane. Let me explain the existing env so that the reason for recovering only the Management Plane is clear.
NSX-T Multisite Env
Both sites are active and configured with respective BGP routing to local Top of the Rack (TOR) switches.
Primary Site hosts the NSX-T Manager cluster
Backup of the NSX-T Manager is configured on an SFTP server which sits at the DR site.
Both sites have a vCenter, Edge VM’s and ESXi nodes.
Inter-Site link has jumbo frames enabled.
Both sites host active workloads. Load Balancer, VPN and micro-segmentation are also in place.
A 3rd party solution is already configured to migrate / restart the VMs on the DR site in case of disaster.
Since both sites are independent and have separate Edge VMs and routing in place, only the Management Plane needs to be restored. The 3rd party backup solution will restore the VMs on the DR site in case of disaster.
Important Note: The Data Plane (i.e. host transport nodes, edge transport nodes…) is not affected even if you lose the NSX-T manager cluster for any reason. Routing and connectivity to all workload VMs keep working. In short, during the loss of the Management Plane, the Data Plane keeps running as long as you do not add any new workload. Also, keep in mind that a vMotion of any VM connected to an NSX-T overlay network will end up in that VM losing connectivity. So, it is a good idea to disable DRS until you bring back the NSX-T manager cluster on the DR site.
The other disadvantage is you cannot make any configuration changes in NSX-T since the UI itself is not available.
Here are some additional bullet points…
You must restore to new appliances running the same version of NSX-T Data Center as the appliances that were backed up.
If you are using an NSX Manager or Global Manager IP address to restore, you must use the same IP address as in the backup.
If you are using an NSX Manager or Global Manager FQDN to restore, you must use the same FQDN as in the backup. Note that only lowercase FQDN is supported for backup and restore.
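The three restore rules above are easy to check before you start a restore. Here is a minimal sketch of such a check. The `ManagerInfo` record, the function names and the sample values are my own assumptions for illustration and are not part of any NSX-T API.

```python
import ipaddress
from dataclasses import dataclass
from typing import List


@dataclass
class ManagerInfo:
    """Hypothetical record describing one NSX-T Manager appliance."""
    version: str  # NSX-T version string, e.g. "3.1.0" (illustrative value)
    address: str  # IP address or FQDN the manager is reached by


def _is_ip(value: str) -> bool:
    """Return True when the value parses as an IPv4 or IPv6 address."""
    try:
        ipaddress.ip_address(value)
        return True
    except ValueError:
        return False


def restore_problems(backup: ManagerInfo, target: ManagerInfo) -> List[str]:
    """Return a list of problems; an empty list means the basic rules are met."""
    problems = []
    if backup.version != target.version:
        problems.append(
            f"version mismatch: backup {backup.version}, target {target.version}"
        )
    if backup.address != target.address:
        problems.append(
            f"address mismatch: backup {backup.address}, target {target.address}"
        )
    # Only an FQDN needs the lowercase check; an IP address has no case.
    if not _is_ip(backup.address) and backup.address != backup.address.lower():
        problems.append(f"FQDN must be lowercase: {backup.address}")
    return problems
```

Call `restore_problems(backup, target)` with the values from your backup and from the freshly deployed appliance. Anything it returns should be fixed before you attempt the restore.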
In most of the cases, FQDN is configured in the env which involves additional steps while restoring the backup. We will discuss more about it in detail. Let’s focus on configuring the backup.
Check my following post for configuring the backup for NSX-T env.
To begin this post, let’s have a look at the existing env architecture…
List of servers in the env with IP’s.
Here is the screen capture from the env…
Site A vCenter – Dubai
Site B vCenter – Singapore
As I said earlier, we are going to perform Management Plane recovery and not Data Plane, hence I did not configure an edge, Tier-0 etc. on Site-B. However, the customer env had another edge cluster for Site B, and a Tier-0 as well (as shown in the above diagram).
Stable NSX-T manager cluster, VIP assigned to 172.16.31.78
Dubai vCenter host transport nodes
Singapore vCenter host transport nodes
Just a single Edge Transport node deployed at primary site.
BGP Neighbors Configuration…
Note the source addresses. We should see them on TOR as neighbors.
Let’s have a look at the TOR…
Established 172.27.11.2 & 172.27.12.2 neighbors.
BGP routes on the TOR.
Let’s create a new segment to see if the new route appears on the TOR.
We should see 10.2.98.X BGP route on the TOR.
Perfect. We have everything in place to perform the DR test and check the connectivity once we bring the NSX-T manager cluster UP in the DR site.
That’s it for this post. We will discuss further process in the next part of this blog series.
For NSX-T 3.1, the following operating systems are supported as per the VMware documentation; however, it also says that other software versions might work. SFTP is the only supported protocol for now.
I remember having a discussion with someone who was using VMware Photon OS for NSX-T backups. It is Linux-based and lightweight, so it does not consume many resources. It is available for download at the following location…
Add a group…
root@VirtualRove [ ~ ]# groupadd bkpadmin
Add a user in the group…
root@VirtualRove [ ~ ]# groupmems -g bkpadmin -a siteA
Set the password for the user…
root@VirtualRove [ ~ ]# passwd siteA
New password:
Retype new password:
passwd: password updated successfully
The chown command changes user ownership of a file, directory, or link in Linux.
chown USER:[GROUP NAME] [Directory Path]
root@VirtualRove [ ~ ]# chown siteB:bkpadmin /home/nsxbkp-siteB/
And that completes the configuration on Photon OS. We are good to configure this directory as backup directory in NSX-T.
Couple of things… Photon OS does not respond to ICMP ping by default. You must run the following commands on the console to enable ping.
iptables -A OUTPUT -p icmp -j ACCEPT
iptables -A INPUT -p icmp -j ACCEPT
Also, the root account is not permitted to log in over SSH by default. You need to edit the ‘sshd_config’ file located at ‘/etc/ssh/sshd_config’. You can use any editor to edit this file…
Scroll it to the end of the file and change following value to enable ssh for root account…
Change the value from ‘no’ to ‘yes’ and save the file. You should be able to SSH to photon OS.
Let’s move to NSX-T side configuration. Login to NSX-T VIP and navigate to System> Backup & Restore…
Click on Edit for SFTP server and fill in all required information.
FQDN/IP : your SFTP server
Port : 22
Path : the directory we created in the steps above
Username, Password & Passphrase
It will prompt you to add the fingerprint.
Click on ‘Start Backup’ once you save it.
You should see successful backup listed in the UI.
Additionally, you can use WinSCP to login to photon and check for backup directory. You should see recent backup folders.
You also want to set an interval so that the NSX-T configuration is backed up as per the mentioned schedule. Click on ‘Edit’ on the NSX-T UI backup page and set an interval.
I preferred a daily backup, so I set the interval to 24 hrs.
Check your manager cluster to make sure it’s stable.
And take a backup again manually.
That’s it for this post.
We have successfully configured SFTP server for our NSX-T environment. We will use this backup to restore it at DR site in case of site failure or in case of NSX-T manager failure for any reason.
We often get into a situation where we need to apply VMware licenses to test VMware products, and many people are not aware that VMware provides a 60-day trial period for most of its products. For example, VMware vCenter Server & ESXi get the default trial period as soon as you install them. However, for some products you have to ask VMware to provide an evaluation license key to test the product. It is easy and simple to get trial licenses for VMware products.
Check the following link to know the products that are available for trial period license.
You might encounter ‘no healthy upstream’ error message on newly installed vCenter. This is because of some unexpected parameters while deploying vCenter. I was not able to find the exact root cause for this error, however I knew the resolution from our discussion with technical experts long back.
To start with, here is how it looks on the vCenter when you try to access web client.
You will be able to access vCenter Server Appliance Management Interface at port 5480. Check the services here…
All services show as healthy. In fact, the summary page shows the health status as Good.
Everything looks fine, but you cannot access the web client. I tried restarting the vCenter server multiple times with no luck. I also tried restarting all services from the management interface. Nothing worked.
Solution is to change network settings from vCenter Server Appliance Management Interface.
Click on networking & expand nic0. Notice that the IP address shows as DHCP even if it was given as static.
Click on Edit at top right corner to edit the network settings & select your nic.
Expand the nic0 here and notice that IPv4 shows automatic. Change this to manual.
Provide credentials on the next page.
Acknowledge the change. Take a backup of vCenter if necessary. It also recommends unregistering extensions before you save this.
Also, check the next steps after settings are saved successfully.
Click on Finish and you should see the progress.
Access web client once this is finished. You should be able to access it.
Go back to vCenter Server Appliance Management Interface to verify. You should see the IP as static.
The issue has been resolved. This is most likely caused by some unexpected parameters while deploying the vCenter, since the error does not show up for every deployment. Anyway, I wanted to write a small blog on it to help techies resolve the issue in case anyone sees this error. Thank you for reading.
It’s time to start the actual deployment. We will resolve the issues as we move on. Let’s upload the “Deployment Parameter” sheet to Cloud Builder and begin the deployment.
Upload the file and Next. I got an error here.
Bad Request: Invalid input DNS Domain must match
It turned out to be an additional space in the DNS Zone Name here.
This was corrected. Updated the sheet and NEXT.
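A stray space like this is hard to spot by eye. Here is a small sketch that flags such values. It assumes you have already copied the sheet values into a plain Python dict; reading the Excel file itself is out of scope here, and the key names in the usage note are made up for illustration.

```python
from typing import Dict, List


def find_padded_values(params: Dict[str, str]) -> List[str]:
    """Return the keys whose values carry leading or trailing whitespace."""
    return [key for key, value in params.items() if value != value.strip()]
```

For example, `find_padded_values({"dns_zone": "virtualrove.local ", "ntp": "172.16.31.110"})` flags only the `dns_zone` entry, which is the kind of typo that caused the error above.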
All good. Validation process started.
To understand & troubleshoot the issues / failures that we might face while deploying VCF, keep an eye on the vcf-bringup.log file. The file is located at ‘/opt/vmware/bringup/logs/’ on Cloud Builder. It gives you live updates on the deployment and shows any errors that caused the deployment to fail. Use ‘tail -f vcf-bringup.log’ to follow the latest updates.
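If you have copied the log off Cloud Builder and want to skim only the failures, a filter like the one below can help. This is just a sketch: the marker words are my assumption about what failing lines contain, so adjust them to what you actually see in your log.

```python
from typing import Iterable, List, Sequence


def error_lines(lines: Iterable[str],
                markers: Sequence[str] = ("ERROR", "FAILED")) -> List[str]:
    """Keep only the lines that contain one of the markers (case-sensitive)."""
    return [
        line.rstrip("\n")
        for line in lines
        if any(marker in line for marker in markers)
    ]


# Usage sketch (the path is the one mentioned above):
# with open("/opt/vmware/bringup/logs/vcf-bringup.log") as log:
#     for hit in error_lines(log):
#         print(hit)
```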
Let’s continue with the deployment…
“Error connecting to ESXi host esxi01. SSL Certificate common name doesn’t match ESXi FQDN”
Look at the “vcf-bringup.log” file.
This is because the certificate for an ESXi host gets generated at install time with the default name, and it is not regenerated when we change the hostname. You can check the hostname in the certificate. Log in to an ESXi > Manage > Security & Users > Certificates
You can see here that even though the hostname at the top shows “esxi01.virtualrove.local”, the CN in the certificate is still “localhost.localdomain”. We must change this to continue.
SSH to the ESXi server and run the following commands to change the hostname and FQDN and to generate new certs.
esxcli system hostname set -H=esxi03
esxcli system hostname set -f=esxi03.virtualrove.local
cd /etc/vmware/ssl
/sbin/generate-certificates
/etc/init.d/hostd restart && /etc/init.d/vpxa restart
reboot
You need to do this for all hosts by replacing the hostname in the command for each esxi respectively.
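Since the same commands have to be repeated per host, here is a small sketch that prints the command list for each ESXi so you can paste it into the SSH session. The commands are the ones shown above; the function name and the host list are my own choices for illustration.

```python
from typing import List


def rename_commands(short_name: str, domain: str) -> List[str]:
    """Build the hostname / certificate commands for one ESXi host."""
    fqdn = f"{short_name}.{domain}"
    return [
        f"esxcli system hostname set -H={short_name}",
        f"esxcli system hostname set -f={fqdn}",
        "cd /etc/vmware/ssl",
        "/sbin/generate-certificates",
        "/etc/init.d/hostd restart && /etc/init.d/vpxa restart",
        "reboot",
    ]


if __name__ == "__main__":
    # Four nested hosts, esxi01 to esxi04, as used in this lab.
    for index in range(1, 5):
        print(f"# esxi{index:02d}")
        print("\n".join(rename_commands(f"esxi{index:02d}", "virtualrove.local")))
```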
Verify the hostname in the cert once server boots up.
Next, Hit retry on cloud builder, and we should be good.
I am not sure why this showed up. I was able to reach to these IP’s from “Cloud Builder”.
Anyways, this was warning, and it can be ignored.
Next one was with host tep and edge tep.
VM Kernel ping from IP ‘172.27.13.2’ (‘NSXT_EDGE_TEP’) from host ‘esxi01.virtualrove.local’ to IP ” (‘NSXT_HOST_OVERLAY’) on host ‘esxi02.virtualrove.local’ failed VM Kernel ping from IP ” (‘NSXT_HOST_OVERLAY’) from host ‘esxi01.virtualrove.local’ to IP ‘172.27.13.3’ (‘NSXT_EDGE_TEP’) on host ‘esxi02.virtualrove.local’ failed VM Kernel ping from IP ” (‘NSXT_HOST_OVERLAY’) from host ‘esxi02.virtualrove.local’ to IP ‘172.27.13.2’ (‘NSXT_EDGE_TEP’) on host ‘esxi01.virtualrove.local’ failed VM Kernel ping from IP ‘172.27.13.3’ (‘NSXT_EDGE_TEP’) from host ‘esxi02.virtualrove.local’ to IP ” (‘NSXT_HOST_OVERLAY’) on host ‘esxi01.virtualrove.local’ failed
VM Kernel ping from IP ‘172.27.13.2’ (‘NSXT_EDGE_TEP’) from host ‘esxi01.virtualrove.local’ to IP ‘169.254.50.254’ (‘NSXT_HOST_OVERLAY’) on host ‘esxi03.virtualrove.local’ failed VM Kernel ping from IP ” (‘NSXT_HOST_OVERLAY’) from host ‘esxi01.virtualrove.local’ to IP ‘172.27.13.4’ (‘NSXT_EDGE_TEP’) on host ‘esxi03.virtualrove.local’ failed VM Kernel ping from IP ‘169.254.50.254’ (‘NSXT_HOST_OVERLAY’) from host ‘esxi03.virtualrove.local’ to IP ‘172.27.13.2’ (‘NSXT_EDGE_TEP’) on host ‘esxi01.virtualrove.local’ failed VM Kernel ping from IP ‘172.27.13.4’ (‘NSXT_EDGE_TEP’) from host ‘esxi03.virtualrove.local’ to IP ” (‘NSXT_HOST_OVERLAY’) on host ‘esxi01.virtualrove.local’ failed
First of all, I failed to understand APIPA 169.254.X.X. We had mentioned VLAN 1634 for Host TEP. It should have picked an ip address 172.16.34.X. This VLAN was already in place on TOR and I was able to ping the GW of it from CB. I took a chance here and ignored it since it was a warning.
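To tell quickly whether a TEP address is APIPA or comes from the expected VLAN, Python’s standard ipaddress module is enough. A minimal sketch follows; the /24 prefix length for the 172.16.34.X network is my assumption, since only the first three octets are mentioned above.

```python
import ipaddress


def classify_tep(ip: str, expected_subnet: str) -> str:
    """Classify a TEP address as 'apipa', 'expected' or 'unexpected'."""
    address = ipaddress.ip_address(ip)
    if address.is_link_local:  # 169.254.0.0/16 for IPv4
        return "apipa"
    if address in ipaddress.ip_network(expected_subnet):
        return "expected"
    return "unexpected"
```

With the values from the warning, `classify_tep("169.254.50.254", "172.16.34.0/24")` reports the address as APIPA, which usually means the vmk did not receive an address from DHCP or from a static pool at that moment.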
Next, got warnings for NTP.
Host cb.virtaulrove.local is not currently synchronising time with NTP Server dc.virtaulrove.local
NTP Server 172.16.31.110 and host cb.virtaulrove.local time drift is not below 30 seconds
Host esxi01.virtaulrove.local is not currently synchronising time with NTP Server dc.virtaulrove.local
For ESXi, Restart of ntpd.service resolved issue. For CB, I had to sync the time manually.
Steps to manually sync NTP…
ntpq -p
systemctl stop ntpd.service
ntpdate 172.16.31.110
Wait for a minute and run this again:
ntpdate 172.16.31.110
systemctl start ntpd.service
systemctl restart ntpd.service
ntpq -p
Verify the offset again. It must be close to 0.
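If you would rather not read the offset column by eye, the sketch below pulls it out of the ‘ntpq -p’ output. It assumes the usual ten-column layout of that output, where the ninth column is the offset in milliseconds. The 30 second limit comes from the Cloud Builder warning above; the function names are my own.

```python
from typing import Dict


def ntp_offsets_ms(ntpq_output: str) -> Dict[str, float]:
    """Map each peer (first column, tally character included) to its offset in ms."""
    offsets = {}
    for line in ntpq_output.splitlines():
        fields = line.split()
        # Skip the header row, the separator row and anything malformed.
        if len(fields) != 10 or fields[0] == "remote":
            continue
        try:
            offsets[fields[0]] = float(fields[8])
        except ValueError:
            continue
    return offsets


def within_drift(offsets: Dict[str, float], limit_ms: float = 30_000.0) -> bool:
    """True when at least one peer is listed and every offset is under the limit."""
    return bool(offsets) and all(abs(value) < limit_ms for value in offsets.values())
```

Paste the text printed by ‘ntpq -p’ into `ntp_offsets_ms()` and pass the result to `within_drift()`. An empty result means no peer row was recognised, which is itself worth investigating.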
Next, I locked out the root account of the Cloud Builder VM due to multiple logon failures. 😊
This is not unusual, since the passwords are complex and sometimes you have to type them manually on the console, and on top of that, you don’t even see (in Linux) what you are typing. Anyway, it’s a standard process to reset the root account password on Photon OS. The same applies to vCenter. Check the small writeup on it at the link below.
Next, Back to CB, click on “Acknowledge” if you want to ignore the warning.
Next, You will get this window once you resolve all errors.
Click on “Deploy SDDC”.
Important Note: Once you click on “Deploy SDDC”, the bring-up process first builds vSAN on the 1st ESXi server from the list and then deploys vCenter on that host. If bring-up fails for any reason and you figure out that one of the parameters in the excel sheet is incorrect, then it is a tedious job to change a parameter which is already uploaded to CB. You have to use jsongenerator commands to replace the existing excel sheet in the CB. I have not come across such a scenario yet, however there is a good writeup on it from a good friend of mine.
So, make sure to fill all correct details in “Deployment Parameter” sheet. 😊
Let the game begin…
Again, keep an eye on vcf-bringup.log file. The location of the file is ‘/opt/vmware/bringup/logs/’ in cloud builder. Use ‘tail -f vcf-bringup.log’ to get the latest update on deployment.
Installation starts. Good luck. Be prepared to see unexpected errors. Don’t lose hope, as there might be several errors before the deployment completes. Mine took 1 week to deploy when I did it for the first time.
Bring-up process started. All looks good here. Status as “Success”. Let’s keep watching.
All looks good here. Till this point I had vCenter in place and it was deploying first NSX-T ova.
Glance at the NSX-T env.
Note that the TEP ip’s for host are from the vlan 1634. However, CB validation stage was picking up apipa.
NSX-T was fine. It moved to SDDC further.
Woo, Bring-up moved to post deployment task.
Moved to AVN (Application Virtual Networking). I am expecting some errors here.
“A problem has occurred on the server. Please retry or contact the service provider and provide the reference token. Unable to create logical tier-1 gateway (0)”
This was an easy one. vcf-bringup.log showed that it was due to a missing DNS record for the edge VM. I created the DNS record and hit retry.
“Failed to validate BGP Neighbor Perring Status for edge node 172.16.31.125”
Let’s look at the log file.
Time to check NSX-T env.
Tier-0 gateway interfaces look good as per our deployment parameters.
However, BGP Neighbors are down.
This was expected since we haven’t done the BGP configuration on TOR (VyOS) yet. Let’s get in to VyOS and run some commands.
set protocols bgp 65001 parameters router-id 172.27.11.253
This command specifies the router-ID. If router ID is not specified it will use the highest interface IP address.
set protocols bgp 65001 neighbor 172.27.11.2 update-source eth4
Specify the IPv4 source address to use for the BGP session to this neighbor, may be specified as either an IPv4 address directly or as an interface name.
set protocols bgp 65001 neighbor 172.27.11.2 remote-as '65003'
This command creates a new neighbor whose remote-as is <nasn>. The neighbor address can be an IPv4 address or an IPv6 address or an interface to use for the connection. The command is applicable for peer and peer group.
set protocols bgp 65001 neighbor 172.27.11.3 remote-as '65003'
set protocols bgp 65001 neighbor 172.27.11.2 password VMw@re1!
set protocols bgp 65001 neighbor 172.27.11.3 password VMw@re1!
TOR configuration done for 2711 vlan. Let’s refresh and check the bgp status in nsx-t.
Same configuration to be performed for 2nd VLAN. I am using same VyOS for both the vlans since it’s a lab env. Usually, You will have 2 TOR’s and each BGP peer vlan configured respectively for redundancy purpose.
set protocols bgp 65001 parameters router-id 172.27.12.253
set protocols bgp 65001 neighbor 172.27.12.2 update-source eth5
set protocols bgp 65001 neighbor 172.27.12.2 remote-as '65003'
set protocols bgp 65001 neighbor 172.27.12.3 remote-as '65003'
set protocols bgp 65001 neighbor 172.27.12.2 password VMw@re1!
set protocols bgp 65001 neighbor 172.27.12.3 password VMw@re1!
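Typing the same neighbor lines for every peer is error prone, so here is a small sketch that generates the ‘remote-as’ and ‘password’ lines in the same syntax as above. The function name is my own, and the router-id and update-source lines are deliberately left out, since they depend on your interfaces.

```python
import ipaddress
from typing import Iterable, List


def bgp_neighbor_commands(local_as: int, remote_as: int,
                          neighbors: Iterable[str], password: str) -> List[str]:
    """Build VyOS 'set' lines (remote-as and password) for each BGP neighbor."""
    commands = []
    for neighbor in neighbors:
        ipaddress.ip_address(neighbor)  # raises ValueError on a mistyped address
        base = f"set protocols bgp {local_as} neighbor {neighbor}"
        commands.append(f"{base} remote-as {remote_as}")
        commands.append(f"{base} password {password}")
    return commands
```

For the second VLAN you would call it with the 172.27.12.2 and 172.27.12.3 neighbors, AS 65001 locally and 65003 on the NSX-T side. Depending on the characters in your password you may need to quote it on the VyOS CLI.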
Both BGP Neighbors are successful.
Hit Retry on CB and it should pass that phase.
Next Error on Cloud Builder: ‘Failed to validate BGP route distribution.’
At this stage, routing has been configured in your NSX-T environment, both edges have been deployed and BGP peering is done. If you check the BGP peer information on the edge as well as on the VyOS router, it shows ‘established’, and routes from the NSX-T environment even appear on your VyOS router. This means route redistribution from NSX to VyOS works fine, and the error means that no routes are advertised from VyOS (TOR) to the NSX environment. Let’s get into VyOS and run some commands.
set protocols bgp 65001 address-family ipv4-unicast network 172.16.31.0/24
set protocols bgp 65001 address-family ipv4-unicast network 172.16.32.0/24
Retry on CB and you should be good.
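If you have more networks to advertise from the TOR, a tiny generator helps you avoid typos in the prefixes. This is only a sketch with a function name of my own. It uses the same command syntax as above and rejects prefixes that have host bits set.

```python
import ipaddress
from typing import Iterable, List


def bgp_network_commands(local_as: int, prefixes: Iterable[str]) -> List[str]:
    """Build VyOS 'network' statements for IPv4 prefixes advertised by the TOR."""
    commands = []
    for prefix in prefixes:
        # IPv4Network is strict by default: "172.16.31.5/24" raises ValueError.
        network = ipaddress.IPv4Network(prefix)
        commands.append(
            f"set protocols bgp {local_as} address-family ipv4-unicast network {network}"
        )
    return commands
```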
Everything went smoothly after this. SDDC was deployed successfully.
That was fun. We have successfully deployed VMware Cloud Foundation version 4.2.1 including AVN (Application Virtual Networking).
Time to verify and check the components that have been installed.
Segments in NSX-T which were specified in the deployment parameters sheet.
Verify on the TOR (VyOS) if you see these segments as BGP published networks.
Added a test segment called “virtaulrove_overlay_172.16.50.0” in nsx-t to check if the newly created network gets published to TOR.
All looks good. I see the new segment subnet populated on TOR.
Let’s do some testing. As you see above, the new segment subnets are being learned from 172.27.11.2. This interface is configured on the edge01 VM. Check it here.
We will take down edge01 VM to see if route learning changes to edge02.
Get into nodes on nsx-t and “Enter NSX Maintenance mode” for edge 01 VM.
Edge01, Tunnels & Status down.
Notice that the gateway address has failed over to 172.27.11.3.
All Fine, All Good. 😊
There are multiple tests that can be performed to check if the deployed environment is redundant at every level.
Additionally, use this command ‘systemctl restart vcf-bringup’ to pause the deployment when required.
For example, in my case the NSX-T manager was taking time to get deployed, and due to a timeout interval on Cloud Builder, it used to cancel the deployment assuming some failure. So, I paused the deployment after the NSX-T ova job got triggered from CB and hit ‘Retry’ after NSX-T got deployed successfully in vCenter. It picked it up from that point and moved on.
I hope you enjoyed reading the post. It’s time for you to get started and deploy VCF. See you in future posts. Feel free to comment below if you face any issues when you deploy the VCF environment.
We have prepared the environment for VCF deployment. It’s time to move to CB and discuss the “Deployment Parameters” excel sheet in detail. You can find my earlier blog here.
Login to Cloud Builder VM and start the deployment process.
Select “vCloud Foundation” here,
The other option, “Dell EMC VxRail”, is to be used when your physical hardware vendor is Dell.
VxRail is a hyper-converged appliance. It’s a single device which includes compute, storage, networking and virtualization resources. It comes with pre-configured vCenter and ESXi servers. Then there is a manual process to convert this embedded vCenter into a user-managed vCenter, and that’s when we use this option. If possible, I will write a small blog on it too.
Read all prereqs on this page and make sure to fulfill them before you proceed.
Click on “Download” here to get the “Deployment Parameter” excel sheet.
Let’s dig into this sheet and talk in detail about all the parameters here.
“Prerequisites Checklist” sheet from the deployment parameter workbook. Check all line items one by one and select “Verified” in the status column. This does not affect the deployment; it is just for your reference.
“Management Workloads” sheet.
Place your license keys here.
This sheet also has compute resource calculator for management workload domain. Have a look and try to fit your requirements accordingly.
“Users and Groups”: Define all passwords here. Check out the NSX-T passwords, as the validation fails if it does not match the password policy.
Moving on to next sheet “Hosts and Networks”.
Couple of things to discuss here,
The DHCP requirement for NSX-T Host TEP is optional now. Host TEP IPs can be defined manually with static IP pools here. However, if you select NO here, the DHCP option still applies.
Moving on to “vSphere Distributed Switch Profile” in this sheet. It has 3 profiles. Earlier VCF versions had only one option, to deploy with 2 pNICs. Due to high demand from customers to deploy with 4 pNICs, this option was introduced. Let’s talk about these profiles.
This profile will deploy a single vDS with 2 or 4 uplinks. All network traffic will flow through the assigned nics in this vDS. Define the name and pNICs at row # 17,18 respectively.
This one deploys 2 VDS. You can see that the first vDS will carry management traffic and the other one is for NSX. Each vDS can have 2 or 4 pnics.
This one also deploys 2 vDS, just that the VSAN traffic is segregated instead of NSX in earlier case.
Select the profile as per your business requirement and move to next step.
Next – “Deploy Parameters”
Define all parameters here carefully. If something is not good, the cell would turn RED. I have selected VCSA size as small since we are testing the product.
Move to NSX-T section. Have a look at the AVN (Application Virtual Networking). If you select Yes here, then you must specify the BGP peering information and uplinks configuration. If it’s NO, then it does not do BGP peering.
The TOR1 & TOR2 IPs are the interfaces configured on your VyOS. Make sure to create those interfaces. We will see it in detail when we reach that stage of the deployment.
We are all set to upload this “Deployment Parameter” sheet to Cloud Builder and begin the deployment. That is all for this blog. We will do the actual deployment in next blog.
Finally, after a year and a half, I got a chance to deploy the latest version of VMware Cloud Foundation, 4.2.1. It has been successfully deployed and tested. I have written a couple of blogs on the earlier version (i.e. version 4.0); you can find them here.
Let’s discuss and understand the installation flow,
Configure TOR for the networks that are being used by VCF. In our case, we have a VyOS router.
Deploy a Cloud Builder VM on a standalone source ESXi or vCenter.
Install and configure 4 ESXi servers as per the prereqs.
Fill in the Deployment Parameters excel sheet carefully.
Upload the “Deployment Parameter” excel sheet to Cloud Builder.
Resolve the issues / warnings shown on the validation page of CB.
Start the deployment.
Post deployment, you will have a vCenter, 4 ESXi servers, NSX-T env & SDDC manager deployed.
Additionally, you can deploy a VI workload domain using SDDC manager. This will allow you to deploy a Kubernetes cluster. Also, vRealize Suite & Workspace ONE can be deployed using SDDC manager.
You definitely need a huge amount of compute resources to deploy this solution. This entire solution was installed on a single ESXi server. Following is the configuration of the server.
Dell PowerEdge R630
2 X Intel(R) Xeon(R) CPU E5-2650 v3 @ 2.30GHz
256 GB Memory
4 TB SSD
Let’s prepare the infra for VMware Cloud Foundation.
I will call my physical esxi server as a base esxi in this blog. So, here is my base esxi and VM’s installed on it.
dc.virtaulrove.local – This is a Domain Controller & DNS Server in the env.
VyOS – This virtual router will act as a TOR for VCF env.
jumpbox.virtaulrove.local – To connect to the env.
ESXi01 to ESXi 04 – These will be the target ESXi’s for our VCF deployment.
cb.virtaulrove.local – Cloud Builder VM to deploy VCF.
Here is a look at the TOR and interfaces configured…
Follow my blog here to configure the VyOS TOR.
Network Requirements: Management domain networks must be in place on the physical switch (TOR). Jumbo frames (MTU 9000) are recommended on all VLANs, with a minimum MTU of 1600.
And a VLAN 1634 for Host TEP’s, which is already configured on TOR at eth3.
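A quick way to sanity check the MTU values you have noted down for each VLAN is sketched below. The dict of VLAN IDs to MTU values is something you would fill in by hand from your TOR configuration; the names are my own, and only the 9000 and 1600 figures come from the requirement above.

```python
from typing import Dict, List

RECOMMENDED_MTU = 9000  # jumbo frames, the recommended value
MINIMUM_MTU = 1600      # lowest acceptable value


def undersized_vlans(vlan_mtu: Dict[int, int],
                     minimum: int = MINIMUM_MTU) -> List[int]:
    """Return the VLAN IDs whose MTU is below the given minimum, sorted."""
    return sorted(vlan for vlan, mtu in vlan_mtu.items() if mtu < minimum)
```

Pass `minimum=RECOMMENDED_MTU` if you want to list every VLAN that is not yet running jumbo frames.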
The following DNS records must be in place before we start with the installation.
With all these things in place, our first step is to deploy 4 target ESXi servers. Download the correct supported ESXi version ISO from VMware downloads.
7.0 Update 1d
If you check VMware downloads page, this version is not available for download.
The release notes say to create a custom image and use it for deployment. However, there is another way to download this version of the ESXi image. Let’s get the Cloud Builder image from the VMware portal and install it. We will keep the ESXi installation on hold for now.
We start the Cloud Builder deployment once this 19 GB ova file is downloaded.
Cloud Builder Deployment:
Cloud Builder is an appliance provided by VMware to build VCF env on target ESXi’s. It is one time use VM and can be powered off after the successful deployment of VCF management domain. After deployment, we will use SDDC manager for managing additional VI domains. I will be deploying this appliance in VLAN 1631, so that it gets access to DC and all our target ESXi servers.
Deployment is straightforward, like any other OVA deployment. Make sure you choose the right password while deploying the OVA. The admin & root passwords must be a minimum of 8 characters and include at least one uppercase, one lowercase, one digit, and one special character. If the password does not meet these requirements, the deployment will fail, which results in re-deploying the OVA.
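Since a wrong password means re-deploying the OVA, it is worth checking it against the rules before you start. A minimal sketch follows. Which characters count as “special” is my assumption (Python’s `string.punctuation`); the rest follows the rules stated above.

```python
import string
from typing import List


def password_problems(password: str) -> List[str]:
    """Return the rules the password breaks; an empty list means it passes."""
    problems = []
    if len(password) < 8:
        problems.append("shorter than 8 characters")
    if not any(char.isupper() for char in password):
        problems.append("no uppercase letter")
    if not any(char.islower() for char in password):
        problems.append("no lowercase letter")
    if not any(char.isdigit() for char in password):
        problems.append("no digit")
    # Assumption: any ASCII punctuation character counts as "special".
    if not any(char in string.punctuation for char in password):
        problems.append("no special character")
    return problems
```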
Once the deployment is complete, connect to CB using WinSCP and navigate to ….
Click on Download to use this image to deploy our 4 target ESXi servers.
Next step is to create 4 new VMs on the base physical ESXi. These will be our nested ESXi hosts where our VCF env will get installed. All ESXi hosts should have identical configuration. I have the following configuration in my lab.
vCPU: 12
2 Sockets, 6 cores each.
CPU hot plug: Enabled
Hardware Virtualization: Enabled
And 2 network cards attached to Trun_4095. This will allow an esxi to communicate with all networks on the TOR.
Map the ISO to CD drive and start the installation.
I am not going to show ESXi installation steps, since most of you know it already. Let’s look at the custom settings after the installation.
DCUI VLAN settings should be set to 1631.
Crosscheck the DNS and IP settings on esxi.
And finally, make sure that the ‘Test Management Network’ on DCUI shows OK for all tests.
Repeat this for all 4 esxi.
I have all my 4 target ESXi servers ready. Let’s look at the ESXi configuration that has to be in place before we can utilize them for VCF deployment.
All ESXi must have ‘VM network’ and ‘Management network’ VLAN id 1631 configured.
NTP server address should be in place on all ESXi.
SSH & NTP service to be enabled and policy set to ‘Start & Stop with the host’.
All additional disks to be present on an ESXi as SSD and ready for vSAN configuration. You can check it here.
If your base ESXi has HDDs and not SSDs, then you can use the following command to mark those HDDs as SSDs.
You can either connect to DC and putty to ESXi or open ESXi console and run these commands.
Here is a small writeup on resetting the root account password for vCenter / Cloud Builder VM.
I was deploying a VCF env and the root account of the Cloud Builder VM got locked out, so I thought of writing a small blog on it. You can find my VCF posts here. https://virtualrove.com/vcf/
Let’s get started.
Reboot the “Cloud Builder” VM / vCenter and press the ‘e’ key to enter the GNU GRUB Edit Menu. Locate the 3rd row which starts with ‘linux’ at the beginning.
Write following at the end of the line and press F10
You will get to command line after this. Run following commands.
mount -o remount,rw / ::: To mount the root partition
passwd ::: To enter new password, then re-type new password.
/sbin/pam_tally2 --user=root --reset ::: To unlock the locked root account. Run this twice.
umount / ::: To unmount the partition
reboot -f ::: Reboot the VM
That’s it. You should be able to get in with the new password for the root account after the reboot. Short & Simple. 😊
To become an expert in VMware virtualization, reading blogs is definitely a good way to enhance your knowledge. However, you will not get a real feel for it until you try it out yourself. You get a real production feel when you work hands-on and resolve those unexpected issues by yourself. VMware Workstation is a great way to install, configure and try new products and experiment with labs, as long as you have enough hardware (i.e. memory, storage & CPU).
The huge amount of compute resources (i.e. memory, storage & CPU) needed for VMware labs is a barrier for most individuals. To resolve this, we have put together complete lab solutions for users who want to learn VMware virtualization and explore it by implementing it. We have a huge amount of compute resources to rent out, which can be used for any kind of lab. You will be provided the requested amount of memory, storage & CPUs for your labs, accessible from anywhere, anytime. Additionally, we will assist you in setting up the lab.
At a very minimal cost, we will provide lab setup, assistance & 24*7 support. Our labs can accommodate the following VMware products, which can be deployed and tested multiple times with real-time experience. Here is the list of labs followed by the certification which you will be able to achieve.
Our labs are not limited to the above products. They are equipped with a large configuration which will even help you to do POCs for the following VMware products before you implement them in your production environment. All required assistance & guidance will be provided to set up these labs.
We have our own SOPs (Standard Operating Procedures) to build the env. Connect with us to get lab experience as if you had built a customer production env. As a demo, we provide 2 hours of lab access to new users at no cost.
Connect with us through the contact form using below link OR send an email to email@example.com
In this post, we will talk about the reverse migration of a VMkernel adapter from NSX-T back to a vCenter port group. If you did not get a chance to look at my previous article on migrating a vmk from vCenter to NSX-T, here is the link.
There can be multiple reasons for removing VMkernel ports from NSX-T. Here are some…
The third party application which had its vmk in a vCenter PG does not behave as expected.
The application itself (which uses the vmk) is no longer needed.
You want to uninstall NSX-T from one of the hosts for any reason. You first have to move the vmk’s off it, or an appropriate “Network Mappings for Uninstall” has to be in place before you move on.
Note: Uninstalling NSX-T Data Center from an ESXi host is disruptive if the physical interfaces or VMkernel interfaces are connected to N-VDS.
Here is one more important scenario mentioned at VMware docs, (copied from VMware site)
Transport node configuration on a node cannot be overridden if underlying segments or VMs are connected to that transport node. For example, consider a two ESXi host cluster, where host-1 is configured as transport-node-1, but host-2 is unprepared. Segments and VMs are connected to transport-node-1. After preparing host-1 as a transport node (associated to transport-zone-1), if you apply a transport node profile to that cluster (associated to transport-zone-2), then NSX-T does not override the transport node configuration with the transport node profile configuration. To successfully override configuration on host-1, power off the VMs and disconnect the segment before applying the transport node profile to associate host-1 to transport-zone-2 and disassociate it from transport-zone-1.
With that, let’s get started.
In the previous post, I explained the migration process of a vmk from vCenter to NSX-T. Now let’s revert it. Verify the vmk location. It is on the NSX-T logical switch “VLAN-1650” and the switch name is ‘data-nvds’.
Back to NSX-T > System> Nodes> Select appropriate node> Action> ‘Migrate ESX Vmkernel and Physical Adaptors’
Select ‘Migrate to port group’ in this wizard.
Direction: Migrate to Port Groups
N-VDS: Select the target switch from which you want to remove the vmk port.
Select the VMkernel Adapter (vmk3) and manually type the port group name ‘vDS-Test-1650’.
Map the appropriate physical nics and uplinks in this wizard. Note: Mapping physical nics here does not mean that it will remove mentioned nics from N-VDS.
Verify that the vmk3 is back to the VDS.
Test the connectivity to vmk from esxi. And we are done with the reverse migration of VMkernel port to the port group. That’s it for this post. Will come back soon with new content for my next blog.