VCF 5.0 Series-Step by Step-Phase4 – Post Deployment Checks

We covered the entire VCF 5.X stack deployment in my three earlier blogs.

VCF 5.0 Series-Step by Step-Phase1 – Preparation
VCF 5.0 Series-Step by Step Phase2 – Deployment Parameters Excel sheet
VCF 5.0 Series-Step by Step-Phase3 – Deployment
VCF 5.0 Series-Step by Step-Phase4 – Post Deployment Checks

It’s time to check the VCF environment and run some post-deployment checks.
Here is the SDDC Manager after the deployment,

Hosts & Clusters view,

VMs & Templates,

Datastore,

And Networking,

Let’s look at the NSX env,

All management hosts have been prepared for NSX,

Host configuration on one of the hosts in this cluster:
“vcf-vds01” is configured for NSX. The TZ, uplink profile & IP pool have already been created and configured.

The vCenter virtual switch view on one of the hosts,

NSX already has a backup configured, and the last backup was successful.

If you look at the backup configuration, SDDC Manager is set as the backup server,

Let’s have a look at the SDDC Manager dashboard,

The Hosts view in SDDC Manager shows everything as expected,

Workload Domain view shows our management domain,

Click on the management domain name to check details,

The Hosts tab under the management domain shows the host details again,

Edge clusters are empty. You get an option to deploy edge clusters for the management domain; I will be writing a separate blog on it,

The password management options allow you to create / edit passwords for all SDDC components in one place. You can also schedule password rotation for all components.

As discussed in the first blog of this series, here is the option to subscribe to licenses,

Like other VMware products, you get an option to integrate AD,

Option to deploy vRealize Suite from SDDC,

Well, that’s all for this post. Keep following for upcoming blogs on VCF 5.X.


VCF 5.0 Series-Step by Step-Phase3 – Deployment

Welcome back. We are done with all prerequisites and the deployment parameter sheet in the earlier posts. If you missed them, you can find them here…

VCF 5.0 Series-Step by Step-Phase1 – Preparation
VCF 5.0 Series-Step by Step Phase2 – Deployment Parameters Excel sheet
VCF 5.0 Series-Step by Step-Phase3 – Deployment
VCF 5.0 Series-Step by Step-Phase4 – Post Deployment Checks

Log in to the Cloud Builder VM and start the deployment process.

Select “VMware Cloud Foundation” here,

The other option, “Dell EMC VxRail”, is to be used when your physical hardware vendor is Dell.

VxRail is a hyper-converged appliance. It’s a single device which includes compute, storage, networking and virtualization resources, and it comes with a pre-configured vCenter and ESXi servers. There is then a manual process to convert this embedded vCenter into a user-managed vCenter, and that’s when we use this option.

Read all prerequisites on this page and make sure to fulfill them before you proceed.

Scroll down to check the remaining prerequisites,

Click next here.

Earlier versions of VCF gave an option to download the “Deployment Parameter” excel sheet on this page.

You must now download this sheet from the same place where you downloaded the VCF OVA.

It’s time to start the actual deployment. We will resolve issues as we move on.
Let’s upload the “Deployment Parameter” sheet to Cloud Builder and begin the deployment.

Upload the file and click Next.
In this step, CB validates everything that is required for the complete deployment.

To understand & troubleshoot the issues / failures that we might face while deploying VCF, keep an eye on the vcf-bringup.log file. The location of the file is ‘/opt/vmware/bringup/logs/’ on the Cloud Builder. This file gives you a live update of the deployment and any errors that caused the deployment to fail. Use ‘tail -f vcf-bringup.log’ to get the latest updates on the deployment.
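If you want failures to stand out while the log scrolls by, a simple filter helps. A quick sketch, run on the Cloud Builder VM:

tail -f /opt/vmware/bringup/logs/vcf-bringup.log | grep -Ei 'error|fail|exception'   # highlight likely failure lines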

Let’s continue with the deployment…

“Error connecting to ESXi host. SSL Certificate common name doesn’t match ESXi FQDN”

Look at the “vcf-bringup.log” file.

This happens because the ESXi certificate is generated at installation time with the default hostname, and it is not regenerated when we rename the host. You can check the hostname in the certificate: log in to an ESXi host > Manage > Security & Users > Certificates.

You can see here that even though the hostname at the top shows “vcf157.virtualrove.local”, the CN in the certificate is still “localhost.localdomain”. We must change this to continue.

SSH to the ESXi server and run the following commands to change the hostname and FQDN and to generate new certs.

esxcli system hostname set -H=vcf157                    # short hostname
esxcli system hostname set -f=vcf157.virtualrove.local  # FQDN (note: no space after '=')
cd /etc/vmware/ssl
/sbin/generate-certificates                             # regenerate the self-signed certificates
/etc/init.d/hostd restart && /etc/init.d/vpxa restart   # restart the management agents
reboot

You need to do this for all hosts, replacing the hostname in the commands for each ESXi respectively.
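If SSH is already enabled on the hosts, a small loop can push the same commands to every node. This is only a rough sketch; the host names here are hypothetical, so substitute your own, and you will be prompted for each host’s root password:

for h in vcf157 vcf158 vcf159 vcf160; do    # hypothetical host names - adjust to your lab
  ssh root@${h}.virtualrove.local "esxcli system hostname set -H=${h} && \
    esxcli system hostname set -f=${h}.virtualrove.local && \
    cd /etc/vmware/ssl && /sbin/generate-certificates && \
    /etc/init.d/hostd restart && /etc/init.d/vpxa restart && reboot"
done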

Verify the hostname in the cert once the server boots up.

Next, hit Retry on Cloud Builder, and we should be good.

Next, a warning for vSAN disk availability:
Validate ESXi host has at least one valid boot disk.

Not sure about this one. I double-checked and confirmed that all disks are available on the ESXi hosts, so I will simply ignore this warning.

Next, warnings for NTP.
Host cb.virtualrove.local is not currently synchronising time with NTP Server dc.virtualrove.local
NTP Server 172.16.31.110 and host cb.virtualrove.local time drift is not below 30 seconds

For ESXi, a restart of the ntpd service resolved the issue.
For CB, I had to sync the time manually.

Steps to manually sync NTP:

ntpq -p                       # check the current peers and offset
systemctl stop ntpd.service
ntpdate 172.16.31.110         # one-off sync against the NTP server
ntpdate 172.16.31.110         # wait a minute, then run it once more
systemctl start ntpd.service
systemctl restart ntpd.service
ntpq -p

Verify the offset again; it must be close to 0.
Next, I locked out the root account of the Cloud Builder VM due to multiple logon failures. 😊

This is common since the passwords are complex and sometimes you have to type them manually on the console; on top of that, you don’t even see (in Linux) what you are typing.

Anyway, it’s a standard process to reset the root account password for Photon OS. The same applies to vCenter. Check the short write-up on it at the link below.

Next, back in CB, click on “Acknowledge” if you want to ignore the warning.

Next, you will get this window once you resolve all errors.
Click on “Deploy SDDC”.

Important Note: Once you click on “Deploy SDDC”, the bring-up process first builds vSAN on the 1st ESXi server from the list and then deploys vCenter on that host. If bring-up fails for any reason and you discover that one of the parameters in the Excel sheet is incorrect, it is a tedious job to change a parameter that has already been uploaded to CB. You have to use the jsongenerator commands to replace the existing sheet in CB. I have not come across such a scenario yet; however, there is a good write-up on it from a good friend of mine.

So, make sure to fill in all the correct details in the “Deployment Parameter” sheet. 😊

Let the game begin…

Again, keep an eye on the vcf-bringup.log file at ‘/opt/vmware/bringup/logs/’ on the Cloud Builder. Use ‘tail -f vcf-bringup.log’ to get the latest updates on the deployment.

The installation starts. Good luck. Be prepared to see unexpected errors, and don’t lose hope, as there might be several errors before the deployment completes. Mine took a week to deploy the first time I did it.

The bring-up process has started. All looks good here; status shows “Success”. Let’s keep watching.

It started the vCenter deployment on the 1st vSAN-enabled host.

You can also log in to the 1st ESXi and check the progress of the vCenter deployment.

vCenter installation finished. Moving on to the NSX deployment.

It failed at the NSX deployment stage,

Failed to join NSX managers to form a management cluster. Failed to detach NSX managers from the NSX management cluster.

I logged into all 3 NSX managers and found that one of them was showing “Management UI: DOWN” on the console. Restarting the affected NSX manager fixed it.

Retry on the CB did not show that error again.
And finally, it finished all tasks.

Click Finish. And it launches another box.

That was fun. We have successfully deployed VMware Cloud Foundation version 5.0.

There are multiple tests that can be performed to check whether the deployed environment is redundant at every level. Time to verify and do some post-deployment checks; I will cover that in the next post.

Additionally, use the command ‘systemctl restart vcf-bringup’ to pause the deployment when required.

For example, in my case the NSX-T manager was taking a long time to deploy, and due to a timeout interval on Cloud Builder, it would cancel the deployment assuming a failure. So, I paused the deployment after the NSX-T OVA job was triggered from CB and hit ‘Retry’ after NSX deployed successfully in vCenter. It picked up from that point and moved on.

Hope you enjoyed reading the post. It’s time for you to get started and deploy VCF. Feel free to comment below if you face any issues.


VCF 5.0 Series-Step by Step Phase2 – Deployment Parameters Excel sheet

We have prepared the environment for the VCF deployment. It’s time to discuss the “Deployment Parameters” Excel sheet in detail. Following is the list of blogs in this series.

VCF 5.0 Series-Step by Step-Phase1 – Preparation
VCF 5.0 Series-Step by Step Phase2 – Deployment Parameters Excel sheet
VCF 5.0 Series-Step by Step-Phase3 – Deployment
VCF 5.0 Series-Step by Step-Phase4 – Post Deployment Checks

The “Introduction” sheet from the deployment parameter workbook.

Go through this carefully and make sure that you have everything in place that is needed for the deployment. No edits are required on this sheet.

Next, “Credentials” sheet.

Check the password policy and make sure to generate passwords accordingly. Validation fails if the policy is not met.

Throughout this entire sheet, any cell with an unacceptable value turns RED.

Moving on to the next sheet, “Hosts and Networks”.

A couple of things to discuss here,

Management Domain Networks – All networks should be pre-created on the TOR.

Here is the screenshot from my TOR.

Management Domain ESXi Hosts – All IPs are to be reserved and DNS records in place.

Moving on to the “vSphere Distributed Switch Profile” section in this sheet. It has 3 profiles. Let’s talk about the available options.

Profile-1

This profile deploys a single vDS with 2 or 4 uplinks. All network traffic flows through the NICs assigned to this vDS.

Profile-2

If you want to split the vSAN traffic onto dedicated pNICs, choose this option.

This one deploys 2 vDS. You can see that the first vDS carries management, vMotion and host overlay traffic, while the other one is for vSAN. Each vDS can have up to 2 pNICs.

Profile-3

This one also deploys 2 vDS, except that the vSAN traffic is merged into the 1st vDS and the 2nd vDS only carries host overlay traffic.

Select the profile as per your business requirements and move to the next step. For this lab, I have selected the 1st profile.

Moving to the “NSX Host Overlay Network” section – you have an option to enable DHCP on VLAN 1634 or define the values manually.

Next – “Deploy Parameters” sheet,

Define all parameters here carefully. Again, if something is not right, the cell turns RED.

As discussed in the 1st blog in this series, VCF has now introduced subscription-based licensing. If you select “No”, then you have to manually enter license keys here. If you select “Yes”, a note appears in RED,

Just found out that the VMware KBs are redirecting to Broadcom already. 😊

Check this Broadcom KB for more information,

https://knowledge.broadcom.com/external/article?legacyId=89567

“During bring-up, in subscription licensing mode, the management domain is deployed in evaluation mode. It is expected that you complete the subscription process for VMware Cloud Foundation+ within 60 days. After the period has expired, you cannot do any actions related the workload domains, such as add or expand workload domain, add or remove cluster, add or remove host”

One caveat here: if you deploy the stack in the subscription-based model, SDDC Manager does not allow you to perform any additional operations until you finish the subscription process. In short, it is of no use until you subscribe.

Let me show you,

This screenshot was captured when I deployed it in subscription mode.
This is what you see when you deploy in subscription mode and do not activate it,

All additional config options are grayed out. You see a message there: “Deactivated in Subscription-Unsubscribed mode.”

Any changes to the “Workload Domain” are blocked.

No adding hosts to the management domain,

Back to the Deploy Parameters sheet. Make your choices wisely and plan accordingly.
Moving to the “vSphere Infra” section in the deployment parameters sheet.

And finally, the NSX & SDDC section,

We are all set to upload this “Deployment Parameter” sheet to Cloud Builder and begin the deployment. That is all for this blog. We will perform the actual deployment in the next post.


VCF 5.0 Series-Step by Step-Phase1 – Preparation

Got the VCF 5.X env stood up after a few attempts. It was fun and good learning too.

The planning / design phase plays an important role in a VCF deployment. I would say deployment is just a day’s task; however, planning goes on for weeks. I would specifically like to emphasize ‘Licensing’. VCF can be deployed in either a subscription-based licensing model or a perpetual one. I will discuss this in later blogs in this series.

Imp Note: You cannot return to using a perpetual license without doing a full bring-up rebuild.

https://docs.vmware.com/en/VMware-Cloud-Foundation/5.0/vcf-admin/GUID-973601B5-9CDD-40C2-A7C4-FF117C1820DD.html

A license calculator is available for download in the following KB.

https://kb.vmware.com/s/article/96426

This VCF 5.X series includes the following parts,

VCF 5.0 Series-Step by Step-Phase1 – Preparation
VCF 5.0 Series-Step by Step Phase2 – Deployment Parameters Excel sheet
VCF 5.0 Series-Step by Step-Phase3 – Deployment
VCF 5.0 Series-Step by Step-Phase4 – Post Deployment Checks

Let’s get into “Preparation” phase and start preparing the infrastructure for VCF deployment.

The deployment of VMware Cloud Foundation is automated. We initially use VMware Cloud Builder to deploy all management domain components. The following components / options have been removed from the 5.X initial deployment, compared to previous versions.

Application Virtual Networks (AVNs)
Edge Deployment
Creation of Tier-1 & Tier-0
BGP peering

All of these can only be configured via SDDC Manager after a successful deployment. Hence, the initial deployment has become a little easier.
Thanks to my multiple deployment attempts, I am able to jot down the high-level deployment flow here, which is automated and performed by Cloud Builder once you start the deployment.

After the validation, CB performs the following steps to configure the VCF env.

Connects to the 1st target ESXi host and configures a single-host vSAN datastore.
Starts the vCenter deployment on the 1st vSAN-enabled host.
After the successful deployment of vCenter, creates the Datacenter object and the cluster, and adds the remaining 3 hosts to the cluster.
Configures all vmkernel ports on all 4 hosts.
Creates the vDS and adds all 4 hosts to it.
Configures disk groups to form a vSAN datastore across all hosts.
Deploys 3 NSX managers on the management port group and configures a VIP.
Adds the Compute Manager (vCenter) and creates the required transport zones, uplink profiles & network pools.
Configures the vSphere cluster for NSX (VIB installation).
Deploys SDDC Manager.
Runs some post-deployment cleanup tasks.
Finish.

And this is what you should expect after a successful deployment. 😊

Believe me, it’s going to take multiple attempts if you are doing it for the first time.

Let’s have a look at the Bill of Materials (BOM) for Cloud Foundation version 5.0.0.0 Build 21822418.

Software Component                Version
Cloud Builder VM                  5.0-21822418
SDDC Manager                      5.0-21822418
VMware vCenter Server Appliance   8.0 U1a-21815093
VMware ESXi                       8.0 U1a-21813344
VMware NSX-T Data Center          4.1.0.2.0-21761691
Aria Suite Lifecycle              8.10 Patch 1-21331275

It’s always a good idea to check the release notes of the product before you design & deploy. You can find the release notes here.

https://docs.vmware.com/en/VMware-Cloud-Foundation/5.0/rn/vmware-cloud-foundation-50-release-notes/index.html

Some of the content of this blog has been carried over from my previous blog (VMware vCloud Foundation 4.2.1 Step by Step), since it applies to version 5.0 too.

Let’s discuss and understand the high-level installation flow,

Configure the TOR for the networks that will be used by VCF. In our case, we have a VyOS router.
Deploy the Cloud Builder VM on a standalone source physical ESXi.
Install and configure 4 ESXi servers as per the prerequisites.
Fill in the “Deployment Parameters” Excel sheet carefully.
Upload the “Deployment Parameters” Excel sheet to Cloud Builder.
Resolve the issues / warnings shown on the validation page of CB.
Start the deployment.
Post deployment, you will have a vCenter, 4 ESXi servers, 3 NSX managers & SDDC Manager deployed.
Additionally, you can deploy a VI workload domain using SDDC Manager. This will allow you to deploy Kubernetes clusters and vRealize Suite components.

You definitely need a huge amount of compute resources to deploy this solution.

This entire solution was installed on a single physical ESXi server with the following configuration.

HP ProLiant DL360 Gen9
2 X Intel(R) Xeon(R) CPU E5-2650 v3 @ 2.30GHz
512 GB Memory
4 TB SSD

I am sure it is possible with 256 GB of memory too.

Let’s prepare the infra for VCF lab.

I will refer to my physical ESXi server as the base ESXi in this blog.
So, here is my base ESXi and the VMs installed on it.

VyOS – This virtual router will act as the TOR for the VCF env.
dc.virtualrove.local – This is the Domain Controller & DNS server in the env.
jumpbox.virtualrove.local – To connect to the env.
vcf173 to vcf176 – These will be the target ESXi hosts for our VCF deployment.
cb.virtualrove.local – The Cloud Builder VM used to deploy VCF.

Here is a look at the TOR and interfaces configured…

Follow my blog here to configure the VyOS TOR.

Network Requirements: The management domain networks must be in place on the physical switch (TOR). Jumbo frames (MTU 9000) are recommended on all VLANs, with a minimum of 1600 MTU. You can verify the MTU end to end as shown below.
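Once the nested hosts are up, a quick way to check jumbo frames from any ESXi host is vmkping. A sketch only; vmk0 and the target IP are assumptions from my lab, so point it at the vmkernel port and peer address you want to test:

vmkping -I vmk0 -d -s 8972 172.16.31.253   # -d = don't fragment; 8972 = 9000 minus IP/ICMP headers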

The following DNS records must be in place before we start with the installation.
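A quick forward-lookup spot check from any machine pointed at the DC (a sketch using my lab host names; extend the list with your vCenter, NSX and SDDC Manager records):

for h in vcf173 vcf174 vcf175 vcf176 cb; do nslookup ${h}.virtualrove.local; done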

Cloud Builder Deployment:

Cloud Builder is an appliance provided by VMware to build the VCF env on the target ESXi hosts. It is a one-time-use VM and can be powered off after the successful deployment of the VCF management domain. After the deployment, we use SDDC Manager for managing additional VI domains. I will be deploying this appliance on VLAN 1631, so that it gets access to the DC and all our target ESXi servers.

Download the correct CB OVA from the downloads page,

We also need the Excel sheet, downloaded from the same page:

‘Cloud Builder Deployment Parameter Guide’

This is a deployment parameter sheet used by CB to deploy VCF infrastructure.

The deployment is straightforward, like any other OVA deployment. Make sure you choose the right password while deploying the OVA: the admin & root passwords must be a minimum of 8 characters and include at least one uppercase letter, one lowercase letter, one digit, and one special character. If this requirement is not met, the deployment will fail, which means re-deploying the OVA.

Nested ESXi Installation & Prereqs
With all these things in place, our next step is to deploy 4 nested ESXi servers on our physical ESXi host. These will be our target hosts for the VCF deployment. Download the correct supported ESXi ISO from VMware downloads.

All ESXi hosts should have an identical configuration. I have the following configuration in my lab.

vCPU: 12 (2 sockets, 6 cores each)
CPU hot plug: Enabled
Hardware virtualization: Enabled
HDD1: Thick – ESXi OS installation
HDD2: Thin – vSAN cache tier
HDD3: Thin – vSAN capacity tier
HDD4: Thin – vSAN capacity tier

And 2 network cards attached to Trunk_4095. This allows the ESXi to communicate with all networks on the TOR.

Map the ISO to CD drive and start the installation.

I am not going to show the ESXi installation steps, since they are available in multiple blogs online. Let’s look at the custom settings after the installation.

The VLAN setting in the DCUI should be set to 1631.

IPv4 Config

DNS Config

And finally, make sure that ‘Test Management Network’ on the DCUI shows OK for all tests.

Repeat this for all 4 nested ESXi hosts.

I have all my 4 target ESXi servers ready. Let’s look at the ESXi configuration that has to be in place before we can use them for the VCF deployment.

All ESXi hosts must have the ‘VM Network’ and ‘Management Network’ port groups configured with VLAN ID 1631.
An NTP server address must be configured on all ESXi hosts.

The SSH & NTP services must be enabled, with the policy set to ‘Start and stop with host’.

All additional disks must be present on the ESXi as SSDs, ready for vSAN configuration. You can check it here.

If your base ESXi has HDDs and not SSDs, you can use the following commands to mark those HDDs as SSDs.

You can either connect from the DC and PuTTY to the ESXi, or open the ESXi console, and run these commands.

esxcli storage nmp satp rule add -s VMW_SATP_LOCAL -d mpx.vmhba1:C0:T1:L0 -o enable_ssd
esxcli storage nmp satp rule add -s VMW_SATP_LOCAL -d mpx.vmhba1:C0:T2:L0 -o enable_ssd
esxcli storage nmp satp rule add -s VMW_SATP_LOCAL -d mpx.vmhba1:C0:T3:L0 -o enable_ssd
esxcli storage core claiming reclaim -d mpx.vmhba1:C0:T1:L0
esxcli storage core claiming reclaim -d mpx.vmhba1:C0:T2:L0
esxcli storage core claiming reclaim -d mpx.vmhba1:C0:T3:L0

Once done, run the ‘esxcli storage core device list’ command and verify that you see SSD instead of HDD.
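To check just the relevant fields instead of scrolling through the full output, a small sketch:

esxcli storage core device list | grep -E 'mpx|Is SSD'   # 'Is SSD: true' confirms the flag took effect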

Well, that completes all our prerequisites for the target ESXi hosts.

So far, we have completed the configuration of the domain controller, the VyOS router, the 4 nested target ESXi hosts & the Cloud Builder OVA deployment. The following VMs have been created on my physical ESXi host.

I will see you in the next post, where we will discuss the “Deployment Parameters” Excel sheet in detail.
Hope the information in this blog is helpful. Thank you.


NSX-T Upgrade from v3.2.2 to v4.1 Failed – Connection between host 473cc672-2417-4a97-b440-38ab53135d02 and NSX Controller is UNKNOWN.

I got the following error while upgrading NSX from v3.2.2 to v4.1.

Pre-upgrade checks failed for HOST: Connection between host 473cc672-2417-4a97-b440-38ab53135d02 and NSX Controller is UNKNOWN. Response : [Lcom.vmware.nsxapi.fabricnode.dto.ControlConnStatusDto;@edbaf5b Connection between host 473cc672-2417-4a97-b440-38ab53135d02 and NSX Manager is UNKNOWN. Please restore connection before continuing. Response : Client has not responded to heartbeats yet

We only have 3 hosts in the cluster, yet for some reason it was showing a 4th host, “esxi164”, in the host groups, which does not exist in the vCenter inventory.

Click on the host group to check the details.

Here is my vCenter inventory,

The host in question (esxi164.virtualrove.local) was one of the old hosts in the cluster. It was removed from the cluster long ago; however, it was somehow still showing up in the NSX upgrade inventory.

And as the error message says, the NSX-T manager was unable to locate this host to upgrade it.

“Connection between host 473cc672-2417-4a97-b440-38ab53135d02 and NSX Manager is UNKNOWN.”

The UUID mentioned in the error message had to belong to the missing host (esxi164.virtualrove.local), because it did not match any of the host transport node UUIDs in the cluster. You can run the following command on one of the NSX managers to get the UUIDs of the nodes.

get transport-nodes status

Or you can click on the TN node in the NSX UI to check the UUID.

If you click Next on the upgrade page, it will not let you upgrade the NSX managers.

So, the possible cause of this issue is that the old host entry still exists somewhere in the NSX inventory, and the upgrade is trying to locate that host to upgrade it.

There is an API call to check the state of the host.
GET https://{{MPIP}}/api/v1/transport-nodes/<Transport-Node-UUID>/state

Replace the MPIP (NSX manager IP) and TN UUID to match your env.
GET https://172.16.31.168/api/v1/transport-nodes/473cc672-2417-4a97-b440-38ab53135d02/state
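If you don’t have a REST client handy, the same call works with curl. A sketch only; -k skips certificate validation in a lab, and basic authentication with the admin account is assumed:

curl -k -u admin "https://172.16.31.168/api/v1/transport-nodes/473cc672-2417-4a97-b440-38ab53135d02/state"   # prompts for the admin password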

As we can see from the output, “current_step_title: Preparing Installation”. It looks like something went wrong while the host was being removed from the NSX env, and its state is still marked as “state: pending” in the NSX manager database.

Let’s delete the host entry using an API call,
DELETE https://172.16.31.168/api/v1/transport-nodes/473cc672-2417-4a97-b440-38ab53135d02?force=true&unprepare_host=false

Status: 200 OK

Run the GET API again to confirm,

It does not show any information now.

Time to check the upgrade console in NSX.

The group which was showing 1 host with an error no longer exists.

I was able to get to the next step to upgrade NSX managers.

Confirm and start.

Upgrade status.

As stated in the message above, I ran “get upgrade progress-status” in the CLI.

NSX upgrade to v4.1 has been successfully completed.

That’s all for this blog. Hope that the information in the blog is helpful. See you in the next blogpost. Thank You for visiting.


NSX-T: Replace faulty NSX Edge Transport Node VM

I recently came across a situation where an NSX-T Edge VM in an existing cluster was having issues loading its parameters. Routing was working fine and there was no outage as such; however, when the customer tried to select the edge VM and edit it in the NSX UI, it showed an error. VMware Support said that the edge in question was faulty and needed to be replaced. Again, routing was working perfectly fine.

Let’s get started with replacing the faulty edge in the production environment.

Note: If the NSX Edge node to be replaced is not running, the new NSX Edge node can have the same management IP address and TEP IP address. If the NSX Edge node to be replaced is running, the new NSX Edge node must have a different management IP address and TEP IP address.

In my lab env, we will replace a running edge. Here is my existing NSX-T env…

A single NSX-T appliance,

All host TNs have been configured,

A single edge VM (edge131) attached to the edge cluster,

One test workload overlay network: segment Web-001 (192.168.10.0/24)

A Tier-0 gateway,

Note that the interfaces are attached to the existing edge VM.

BGP config,

Lastly, my VyOS router showing all NSX BGP routes,

Start a continuous ping to the NSX test overlay network,

Alright, that is my existing env for this demo.

We need one more thing before we start the new edge deployment. The new edge VM parameters must match the existing edge’s parameters in order to replace it, and the existing edge shows an error when we try to open its parameters in the NSX UI. The workaround here is to make an API call to the existing edge VM and get its configuration.

Please follow the link below to learn more about the API call.

NSX-T: Edge Transport Node API call

I have copied the output to the following txt file,

EdgeApi.txt

Let’s get started with configuring the new edge to replace the existing one. Here is the link to the blogpost on deploying a standalone edge transport node.

NSX-T: Standalone Edge VM Transport Node deployment

The new edge VM (edge132) is deployed and visible in the NSX-T UI,

Note that the newly deployed edge (edge132) does not have a TEP IP or an edge cluster associated with it. As I mentioned earlier, the new edge VM parameters must match the existing edge’s parameters in order to replace it.

Use the information collected in the API call for the faulty edge VM and configure the new edge VM the way you see it in the API output. Here is what my new edge VM configuration looks like,

Make sure that the networks match the existing non-working edge’s networks.

You should see TEP IPs once you configure the new edge.

Click on each edge node and verify the information. All parameters should match.

Edge131

Edge132

We are all set to replace the faulty edge now.

Select the faulty edge (edge131) and click on Actions,

Select “Enter NSX Maintenance Mode”

You should see Configuration State as “NSX Maintenance Mode” in the UI.

And you will lose connectivity to your NSX workload.

No BGP routes on the TOR,

Next, click on “Edge Clusters”, select the edge cluster and click “Actions”.

Choose “Replace Edge Cluster Member”

Select the appropriate edge VMs in the wizard and save,

As soon as the faulty edge has been replaced, you should regain connectivity to the workload.

The BGP route is back on the TOR.

The interface configuration on the Tier-0 shows the new edge node.

The node status for the faulty edge shows down,

Let’s get into the newly added edge VM and run the “get logical-router” command,

All service routers and distributed routers have been moved to the new edge.

Get into the SR and check the routes to make sure it shows all connected routes too,

We are good to delete the old edge vm.

Let’s go back to the edge transport nodes, select the faulty edge and click “DELETE”.

“Delete in progress”

And it’s gone.

It should disappear from vCenter too,

Well, that was fun.

That’s all I had to share from my recent experience. There might be several other reasons to replace / delete an existing edge VM; this process should apply to all those use cases. Thank you for visiting. See you in the next post soon.


NSX-T: Edge Transport Node API call

Welcome back, techies. This is going to be a short one. This article describes the steps to make an API call to an NSX edge transport node VM to get the edge configuration. At the time of writing this blog, I had to collect this information to replace a faulty edge node VM in the env.

Here is the API call,
GET https://<nsx-manager-IP>/api/v1/transport-nodes/<tn-id>

Replace the NSX manager IP and tn-id with your edge VM’s ID in your NSX env.

https://172.16.31.129/api/v1/transport-nodes/e55b9c84-7449-477a-be42-d20d6037c089

To get the “tn-id” of the existing faulty edge, log in to NSX > System > Nodes and select the existing faulty edge.

You should see ID on the right side,

If the NSX UI is not available for any reason, SSH to the NSX-T manager using admin credentials and run the following command to capture the UUID / node ID:

get nodes

This is how the API call and output look,

Send the API call to get the output shown in the body. This output contains the entire configuration of the edge VM in JSON format.
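If you prefer the command line over a REST client, something like this fetches and saves the configuration (a sketch; -k skips certificate validation in a lab, and the output file name is arbitrary):

curl -k -u admin "https://172.16.31.129/api/v1/transport-nodes/e55b9c84-7449-477a-be42-d20d6037c089" -o edge-config.json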

That’s all for this post. Thank You.


NSX-T: Standalone Edge VM Transport Node deployment

There can be multiple reasons to deploy an NSX-T edge VM via OVA instead of deploying it through the NSX-T manager. At the time of writing this blog, I had to deploy an edge VM via OVA to replace a faulty edge VM in an NSX-T env. You may be deploying one to create a Layer 2 bridge between NSX-V and NSX-T environments to migrate workloads.

Alright. Let’s start deploying an edge VM without using the NSX-T manager UI.

To begin with, you need to manually download the edge VM OVA from the VMware downloads page here…

Make sure to match the version with your existing NSX-T env.

Once downloaded, log in to the vSphere web client and start deploying the OVA template. It’s straightforward, like any other generic OVA deployment. Make sure to select the exact same networks that are attached to your existing faulty edge VM.

In my case, the 1st vmnic is attached to the management network and the next 2 are attached to the uplink1 & uplink2 networks respectively. The rest of the NICs remain unattached.

Next, you will need to enter the NSX-T manager information in the “Customize Template” section of the deployment.

Enter the Manager IP & credentials.
No need to enter a “Node ID”.

You also have an option to leave this blank and join the edge to the NSX-T control plane once the appliance is up and running. For now, I am entering all these details; I will also cover manually attaching the edge VM to the NSX-T manager control plane.

To get the NSX-T manager thumbprint, SSH to the NSX-T manager and run the following command,

get certificate api thumbprint

You can also get the thumbprint from the following location in the UI.

Log in to the NSX-T manager and click on view details,

Enter the remaining network properties in the deployment wizard and finish.

Once the VM is up and running, you will see it in NSX-T UI here,

You will not see the newly added edge VM here if you did not enter the NSX-T thumbprint information in the deployment wizard. To manually join the newly created edge VM to the NSX-T manager control plane, run the following command on the edge VM.

Edge> join management-plane <Manager-IP> thumbprint <Manager-thumbprint> username admin
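Once joined, you can verify from the edge CLI that the management plane connection is up (a quick check; the manager should show as connected):

Edge> get managers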

The same process is described in the following VMware article.

https://docs.vmware.com/en/VMware-NSX-T-Data-Center/3.2/migration/GUID-8CC9049F-F2D3-4558-8636-1211A251DB4E.html

Next, note that the newly created edge VM will not have its N-VDS, TEP or transport zone configuration. Further configuration will be specific to your individual use case.

That’s all for this post.


VMware Tanzu Supervisor Cluster Deployment Stuck at “Configuring”

Thought of sharing this with you all.

The Tanzu Supervisor Cluster deployment fails and shows “Configuring” for all 3 “SupervisorControlPlaneVM” VMs.

If you click on (3), it shows either of the below warnings / errors.

“Customization operations of the guest OS for Master node VM with identifier vm-XXXX is pending”

In my case, it showed up for a few minutes and then I saw an error.

“The control plane VM 42XXXX was unable to authenticate to the load balancer (Avi – https://172.16.31.123:443/api/cluster) with the username ‘admin’ and the supplied password. Validate the Supervisor cluster load balancer’s authentication configuration.”

Even though the supplied credentials were correct.

It looks like this is a known issue if your ESXi version is 7.0 U3 and you are trying to use the Advanced Load Balancer (AVI).

To resolve this issue, I had to change the authentication settings in AVI.

Log in to AVI and navigate to Administration > Settings > Access Settings

Click on “Edit”

Check the box “Allow Basic Authentication”

Click “Save” and you should be good.

The “Config Status” changes to Running in a couple of minutes, and you should be good to configure it further.

Here are some of the other workarounds that I came across while troubleshooting this issue…

  • Nslookup all the components in the env to make sure they resolve to the correct names.
  • Check the NTP settings on all components (vCenter, ESXi, AVI and NSX) and make sure they sync to the same NTP server.
  • Check routing between all the additional networks that you have created for the Tanzu deployment.

Additionally, you can use the following command on vCenter to check the status / errors of the deployment.

tail -f /var/log/vmware/wcp/wcpsvc.log

Changing the authentication settings in AVI resolved the issue for me. Your issue may be related to one of the other causes mentioned above.

Good Luck. Keep Sharing.
That’s all for this blogpost.


NSX 4.0 Series Part5-Migrate workload from VDS To NSX

Welcome back readers.

Please find the links below for all posts in this series.

NSX 4.0 Series Part1-NSX Manager Installation
NSX 4.0 Series Part2-Add a Compute Manager & Configure the NSX VIP
NSX 4.0 Series Part3-Create Transport Zones & Uplink Profiles
NSX 4.0 Series Part4-Prepare Host Transport Nodes
NSX 4.0 Series Part5-Migrate workload from VDS To NSX

Our NSX env is fully functional, and we are ready to migrate workloads from the vCenter VDS to the NSX env.

It’s always a good practice to verify the NSX env before we start working on it.

Log in to the NSX VIP and look for alarms,

Check the cluster status,

And then look at the host transport nodes to confirm they show host status UP,

For testing purposes, I have created 3 Windows VMs. All three VMs connect to 3 different port groups on the vCenter VDS. We will move these VMs from the vCenter VDS to NSX-managed segments.

Following are the test VMs with their respective vDS port groups. I have named these VMs according to their port groups.

Next, we need to create segments in the NSX env. A segment is nothing but a port group.

Let’s have a look at the types of Segments.

VLAN Backed Segments: In this type, you define a VLAN ID for the segment; however, you also have to make sure that this VLAN configuration exists on your physical top-of-rack switch.

Overlay Backed Segments: This type can be configured without any configuration on the physical infrastructure. It gets attached to the overlay transport zone, and the traffic is carried by a tunnel between the hosts.

As stated earlier, we will only focus on VLAN backed segments in this blogpost. Visit the following blog if you are looking for overlay backed segments.

Log in to NSX and navigate to Networking > Segments,

Oops, I haven’t added a license yet. If you do not have a license key, please refer to my following blog to get the eval licenses.

Add the license key here,

System> Licenses,

Then we move on to creating a VLAN backed segment in NSX. You can create VLAN backed segments for all networks that exist on your TOR (top-of-rack switch). For this demo, I will be using the Management-1631, vMotion-1632 and VSAN-1633 networks.

In my lab env, the following networks are pre-created on the TOR.

Log in to NSX VIP > Networking > Segments > Add Segment

Name: VR-Prod-Mgmnt-1631
Transport Zone: VirtualRove-VLAN-TZ (this is where our ESXi host transport nodes are connected)
VLAN: 1631

SAVE
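For what it’s worth, the same segment can also be created through the Policy API, which is handy if you have many VLANs to define. A sketch only; basic authentication with the admin account is assumed, and the transport-zone path / UUID here is a placeholder you would pull from your own env:

curl -k -u admin -X PATCH -H "Content-Type: application/json" \
  "https://<nsx-vip>/policy/api/v1/infra/segments/VR-Prod-Mgmnt-1631" \
  -d '{"display_name": "VR-Prod-Mgmnt-1631", "vlan_ids": ["1631"], "transport_zone_path": "/infra/sites/default/enforcement-points/default/transport-zones/<tz-uuid>"}'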

Verify that the Segment status is Success.

Once the segment is created in NSX, go back to vCenter and verify that you see the newly created segment. You will see the letter “N” on all NSX-created segments.

Click on the newly created Segment.

Note that the Summary section shows more information about the segment.

We will now move a VM called “app-172.16.31.185” from the VDS to NSX.

Source VDS portgroup is “vDS-Management-1631”
Destination NSX Segment is “VR-Prod-Mgmnt-1631”

Verify that it is connected to the VDS port group.

Log in to the VM and start a ping to its gateway IP.

Log in to vCenter > Networking view > right-click the source port group,

And select “Migrate VMs to Another Network”.

In the migration wizard, select the newly created NSX VLAN backed segment as the destination network,

Select the VM that needs to be migrated into the NSX env,

Review and Finish,

Monitor the ping command to see if there are any drops.

All looks good. No ping drops, and I can still ping the VM IP from other machines in the network.

We have successfully migrated a VM into the NSX env.
Verify the network name in the VM settings,

Click on the NSX segment in vCenter and verify that you see the VM,

You can also verify the same from the NSX side:
Log in to NSX > Inventory > Virtual Machines > click on View Details for the VM that we just migrated,

You will see the port information in the details section,

You will not see port information for the db VM, since it has not been migrated yet.

The remaining VMs have now been moved into the NSX env. The Ports column shows “1” for all segments.

We see all 3 NSX segments in the vCenter networking view,

A simple cross-subnet ping test, from App to DB,

Well, all looks good. Our workload has been successfully migrated into NSX env.

So, what is the use case here…?
Why would a customer only configure VLAN backed segments…?
Why no overlay…?
Why no T1, T0 and Edge…?

You will surely understand this in my next blog. Stay tuned. 😊
I hope this blog series has provided valuable information.
