VMware VCF 5.2 – Onboarding existing vSphere infra into VCF

With VMware Cloud Foundation (VCF) 5.2, you can import or convert your existing vSphere infrastructure, along with an existing NSX environment, into VCF and start managing it from SDDC Manager.

The VCF Import Tool helps achieve this. The download link can be found here,

Download

Let’s discuss some prerequisites before we move on,

Existing vSphere versions (ESXi & vCenter) must align with the VCF 5.2 BOM.
Standard switches are not supported; a VDS is required.
DRS must be in fully automated mode and HA enabled on the cluster.
Common / shared storage across the hosts (vSAN that passes the Skyline Health checks, NFSv3, or FC storage).
The VDS must have physical uplinks configured.
vCenter & SDDC Manager must be installed in the same cluster that we are converting. SDDC Manager can be downloaded from the same location as the VCF Import Tool.
The cluster with vCenter must have a minimum of 4 hosts.

Here is the VCF 5.2 BOM for your reference,

Let’s get into lab and review the existing env,

I have a 4-host vSAN cluster with vCenter and SDDC Manager installed in it. Remember, vSAN is not a requirement for this setup.

Review the ESXi & vCenter version,

SDDC Manager installation is simple and straightforward: download the OVA and import it into the cluster. It asks for common parameters such as hostname, IP settings, and passwords for the defined users. Once the SDDC Manager VM boots up, you will see this on the screen at the SDDC Manager URL,

“VMware Cloud Foundation is initializing… ”

Additionally, I have a single VDS with 2 uplinks and some common port groups configured,

Let’s get into action and start working on getting this env into VCF.

Use WinSCP to connect to the SDDC Manager appliance as the vcf user and upload the VCF Import Tool to the vcf directory,

Next, SSH to the SDDC Manager appliance,

Check whether the file has been uploaded, then extract the tar file using the 'tar xvf' command,

You will see multiple files getting extracted in the same directory,
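As a sketch, the check-and-extract step looks like this; the block simulates the bundle in a temp directory so it runs anywhere, and the tarball name is a placeholder for whatever build you downloaded:

```shell
# Build a stand-in bundle in a temp dir so the commands are self-contained;
# on the SDDC Manager appliance you would already have the uploaded tarball.
cd "$(mktemp -d)"
mkdir vcf-brownfield-toolset
echo 'print("toolset")' > vcf-brownfield-toolset/vcf_brownfield.py
tar czf vcf-import-tool.tar.gz vcf-brownfield-toolset
rm -r vcf-brownfield-toolset
ls -lh vcf-import-tool.tar.gz      # confirm the file is present
tar xvf vcf-import-tool.tar.gz     # same 'tar xvf' as on the appliance
ls vcf-brownfield-toolset/         # the extracted toolset directory
```

The filenames above are illustrative; use the actual bundle name from your download.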

Next, get into the 'vcf-brownfield-toolset' directory to run further commands,

The Python script (vcf_brownfield.py) is located under the 'vcf-brownfield-toolset' directory.

We need to run this script with some additional parameters,

python3 vcf_brownfield.py convert --vcenter {vcenter-fqdn} --sso-user administrator@vsphere.local --domain-name {WLD-Name} --nsx-deployment-spec-path {NSX-Specifications-JSON-Path} OR --skip-nsx-deployment

Replace the parameters to match your env, and check the env compatibility first by changing the operation to "check" instead of "convert".

“CHECK” – Checks whether a vCenter is suitable to be imported into SDDC Manager as a workload domain.

python3 vcf_brownfield.py check --vcenter vcfvc182.virtualrove.local --sso-user administrator@vsphere.local --domain-name imported-vcf --skip-nsx-deployment

All other supported parameters can be found here,

https://docs.vmware.com/en/VMware-Cloud-Foundation/5.2/vcf-admin/GUID-44CBCB85-C001-41B2-BBB4-E71928B8D955.html

I am skipping the NSX part of it by adding "--skip-nsx-deployment" at the end of the command.

Next, the script asks for vCenter SSO credentials,

Confirm thumbprint,
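If you want to verify the thumbprint independently rather than blindly accepting it, openssl can print it from the certificate. The block below generates a throwaway certificate so it is self-contained; the commented lines show the equivalent call against the live vCenter (the FQDN is this lab's):

```shell
# Against a live vCenter you would fetch and fingerprint the cert like this:
#   echo | openssl s_client -connect vcfvc182.virtualrove.local:443 2>/dev/null \
#     | openssl x509 -noout -fingerprint -sha256
# Self-contained stand-in: create a throwaway cert and fingerprint it.
cd "$(mktemp -d)"
openssl req -x509 -newkey rsa:2048 -nodes -keyout key.pem -out cert.pem \
  -days 1 -subj "/CN=vcfvc182.virtualrove.local" 2>/dev/null
openssl x509 -in cert.pem -noout -fingerprint -sha256
```

Compare the printed fingerprint against what the script shows before you confirm.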

Monitor the script,

After finishing all required checks, it loads the summary,

As we can see, 2 checks out of 89 have failed. The tool has also generated a CSV report on SDDC Manager. Let's have a look at the logs and fix things as needed,

On SDDC Manager, navigate to the output folder and download the CSV file,

The downloaded CSV file is easy to read; it shows a description of each issue and how to remediate it,

Here is a sample…

Apply the filter for ‘Status’ column to see ‘VALIDATION_FAILED’.
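The same filtering can be done from the shell; this sketch uses stand-in data, since the real report's exact columns may differ from what is shown here:

```shell
# Stand-in for the generated report; on SDDC Manager, point grep at the
# actual CSV downloaded from the output folder instead.
cat > precheck-report.csv <<'EOF'
Check,Status,Description
cluster-ha-enabled,VALIDATION_FAILED,HA is disabled on the cluster
cluster-drs-mode,VALIDATION_FAILED,DRS is not in fully automated mode
esxi-version,VALIDATION_SUCCEEDED,ESXi version matches the VCF 5.2 BOM
EOF
grep 'VALIDATION_FAILED' precheck-report.csv   # show only the failed checks
```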

As we can see, it is complaining about HA / DRS being disabled on the cluster. I did that on purpose.
The next one is a 'WARNING' and not an 'ERROR'. However, I found the following article, which talks about that warning,

“ESXi upgrade policy validation across vCenter and SDDC Manager”  

ESX Upgrade Policy Guardrail Failure

https://docs.vmware.com/en/VMware-Cloud-Foundation/5.2/vcf-admin/GUID-458B6715-3ED6-4423-B093-64B1A2963CC0.html

This issue causes a warning only. You can proceed with the convert or import process without remediating the issue.

Hence, I fixed HA and ran the 'convert' command,

Imp Note: Remember to take a snapshot of SDDC Manager before you run the CONVERT command.

Next, we can see that one check failed, but it still gives the option to move on,

Next, we get an option to select the primary storage for the VCF env. I had an iSCSI datastore too,

Final confirmation and the game begins,

Monitor the green parent tasks,

Populating inventory in SDDC,

Navigating to SDDC Manager URL,

You will also notice a task in progress on SDDC Manager, ‘Initialize Imported Domain’

Click on the task to check details. It will help you to troubleshoot if something breaks,

Let's check vCenter once all tasks have completed. Notice that the vCenter inventory has a new resource pool.

The script shows as completed in the SDDC Manager SSH session.

Review the SDDC UI,


“imported-vcf” that’s the name of the domain we had specified in the script,

Hosts,

We have successfully converted our existing vSphere env into VCF.

Next, you can plan on configuring NSX in the management domain and then adding VI Workload Domains.

Before we finish, I wanted to share some more reports (guardrails_report_vcf) from a failed VCF validation.

Well, that’s all I had for this post. Stay tuned for next couple of blogs on VCF.

Detailed information on the operations available in the Python script can be found here,

https://docs.vmware.com/en/VMware-Cloud-Foundation/5.2/vcf-admin/GUID-6EEE731E-C3C4-40AD-A45D-5BAD2C4774AB.html

When you are using the VCF Import Tool, certain guardrail messages may require you to perform a manual update to the SDDC Manager database. If you encounter any of these guardrail issues, modify the example commands to resolve the issues.

https://docs.vmware.com/en/VMware-Cloud-Foundation/5.2/vcf-admin/GUID-BE0B38BC-8A4C-470E-B676-1B85D3C840F0.html

Thank You.

Are you looking out for a lab to practice VMware products…? If yes, then click here to know more about our Lab-as-a-Service (LaaS).

Leave your email address in the box below to receive notification on my new blogs.

VCF 5.0 Series-Step by Step-Phase4 – Post Deployment Checks

We have covered the entire VCF 5.X stack deployment in the earlier three blogs of this series.

VCF 5.0 Series-Step by Step-Phase1 – Preparation
VCF 5.0 Series-Step by Step Phase2 – Deployment Parameters Excel sheet
VCF 5.0 Series-Step by Step-Phase3 – Deployment
VCF 5.0 Series-Step by Step-Phase4 – Post Deployment Checks

It's time to check the VCF environment and do some post-deployment checks.
Here is the SDDC Manager after the deployment,

Host & Clusters view,

VMs & Templates,

Datastore,

And Networking,

Let’s look at the NSX env,

All management hosts have been prepared for NSX,

Host configuration on one of the hosts in this cluster:
'vcf-vds01' is configured for NSX. The transport zone, uplink profile, and IP pool have already been created and configured.

vCenter virtual switch view on one of the hosts,

NSX already has a backup configured, and the last backup was successful.

If you look at the backup config, SDDC Manager is configured as the backup server,

Let's have a look at the SDDC Manager dashboard,

Host view on SDDC shows as expected,

Workload Domain view shows our management domain,

Click on the management domain name to check details,

The Hosts tab under the management domain shows host details again,

Edge clusters are empty. You get an option to deploy edge clusters for the management domain; I will write a separate blog on it,

The password management option allows you to create / edit passwords for all SDDC components in one place. You can also schedule password rotation for all components.

As discussed in the first blog of this series, here is the option to subscribe to licenses,

Like other VMware products, you get an option to integrate AD,

And an option to deploy the vRealize Suite from SDDC Manager,

Well, that’s all for this post. Keep following for upcoming blogs on VCF 5.X.


VCF 5.0 Series-Step by Step-Phase3 – Deployment

Welcome back. We completed all pre-reqs and the deployment parameter sheet in earlier posts. If you missed them, you can find them here…

VCF 5.0 Series-Step by Step-Phase1 – Preparation
VCF 5.0 Series-Step by Step Phase2 – Deployment Parameters Excel sheet
VCF 5.0 Series-Step by Step-Phase3 – Deployment
VCF 5.0 Series-Step by Step-Phase4 – Post Deployment Checks

Log in to the Cloud Builder VM and start the deployment process.

Select “vCloud Foundation” here,

The other option, "Dell EMC VxRail", is used when your physical hardware vendor is Dell.

VxRail is a hyper-converged appliance: a single device that includes compute, storage, networking, and virtualization resources. It comes with a pre-configured vCenter and ESXi servers. There is then a manual process to convert this embedded vCenter into a user-managed vCenter, and that's when we use this option.

Read all prereqs on this page and make sure to fulfill them before you proceed.

Scroll down to check remaining prereqs,

Click next here.

Earlier versions of VCF offered a download of the "Deployment Parameter" Excel sheet on this page.

You must now download this sheet from the same place where you downloaded the VCF OVA.

It's time to start the actual deployment. We will resolve the issues as we move on.
Let's upload the "Deployment Parameter" sheet to Cloud Builder and begin the deployment.

Upload the file and click Next.
CB validates everything required for the complete deployment in this step.

To understand and troubleshoot the issues / failures we might face while deploying VCF, keep an eye on the vcf-bringup.log file, located at '/opt/vmware/bringup/logs/' in Cloud Builder. This file gives a live update of the deployment and shows any errors that caused it to fail. Use 'tail -f vcf-bringup.log' to follow the latest updates. PFB.
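To surface only the failures while the log scrolls by, you can filter it with grep; the sketch below uses a stand-in log file so it runs anywhere, and on Cloud Builder you would point the same grep at the real path above:

```shell
# Stand-in bring-up log so the filter is runnable here; on Cloud Builder use:
#   tail -n 200 /opt/vmware/bringup/logs/vcf-bringup.log | grep -iE 'error|fail'
cat > vcf-bringup.log <<'EOF'
2024-01-01 10:00:01 INFO  Validating ESXi hosts
2024-01-01 10:00:07 ERROR Error connecting to ESXi host
2024-01-01 10:00:09 INFO  Retrying connection
EOF
grep -iE 'error|fail' vcf-bringup.log   # show only the error lines
```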

Let’s continue with the deployment…

“Error connecting to ESXi host. SSL Certificate common name doesn’t match ESXi FQDN”

Look at the “vcf-bringup.log” file.

This is because the ESXi certificate is generated at install time with the default name and is not regenerated when we rename the host. You can check the hostname in the certificate: log in to an ESXi host > Manage > Security & Users > Certificates.

You can see here that even though the hostname at the top shows "vcf157.virtualrove.local", the CN in the certificate is still "localhost.localdomain". We must change this to continue.

SSH to the ESXi server and run the following commands to change the hostname and FQDN and to generate new certs.

esxcli system hostname set -H=vcf157                      # short hostname
esxcli system hostname set -f=vcf157.virtualrove.local    # FQDN (no space after '=')
cd /etc/vmware/ssl
/sbin/generate-certificates                               # regenerate the self-signed cert with the new CN
/etc/init.d/hostd restart && /etc/init.d/vpxa restart     # restart the management agents
reboot

You need to do this on every host, replacing the hostname in the commands for each ESXi respectively.
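Since the same command sequence repeats on every host, a loop helps avoid typos. This is a dry run that only prints the per-host commands (the host names are placeholders; substitute your own and run the printed commands on each host, or wrap them in ssh):

```shell
# Dry run: print the command sequence for each host instead of executing it.
# Host short names below are illustrative; the domain matches this lab.
for h in vcf157 vcf158 vcf159 vcf160; do
  echo "esxcli system hostname set -H=$h"
  echo "esxcli system hostname set -f=$h.virtualrove.local"
  echo "/sbin/generate-certificates"
  echo "/etc/init.d/hostd restart && /etc/init.d/vpxa restart"
  echo "reboot"
done
```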

Verify the hostname in the cert once server boots up.

Next, Hit retry on cloud builder, and we should be good.

Next, a warning for vSAN disk availability:
"Validate ESXi host has at least one valid boot disk."

Not sure about this one. I double-checked and confirmed that all disks are available on the ESXi host. I will simply ignore this warning.

Next, warnings for NTP.
Host cb.virtaulrove.local is not currently synchronising time with NTP Server dc.virtaulrove.local
NTP Server 172.16.31.110 and host cb.virtaulrove.local time drift is not below 30 seconds

For ESXi, a restart of ntpd.service resolved the issue.
For CB, I had to sync the time manually.

Steps to manually sync NTP…
ntpq -p                        # check peers and the current offset
systemctl stop ntpd.service
ntpdate 172.16.31.110          # force an immediate one-off sync
sleep 60                       # wait a minute, then sync once more
ntpdate 172.16.31.110
systemctl start ntpd.service
systemctl restart ntpd.service
ntpq -p                        # check the offset again

Verify the offset again; it must be close to 0.
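If you want to pull just the offset out of the ntpq output, awk can do it: the '*' prefix marks the peer ntpd is actually synced to, and the ninth column is the offset in milliseconds. A sample line is piped in below so the snippet runs anywhere; on the appliance, feed it 'ntpq -p' instead:

```shell
# One sample line of 'ntpq -p' output piped in for a self-contained demo;
# on Cloud Builder or ESXi, replace the printf with the real 'ntpq -p'.
# The leading '*' marks the selected peer; column 9 is the offset in ms.
printf '*dc.virtualrove .GPS.           1 u   32   64  377    0.322    0.081   0.013\n' |
  awk '/^\*/ {print $9}'
```

For the sample line this prints 0.081; on a healthy host the value should be close to 0.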
Next, I locked out the root password of the Cloud Builder VM due to multiple logon failures. 😊

This is common since the passwords are complex, you sometimes have to type them manually on the console, and on top of that, you don't see (in Linux) what you are typing.

Anyway, it's a standard process to reset the root account password for Photon OS; the same applies to vCenter. Check the short writeup on it at the link below.

Next, back in CB, click "Acknowledge" if you want to ignore the warnings.

You will get this window once you resolve all errors.
Click "Deploy SDDC".

Important Note: Once you click "Deploy SDDC", the bring-up process first builds vSAN on the 1st ESXi server in the list and then deploys vCenter on that host. If bring-up fails for any reason and you figure out that one of the parameters in the Excel sheet is incorrect, it is a tedious job to change a parameter that has already been uploaded to CB: you have to use jsongenerator commands to replace the existing Excel sheet in CB. I have not come across such a scenario yet; however, there is a good writeup on it from a good friend of mine.

So, make sure to fill all correct details in “Deployment Parameter” sheet. 😊

Let the game begin…

Again, keep an eye on vcf-bringup.log file. The location of the file is ‘/opt/vmware/bringup/logs/’ in cloud builder. Use ‘tail -f vcf-bringup.log’ to get the latest update on deployment.

Installation starts. Good luck. Be prepared to see unexpected errors, and don't lose hope, as there may be several errors before the deployment completes. Mine took a week to deploy the first time.

Bring-up process started. All looks good here. Status as “Success”. Let’s keep watching.

It started the vCenter deployment on the 1st vSAN-enabled host.

You can also log in to the 1st ESXi host and check the progress of the vCenter deployment.

vCenter installation finished. Moved to NSX deployment.

Failed at NSX deployment stage,

Failed to join NSX managers to form a management cluster. Failed to detach NSX managers from the NSX management cluster.

I logged in to all 3 NSX Managers and found that one of them was showing "Management UI: DOWN" on the console. I restarted the affected NSX Manager, and it was all good.

Retry on CB did not show that error again.
And finally, it finished all tasks.

Click Finish. And it launches another box.

That was fun. We have successfully deployed VMware Cloud Foundation version 5.0.

There are multiple tests that can be performed to check if the deployed environment is redundant at every level. Time to verify and do some post deployment checks. I will cover that in next post.

Additionally, use this command ‘systemctl restart vcf-bringup’ to pause the deployment when required.

For example, in my case the NSX-T Manager was taking a long time to deploy, and due to a timeout on Cloud Builder, it would cancel the deployment assuming a failure. So I paused the deployment after the NSX-T OVA job was triggered from CB and hit 'Retry' after NSX deployed successfully in vCenter. It picked up from that point and moved on.

Hope you enjoyed reading the post. It’s time for you to get started and deploy VCF. Feel free to comment below if you face any issues.


VCF 5.0 Series-Step by Step Phase2 – Deployment Parameters Excel sheet

We have prepared the environment for VCF deployment. It's time to discuss the "Deployment Parameters" Excel sheet in detail. Following is the list of blogs in this series.

VCF 5.0 Series-Step by Step-Phase1 – Preparation
VCF 5.0 Series-Step by Step Phase2 – Deployment Parameters Excel sheet
VCF 5.0 Series-Step by Step-Phase3 – Deployment
VCF 5.0 Series-Step by Step-Phase4 – Post Deployment Checks

The "Introduction" sheet from the deployment parameter workbook.

Go through this carefully and make sure you have everything in place that is needed for the deployment. No edits are required on this sheet.

Next, “Credentials” sheet.

Check the password policy and make sure to generate passwords accordingly. Validation fails if the policy is not met.

Throughout this sheet, any cell with an unacceptable value turns RED.

Moving on to next sheet “Hosts and Networks”.

Couple of things to discuss here,

Management Domain Networks – All networks should be pre-created on the TOR.

Here is the screenshot from my TOR.

Management Domain ESXi Hosts – All IPs should be reserved and DNS records in place.

Moving on to the "vSphere Distributed Switch Profile" section in this sheet. It has 3 profiles. Let's talk about the available options.

Profile-1

This profile will deploy a single vDS with 2 or 4 uplinks. All network traffic will flow through the assigned nics in this vDS.

Profile-2

If you want to split the vSAN traffic onto dedicated pNICs, choose this option.

This one deploys 2 vDSes. The first vDS carries management, vMotion, and host overlay traffic, and the other one is for vSAN. Each vDS can have up to 2 pNICs.

Profile-3

This one also deploys 2 vDSes, except that the vSAN traffic is merged into the 1st vDS and the 2nd vDS only carries host overlay traffic.

Select the profile as per your business requirements and move to the next step. For this lab, I selected the 1st profile.

Moving to the "NSX Host Overlay Network" section – you have an option to enable DHCP on VLAN 1634 or define the values manually.

Next – “Deploy Parameters” sheet,

Define all parameters here carefully. Again, if something is not valid, the cell turns RED.

As discussed in the 1st blog in this series, VCF has now introduced subscription-based licensing. If you select "No", you have to manually enter license keys here. If "Yes", a note appears in RED,

I just found out that the VMware KBs are redirecting to Broadcom already. 😊

Check this Broadcom kb for more information,

https://knowledge.broadcom.com/external/article?legacyId=89567

“During bring-up, in subscription licensing mode, the management domain is deployed in evaluation mode. It is expected that you complete the subscription process for VMware Cloud Foundation+ within 60 days. After the period has expired, you cannot do any actions related the workload domains, such as add or expand workload domain, add or remove cluster, add or remove host”

One caveat here: if you deploy the stack in the subscription-based model, SDDC Manager does not allow you to perform any additional operations until you finish the subscription process. In short, it is of no use until the subscription is complete.

Let me show you,

This screenshot was captured when I deployed it in subscription mode.
This is what you see when you deploy it in subscription mode and do not activate it,

All additional config options are grayed out. You see a message there: "Deactivated in Subscription-Unsubscribed mode."

Any changes to the "Workload Domain" will be blocked,

No adding hosts to the management domain,

Back to the Deploy Parameters sheet. Make your choices wisely and plan accordingly.
Moving to the "vSphere Infra" section in the deployment parameters sheet.

And finally, the NSX & SDDC section,

We are all set to upload this “Deployment Parameter” sheet to Cloud Builder and begin the deployment. That is all for this blog. We will perform the actual deployment in next blog.


VCF 5.0 Series-Step by Step-Phase1 – Preparation

I got the VCF 5.X env stood up after a few attempts. It was fun and good learning too.

The planning / design phase plays an important role in a VCF deployment. I would say deployment is just a day's task; planning, however, goes on for weeks. I would specifically like to emphasize licensing: VCF can be deployed with either a subscription-based or a perpetual licensing model. I will discuss this in later blogs in this series.

Imp Note: You cannot return to using a perpetual license without doing a full bring-up rebuild.

https://docs.vmware.com/en/VMware-Cloud-Foundation/5.0/vcf-admin/GUID-973601B5-9CDD-40C2-A7C4-FF117C1820DD.html

License calculator is available for download in following KB.

https://kb.vmware.com/s/article/96426

This series of VCF 5.X includes following parts,

VCF 5.0 Series-Step by Step-Phase1 – Preparation
VCF 5.0 Series-Step by Step Phase2 – Deployment Parameters Excel sheet
VCF 5.0 Series-Step by Step-Phase3 – Deployment
VCF 5.0 Series-Step by Step-Phase4 – Post Deployment Checks

Let’s get into “Preparation” phase and start preparing the infrastructure for VCF deployment.

The deployment of VMware Cloud Foundation is automated. We use VMware Cloud Builder initially to deploy all management domain components. The following components / options have been removed from 5.X initial deployment, compared to previous versions.

Application Virtual Networks (AVN’s)
Edge Deployment
Creation of Tier-1 & Tier-0
BGP peering

All of these can only be configured via SDDC Manager after a successful deployment. Hence, the initial deployment has become a little easier.
Thanks to multiple deployment attempts, I was able to jot down the high-level deployment flow below; it is automated and performed by Cloud Builder once you start the deployment.

After the validation, CB performs the following steps to configure the VCF env.

Connect to the 1st target ESXi host and configure a single-host vSAN datastore.
Start the vCenter deployment on the 1st vSAN-enabled host.
After vCenter deploys successfully, create the Datacenter object and cluster, and add the remaining 3 hosts to the cluster.
Configure all vmkernel adapters on all 4 hosts.
Create the VDS and add all 4 hosts to it.
Configure disk groups to form a vSAN datastore across all hosts.
Deploy 3 NSX Managers on the management port group and configure a VIP.
Add the compute manager (vCenter) and create the required transport zones, uplink profiles, and network pools.
Configure the vSphere cluster for NSX (VIB installation).
Deploy SDDC Manager.
Perform some post-deployment cleanup tasks.
Finish.

And this is what you would expect after the successful deployment.  😊

Believe me, it's going to take multiple attempts if you are doing it for the first time.

Let’s have a look at the Bill of Materials (BOM) for Cloud Foundation version 5.0.0.0 Build 21822418.

Software Component – Version
Cloud Builder VM – 5.0-21822418
SDDC Manager – 5.0-21822418
VMware vCenter Server Appliance – 8.0 U1a-21815093
VMware ESXi – 8.0 U1a-21813344
VMware NSX-T Data Center – 4.1.0.2.0-21761691
Aria Suite Lifecycle – 8.10 Patch 1-21331275

It’s always a good idea to check release notes of the product before you design & deploy. You can find the release notes here.

https://docs.vmware.com/en/VMware-Cloud-Foundation/5.0/rn/vmware-cloud-foundation-50-release-notes/index.html

Some of the content of this blog has been carried over from my previous blog (VMware vCloud Foundation 4.2.1 Step by Step), since it applies to version 5.0 too.

Let’s discuss and understand the high level installation flow,

Configure TOR for the networks that are being used by VCF. In our case, we have VyOS router.
Deploy a Cloud Builder VM on standalone source physical ESXi.
Install and Configure 4 ESXi Servers as per the pre-requisites.
Fill in the “Deployment Parameters” excel sheet carefully.
Upload “Deployment Parameter” excel sheet to Cloud Builder.
Resolve the issues / warning shown on the validation page of CB.
Start the deployment.
Post deployment, you will have a vCenter, 4 ESXi servers, 3 NSX managers & SDDC manager deployed.
Additionally, you can deploy VI workload domain using SDDC manager. This will allow you to deploy Kubernetes cluster and vRealize Suite components.

You definitely need a huge amount of compute resources to deploy this solution.

This entire solution was installed on a single physical ESXi server. Following is the configuration of the server.

HP ProLiant DL360 Gen9
2 X Intel(R) Xeon(R) CPU E5-2650 v3 @ 2.30GHz
512 GB Memory
4 TB SSD

I'm sure it is possible with 256 GB of memory too.

Let’s prepare the infra for VCF lab.

I will refer to my physical ESXi server as the base ESXi in this blog.
So, here are my base ESXi and the VMs installed on it.

VyOS – This virtual router will act as the TOR for the VCF env.
dc.virtualrove.local – The Domain Controller & DNS server in the env.
jumpbox.virtualrove.local – Used to connect to the env.
vcf173 to vcf176 – The target ESXi hosts for our VCF deployment.
cb.virtualrove.local – The Cloud Builder VM used to deploy VCF.

Here is a look at the TOR and interfaces configured…

Follow my blog here to configure the VyOS TOR.

Network Requirements: Management domain networks must be in place on the physical switch (TOR). Jumbo frames (MTU 9000) are recommended on all VLANs, with a minimum MTU of 1600.

Following DNS records to be in place before we start with the installation.
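A quick pre-flight loop can confirm the records before you start bring-up; the names below are this lab's (in an offline sandbox none of them resolve, so each is reported MISSING):

```shell
# Pre-flight: check forward resolution for every required record.
# Names are from this lab; extend the list to match your parameter sheet.
for fqdn in dc.virtualrove.local cb.virtualrove.local \
            vcf173.virtualrove.local vcf174.virtualrove.local \
            vcf175.virtualrove.local vcf176.virtualrove.local; do
  if getent hosts "$fqdn" >/dev/null 2>&1; then
    echo "OK: $fqdn"
  else
    echo "MISSING: $fqdn"
  fi
done
```

Reverse (PTR) records should be verified the same way before the deployment starts.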

Cloud Builder Deployment:

Cloud Builder is an appliance provided by VMware to build the VCF env on the target ESXi hosts. It is a one-time-use VM and can be powered off after the VCF management domain has been deployed successfully. After the deployment, we use SDDC Manager to manage additional VI domains. I will deploy this appliance in VLAN 1631 so that it has access to the DC and all our target ESXi servers.

Download the correct CB ova from the downloads,

We also need the Excel sheet, downloaded from the same page:

‘Cloud Builder Deployment Parameter Guide’

This is a deployment parameter sheet used by CB to deploy VCF infrastructure.

Deployment is straightforward, like any other OVA deployment. Make sure you choose the right password while deploying the OVA: the admin and root passwords must be a minimum of 8 characters and include at least one uppercase letter, one lowercase letter, one digit, and one special character. If this requirement is not met, the deployment fails, which means re-deploying the OVA.
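You can sanity-check a candidate password against that policy from the shell before deploying; this is just a sketch of the stated rules, and the example password is illustrative, not a recommendation:

```shell
# Sketch of the stated policy: >=8 chars, one uppercase, one lowercase,
# one digit, one special character. 'VMware1!' is only an example value.
pw='VMware1!'
if [ "${#pw}" -ge 8 ] \
   && echo "$pw" | grep -q '[A-Z]' \
   && echo "$pw" | grep -q '[a-z]' \
   && echo "$pw" | grep -q '[0-9]' \
   && echo "$pw" | grep -q '[^A-Za-z0-9]'; then
  echo "password OK"
else
  echo "password fails policy"
fi
```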

Nested ESXi Installation & Prereqs
With all these things in place, our next step is to deploy 4 nested ESXi servers on our physical ESXi host. These will be our target hosts for the VCF deployment. Download the correct supported ESXi version ISO from VMware downloads.

All ESXi should have an identical configuration. I have following configuration in my lab.

vCPU: 12
2 Sockets, 6 cores each.
CPU hot plug: Enabled
Hardware Virtualization: Enabled
HDD1: Thick: ESXi OS installation
HDD2: Thin VSAN Cache Tier
HDD3: Thin VSAN Capacity Tier
HDD4: Thin VSAN Capacity Tier

And 2 network cards attached to Trunk_4095. This allows the ESXi host to communicate with all networks on the TOR.

Map the ISO to CD drive and start the installation.

I am not going to show the ESXi installation steps, since they are available in multiple blogs online. Let's look at the custom settings after the installation.

DCUI VLAN settings should be set to 1631.

IPv4 Config

DNS Config

And finally, make sure that the ‘Test Management Network’ on DCUI shows OK for all tests.

Repeat this for all 4 nested esxi.

I have all 4 of my target ESXi servers ready. Let's look at the ESXi configuration that has to be in place before we can use them for the VCF deployment.

All ESXi hosts must have the 'VM Network' and 'Management Network' configured with VLAN ID 1631.
The NTP server address must be configured on all ESXi hosts.

The SSH & NTP services must be enabled, with their policy set to 'Start and stop with host'.

All additional disks must be presented to ESXi as SSDs and be ready for vSAN configuration. You can check this here.

If your base ESXi has HDDs rather than SSDs, you can use the following commands to mark those HDDs as SSDs.

You can either connect from the DC and PuTTY to the ESXi host, or open the ESXi console, and run these commands.

# Tag each device as SSD via a local SATP claim rule
esxcli storage nmp satp rule add -s VMW_SATP_LOCAL -d mpx.vmhba1:C0:T1:L0 -o enable_ssd
esxcli storage nmp satp rule add -s VMW_SATP_LOCAL -d mpx.vmhba1:C0:T2:L0 -o enable_ssd
esxcli storage nmp satp rule add -s VMW_SATP_LOCAL -d mpx.vmhba1:C0:T3:L0 -o enable_ssd
# Reclaim the devices so the new rule takes effect
esxcli storage core claiming reclaim -d mpx.vmhba1:C0:T1:L0
esxcli storage core claiming reclaim -d mpx.vmhba1:C0:T2:L0
esxcli storage core claiming reclaim -d mpx.vmhba1:C0:T3:L0

Once done, run the 'esxcli storage core device list' command and verify that you see SSD instead of HDD.
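The verification can be made scriptable by counting the SSD flags; the output is simulated below, since esxcli only runs on the host itself:

```shell
# Simulated fragment of 'esxcli storage core device list' output; on the
# host, pipe the real command into the same grep instead of this printf.
printf 'mpx.vmhba1:C0:T1:L0\n   Is SSD: true\nmpx.vmhba1:C0:T2:L0\n   Is SSD: true\nmpx.vmhba1:C0:T3:L0\n   Is SSD: true\n' |
  grep -c 'Is SSD: true'
```

The count should match the number of disks you reclaimed (3 here: one cache and two capacity disks).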

Well, that completes all our prerequisites for the target ESXi hosts.

So far, we have completed the configuration of the domain controller, the VyOS router, the 4 nested target ESXi hosts, and the Cloud Builder OVA deployment. The following VMs have been created on my physical ESXi host.

I will see you in the next post, where we will discuss the "Deployment Parameters" Excel sheet in detail.
I hope the information in this blog is helpful. Thank you.


VMware vCloud Foundation 4.2.1 Step by Step Phase3 – Deployment

Welcome back. We covered the background work as well as the deployment parameter sheet in earlier posts. If you missed them, you can find them here…

VMware vCloud Foundation 4.2.1 Step by Step Phase1 – Preparation
VMware vCloud Foundation 4.2.1 Step by Step Phase2 – Cloud Builder & Deployment Parameters

It's time to start the actual deployment. We will resolve the issues as we move on.
Let's upload the "Deployment Parameter" sheet to Cloud Builder and begin the deployment.

Upload the file and click Next. I got an error here.

Bad Request: Invalid input
DNS Domain must match

This turned out to be an extra space in the DNS Zone Name here.

I corrected this, updated the sheet, and clicked Next.

All good. The validation process started.

To understand and troubleshoot the issues / failures we might face while deploying VCF, keep an eye on the vcf-bringup.log file, located at '/opt/vmware/bringup/logs/' in Cloud Builder. This file gives a live update of the deployment and shows any errors that caused it to fail. Use 'tail -f vcf-bringup.log' to follow the latest updates. PFB.

Let’s continue with the deployment…

Next Error.

“Error connecting to ESXi host esxi01. SSL Certificate common name doesn’t match ESXi FQDN”

Look at the “vcf-bringup.log” file.

This is because the ESXi certificate is generated at install time with the default name and is not regenerated when we rename the host. You can check the hostname in the certificate: log in to an ESXi host > Manage > Security & Users > Certificates.

You can see here that even though the hostname at the top shows "esxi01.virtualrove.local", the CN in the certificate is still "localhost.localdomain". We must change this to continue.

SSH to the ESXi server and run the following commands to change the hostname and FQDN and to generate new certs.

esxcli system hostname set -H=esxi03                      # short hostname
esxcli system hostname set -f=esxi03.virtualrove.local    # FQDN
cd /etc/vmware/ssl
/sbin/generate-certificates                               # regenerate the self-signed cert with the new CN
/etc/init.d/hostd restart && /etc/init.d/vpxa restart     # restart the management agents
reboot

You need to do this on every host, replacing the hostname in the commands for each ESXi respectively.

Verify the hostname in the cert once server boots up.

Next, Hit retry on cloud builder, and we should be good.

I am not sure why this showed up; I was able to reach these IPs from Cloud Builder.

 

Anyway, this was a warning, and it can be ignored.

The next one was with the host TEP and edge TEP.

VM Kernel ping from IP ‘172.27.13.2’ (‘NSXT_EDGE_TEP’) from host ‘esxi01.virtualrove.local’ to IP ” (‘NSXT_HOST_OVERLAY’) on host ‘esxi02.virtualrove.local’ failed
VM Kernel ping from IP ” (‘NSXT_HOST_OVERLAY’) from host ‘esxi01.virtualrove.local’ to IP ‘172.27.13.3’ (‘NSXT_EDGE_TEP’) on host ‘esxi02.virtualrove.local’ failed
VM Kernel ping from IP ” (‘NSXT_HOST_OVERLAY’) from host ‘esxi02.virtualrove.local’ to IP ‘172.27.13.2’ (‘NSXT_EDGE_TEP’) on host ‘esxi01.virtualrove.local’ failed
VM Kernel ping from IP ‘172.27.13.3’ (‘NSXT_EDGE_TEP’) from host ‘esxi02.virtualrove.local’ to IP ” (‘NSXT_HOST_OVERLAY’) on host ‘esxi01.virtualrove.local’ failed

VM Kernel ping from IP ‘172.27.13.2’ (‘NSXT_EDGE_TEP’) from host ‘esxi01.virtualrove.local’ to IP ‘169.254.50.254’ (‘NSXT_HOST_OVERLAY’) on host ‘esxi03.virtualrove.local’ failed
VM Kernel ping from IP ” (‘NSXT_HOST_OVERLAY’) from host ‘esxi01.virtualrove.local’ to IP ‘172.27.13.4’ (‘NSXT_EDGE_TEP’) on host ‘esxi03.virtualrove.local’ failed
VM Kernel ping from IP ‘169.254.50.254’ (‘NSXT_HOST_OVERLAY’) from host ‘esxi03.virtualrove.local’ to IP ‘172.27.13.2’ (‘NSXT_EDGE_TEP’) on host ‘esxi01.virtualrove.local’ failed
VM Kernel ping from IP ‘172.27.13.4’ (‘NSXT_EDGE_TEP’) from host ‘esxi03.virtualrove.local’ to IP ” (‘NSXT_HOST_OVERLAY’) on host ‘esxi01.virtualrove.local’ failed

First of all, I failed to understand the APIPA 169.254.x.x address. We had specified VLAN 1634 for the host TEP, so it should have picked an IP address from 172.16.34.x. That VLAN was already in place on the TOR, and I was able to ping its gateway from CB. Since it was only a warning, I took a chance and ignored it.

Next, I got warnings for NTP.

Host cb.virtaulrove.local is not currently synchronising time with NTP Server dc.virtaulrove.local
NTP Server 172.16.31.110 and host cb.virtaulrove.local time drift is not below 30 seconds
Host esxi01.virtaulrove.local is not currently synchronising time with NTP Server dc.virtaulrove.local

For ESXi, restarting the ntpd service resolved the issue.
For CB, I had to sync the time manually.

Steps to manually sync NTP…

ntpq -p
systemctl stop ntpd.service
ntpdate 172.16.31.110
(wait a minute, then run it again)
ntpdate 172.16.31.110
systemctl start ntpd.service
systemctl restart ntpd.service
ntpq -p

Verify the offset again; it should be close to 0.
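CB validates that the drift is below 30 seconds, so the offset column of `ntpq -p` (in milliseconds, on the `*` system-peer line) is the number to watch. A sketch for scripting that check — the parsing logic and function name are my own:

```shell
# Sketch: flag NTP drift above CB's 30 s limit from `ntpq -p` output.
# The system peer line starts with '*'; the offset is column 9, in ms.
check_ntp_drift() {
  echo "$1" | awk '/^\*/ {
    off = $9
    if (off < 0) off = -off
    if (off > 30000) print "DRIFT"; else print "OK"
  }'
}

# Typical use on CB:  check_ntp_drift "$(ntpq -p)"
```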

Next, I locked myself out of the Cloud Builder root account due to multiple logon failures. 😊

This is common, since the passwords are complex and you sometimes have to type them manually on the console, where (in Linux) you can’t even see what you are typing.
Anyway, resetting the root account password follows the standard process for Photon OS; the same applies to vCenter. Check the short write-up on it at the link below.

Next, back on CB, click “Acknowledge” if you want to ignore the warnings.

Next, you will see this window once you resolve all errors.

Click on “Deploy SDDC”.

Important note: once you click “Deploy SDDC”, the bring-up process first builds vSAN on the first ESXi server in the list and then deploys vCenter on that host. If bring-up fails and you discover that one of the parameters in the Excel sheet is incorrect, changing a parameter that has already been uploaded to CB is a tedious job: you have to use the jsongenerator commands to replace the existing spec on CB. I have not come across such a scenario yet, but there is a good write-up on it from a good friend of mine.

Retry Failed Bringup with Modified Input Spec in VCF

So make sure to fill in all the details in the “Deployment Parameter” sheet correctly. 😊

Let the game begin…

Again, keep an eye on vcf-bringup.log file. The location of the file is ‘/opt/vmware/bringup/logs/’ in cloud builder. Use ‘tail -f vcf-bringup.log’ to get the latest update on deployment.

Installation starts. Good luck, and be prepared to see unexpected errors. Don’t lose hope, as there may be several errors before the deployment completes. Mine took a week to deploy the first time I did it.

Bring-up process started. All looks good here. Status as “Success”. Let’s keep watching.

All looks good here. Up to this point, vCenter was in place and the first NSX-T OVA was being deployed.

Looks great.

Glance at the NSX-T env.

Note that the host TEP IPs are from VLAN 1634, even though the CB validation stage was picking up APIPA addresses.

NSX-T was fine; bring-up moved on to the SDDC Manager.

Woo, bring-up moved to the post-deployment tasks.

Moved to AVN (Application Virtual Networking). I am expecting some errors here.

Failed.

“A problem has occurred on the server. Please retry or contact the service provider and provide the reference token. Unable to create logical tier-1 gateway (0)”

This was an easy one: vcf-bringup.log showed it was due to a missing DNS record for the edge VM. I created the DNS record and hit retry.

Next one,

“Failed to validate BGP Neighbor Peering Status for edge node 172.16.31.125”

Let’s look at the log file.

Time to check NSX-T env.

The Tier-0 gateway interfaces look good as per our deployment parameters.

However, BGP Neighbors are down.

This was expected, since we haven’t configured BGP on the TOR (VyOS) yet. Let’s get into VyOS and run some commands.

set protocols bgp 65001 parameters router-id 172.27.11.253
This command specifies the router ID. If the router ID is not specified, the highest interface IP address is used.

set protocols bgp 65001 neighbor 172.27.11.2 update-source eth4
This specifies the IPv4 source address to use for the BGP session to this neighbor; it may be given either as an IPv4 address or as an interface name.

set protocols bgp 65001 neighbor 172.27.11.2 remote-as '65003'
This command creates a new neighbor whose remote AS is 65003. The neighbor address can be an IPv4 address, an IPv6 address, or an interface to use for the connection. The command is applicable to peers and peer groups.

set protocols bgp 65001 neighbor 172.27.11.3 remote-as '65003'
set protocols bgp 65001 neighbor 172.27.11.2 password VMw@re1!
set protocols bgp 65001 neighbor 172.27.11.3 password VMw@re1!

commit
save

TOR configuration is done for VLAN 2711. Let’s refresh and check the BGP status in NSX-T.

Looks good.

The same configuration must be performed for the second VLAN. I am using the same VyOS for both VLANs since it’s a lab environment. Usually, you will have two TORs, each with its own BGP peering VLAN, for redundancy.

set protocols bgp 65001 parameters router-id 172.27.12.253
set protocols bgp 65001 neighbor 172.27.12.2 update-source eth5
set protocols bgp 65001 neighbor 172.27.12.2 remote-as '65003'
set protocols bgp 65001 neighbor 172.27.12.3 remote-as '65003'
set protocols bgp 65001 neighbor 172.27.12.2 password VMw@re1!
set protocols bgp 65001 neighbor 172.27.12.3 password VMw@re1!

commit
save
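With both neighbors configured, the peering can also be checked from the VyOS side. These are standard VyOS operational-mode commands (run from op mode, not configure mode); which of them you find most useful will depend on your VyOS version:

```
show ip bgp summary
show ip bgp neighbors
show ip route bgp
```

The summary should list each NSX edge neighbor with an Up/Down time and a prefix count rather than an "Active" or "Connect" state.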

Both BGP Neighbors are successful.

Hit Retry on CB and it should pass that phase.

Next Error on Cloud Builder: ‘Failed to validate BGP route distribution.’

Log File.

At this stage, routing has been configured in your NSX-T environment: both edges are deployed and BGP peering is up. If you check the BGP peer information on the edges as well as on the VyOS router, it shows ‘established’, and routes from the NSX-T environment appear on your VyOS router. So route redistribution from NSX to VyOS works fine, and this error means that no routes are being advertised from VyOS (the TOR) to the NSX environment. Let’s get into VyOS and run some commands.

set protocols bgp 65001 address-family ipv4-unicast network 172.16.31.0/24
set protocols bgp 65001 address-family ipv4-unicast network 172.16.32.0/24
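After committing these two network statements, you can confirm from the VyOS side that the prefixes are now being advertised to the edges. Op-mode commands again; the neighbor IP is the first edge uplink from the deployment parameters:

```
show ip bgp summary
show ip bgp neighbors 172.27.11.2 advertised-routes
```

The advertised-routes output should now include 172.16.31.0/24 and 172.16.32.0/24.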

Retry on CB and you should be good.

Everything went smoothly after this. SDDC was deployed successfully.

That was fun. We have successfully deployed vCloud Foundation 4.2.1, including AVN (Application Virtual Networking).

Time to verify and check the components that have been installed.

SDDC Manager.

Segments in NSX-T that were specified in the deployment parameters sheet.

Verify on the TOR (VyOS) that these segments appear as BGP-advertised networks.

I added a test segment named “virtaulrove_overlay_172.16.50.0” in NSX-T to check whether a newly created network gets advertised to the TOR.

All looks good. I see the new segment subnet populated on TOR.

Let’s do some testing. As you can see above, the new segment subnets are being learned from 172.27.11.2; this interface is configured on the edge01 VM. Check it here.

We will take down the edge01 VM to see if route learning fails over to edge02.

Go to the Nodes section in NSX-T and select “Enter NSX Maintenance Mode” for the edge01 VM.

Edge01, Tunnels & Status down.

Notice that the gateway address has failed over to 172.27.11.3.

All Fine, All Good. 😊

There are multiple tests that can be performed to check if the deployed environment is redundant at every level.

Additionally, you can use ‘systemctl restart vcf-bringup’ on CB to pause the deployment when required.

For example, in my case the NSX-T Manager was taking a long time to deploy, and due to a timeout on Cloud Builder the deployment kept getting cancelled on the assumption of a failure. So I paused the deployment after the NSX-T OVA job was triggered from CB, and hit ‘Retry’ after NSX was deployed successfully in vCenter. It picked up from that point and moved on.

I hope you enjoyed reading this post. It’s time for you to get started and deploy VCF. See you in future posts, and feel free to comment below if you face any issues when deploying your VCF environment.

Are you looking out for a lab to practice VMware products..? If yes, then click here to know more about our Lab-as-a-Service (LaaS).

Leave your email address in the box below to receive notification on my new blogs.

VMware vCloud Foundation 4.2.1 Step by Step Phase2 – Cloud Builder & Deployment Parameters

We have prepared the environment for the VCF deployment. It’s time to move to CB and discuss the “Deployment Parameters” Excel sheet in detail. You can find my earlier blog here.

Login to Cloud Builder VM and start the deployment process.

Select “vCloud Foundation” here,

The other option, “Dell EMC VxRail”, is to be used when your physical hardware vendor is Dell.

VxRail is a hyper-converged appliance: a single device that includes compute, storage, networking, and virtualization resources. It comes with a pre-configured vCenter and ESXi servers, and there is a manual process to convert the embedded vCenter into a user-managed vCenter; that’s when you use this option. If possible, I will write a short blog on it too.

Read all the prerequisites on this page and make sure they are fulfilled before you proceed.

Click on “Download” here to get the “Deployment Parameter” excel sheet.

Let’s dig into this sheet and talk in detail about all the parameters here.

The “Prerequisites Checklist” sheet from the deployment parameter workbook. Check the line items one by one and select “Verified” in the status column. This does not affect the deployment anywhere; it is just for your reference.

“Management Workloads” sheet.

Place your license keys here.

This sheet also has a compute resource calculator for the management workload domain. Have a look and try to fit your requirements accordingly.

“Users and Groups”: define all passwords here. Pay attention to the NSX-T passwords, as validation fails if they do not match the password policy.

Moving on to next sheet “Hosts and Networks”.

Couple of things to discuss here,

DHCP for the NSX-T host TEPs is now optional; the TEP addresses can be defined here manually with static IP pools. If you select No, the DHCP option is still used.

Moving on to the “vSphere Distributed Switch Profile” section in this sheet, which has three profiles. Earlier VCF versions had only one option, deploying with 2 pNICs; due to high demand from customers to deploy with 4 pNICs, these options were introduced. Let’s go through them.

Profile-1

This profile deploys a single vDS with 2 or 4 uplinks. All network traffic flows through the NICs assigned to this vDS. Define the name and pNICs at rows 17 and 18 respectively.

Profile-2

This one deploys two vDS. The first vDS carries management traffic and the other one is for NSX. Each vDS can have 2 or 4 pNICs.

Profile-3

This one also deploys two vDS, except that the vSAN traffic is segregated instead of the NSX traffic in the previous case.

Select the profile as per your business requirement and move to next step.

Next – “Deploy Parameters”

Define all parameters here carefully; if something is wrong, the cell turns RED. I selected the small VCSA size since we are just testing the product.

Move on to the NSX-T section and have a look at AVN (Application Virtual Networking). If you select Yes here, you must specify the BGP peering information and uplink configuration; if you select No, no BGP peering is done.

TOR1 & TOR2 IPs are the interfaces configured on your VyOS. Make sure to create those interfaces; we will cover this in detail when we reach that stage of the deployment.

We are all set to upload the “Deployment Parameter” sheet to Cloud Builder and begin the deployment. That is all for this blog; we will do the actual deployment in the next one.


VMware vCloud Foundation 4.2.1 Step by Step Phase1 – Preparation

Finally, after a year and a half, I got a chance to deploy the latest version of vCloud Foundation, 4.2.1. It has been successfully deployed and tested. I have written a couple of blogs on an earlier version (4.0); you can find them here.

https://virtualrove.com/vcf/

Let’s have a look at the Cloud Foundation 4.2.1 Bill of Materials (BOM).

Software Component | Version | Date | Build Number
Cloud Builder VM | 4.2.1 | 25-May-21 | 18016307
SDDC Manager | 4.2.1 | 25-May-21 | 18016307
VMware vCenter Server Appliance | 7.0.1.00301 | 25-May-21 | 17956102
VMware ESXi | 7.0 Update 1d | 4-Feb-21 | 17551050*
VMware NSX-T Data Center | 3.1.2 | 17-Apr-21 | 17883596
VMware vRealize Suite Lifecycle Manager | 8.2 Patch 2 | 4-Feb-21 | 17513665
Workspace ONE Access | 3.3.4 | 4-Feb-21 | 17498518
vRealize Automation | 8.2 | 6-Oct-20 | 16980951
vRealize Log Insight | 8.2 | 6-Oct-20 | 16957702
vRealize Operations Manager | 8.2 | 6-Oct-20 | 16949153

It’s always a good idea to check the release notes of the product before you design and deploy. You can find them here: https://docs.vmware.com/en/VMware-Cloud-Foundation/4.2.1/rn/VMware-Cloud-Foundation-421-Release-Notes.html

Let’s discuss and understand the installation flow,

Configure the TOR for the networks used by VCF. In our case, we have a VyOS router.
Deploy the Cloud Builder VM on a standalone source ESXi or vCenter.
Install and configure 4 ESXi servers as per the prerequisites.
Fill in the Deployment Parameters Excel sheet carefully.
Upload the “Deployment Parameter” sheet to Cloud Builder.
Resolve the issues / warnings shown on the CB validation page.
Start the deployment.
Post deployment, you will have a vCenter, 4 ESXi servers, an NSX-T environment, and SDDC Manager deployed.
Additionally, you can deploy a VI workload domain using SDDC Manager, which allows you to deploy Kubernetes clusters.
Also, the vRealize Suite and Workspace ONE can be deployed using SDDC Manager.

You definitely need a huge amount of compute resources to deploy this solution.
The entire solution was installed on a single ESXi server with the following configuration.

Dell PowerEdge R630
2 X Intel(R) Xeon(R) CPU E5-2650 v3 @ 2.30GHz
256 GB Memory
4 TB SSD

Let’s prepare the infra for VMware vCloud Foundation.

I will call my physical ESXi server the base ESXi in this blog.
So here are my base ESXi and the VMs installed on it.

dc.virtaulrove.local – The Domain Controller & DNS server in the environment.
VyOS – This virtual router will act as the TOR for the VCF environment.
jumpbox.virtaulrove.local – Used to connect to the environment.
ESXi01 to ESXi04 – The target ESXi hosts for our VCF deployment.
cb.virtaulrove.local – The Cloud Builder VM used to deploy VCF.

Here is a look at the TOR and interfaces configured…

Follow my blog here to configure the VyOS TOR.

Network requirements: the management domain networks must be in place on the physical switch (TOR). Jumbo frames (MTU 9000) are recommended on all VLANs, with a minimum MTU of 1600.

And VLAN 1634 for the host TEPs, which is already configured on the TOR at eth3.

The following DNS records should be in place before we start the installation.

With all of this in place, our first step is to deploy the 4 target ESXi servers. Download the correct supported ESXi ISO from VMware downloads.

VMware ESXi 7.0 Update 1d (4-Feb-21, build 17551050*)

If you check the VMware downloads page, this build is not available for download.

The release notes say to create a custom image to use for the deployment. However, there is another way to get this ESXi build: the Cloud Builder image from the VMware portal includes it. So let’s install Cloud Builder first and keep the ESXi installation on hold for now.

We start the Cloud Builder deployment once the 19 GB OVA file is downloaded.

Cloud Builder Deployment:

Cloud Builder is an appliance provided by VMware to build the VCF environment on the target ESXi hosts. It is a one-time-use VM and can be powered off after the successful deployment of the VCF management domain; afterwards, SDDC Manager is used to manage additional VI domains. I will deploy this appliance in VLAN 1631 so that it has access to the DC and all our target ESXi servers.

The deployment is straightforward, like any other OVA deployment. Make sure you choose the right passwords while deploying the OVA: the admin & root passwords must be a minimum of 8 characters and include at least one uppercase letter, one lowercase letter, one digit, and one special character. If they do not meet these requirements, the deployment fails and you have to redeploy the OVA.

Once the deployment is complete, connect to CB using WinSCP and navigate to…

/mnt/iso/sddc-foundation-bundle-4.2.1.0-18016307/esx_iso/

You should see an ESXi image at this path.

Click “Download” and use this image to deploy our 4 target ESXi servers.

The next step is to create 4 new VMs on the base physical ESXi. These will be our nested ESXi hosts, where the VCF environment will be installed. All of them should have an identical configuration; mine is as follows.

vCPU: 12
2 Sockets, 6 cores each.
CPU hot plug: Enabled
Hardware Virtualization: Enabled

Memory: 56 GB

HDD1: Thick: ESXi OS installation
HDD2: Thin VSAN Cache Tier
HDD3: Thin VSAN Capacity Tier
HDD4: Thin VSAN Capacity Tier

And 2 network cards attached to Trun_4095. This trunk allows the ESXi to communicate with all networks on the TOR.

Map the ISO to CD drive and start the installation.

I am not going to show the ESXi installation steps, since most of you know them already. Let’s look at the custom settings after the installation.

The DCUI VLAN setting should be set to 1631.

Cross-check the DNS and IP settings on each ESXi.

And finally, make sure that ‘Test Management Network’ in the DCUI shows OK for all tests.

Repeat this for all 4 ESXi hosts.

All my 4 target ESXi servers are ready. Let’s look at the ESXi configuration that has to be in place before we can use them for the VCF deployment.

All ESXi hosts must have the ‘VM Network’ and ‘Management Network’ port groups configured with VLAN ID 1631.
The NTP server address should be configured on all ESXi hosts.
The SSH & NTP services must be enabled, with their policy set to ‘Start and stop with host’.
All additional disks must be present on each ESXi as SSDs, ready for vSAN configuration. You can check it here.

If your base ESXi has HDDs rather than SSDs, you can use the following commands to mark those HDDs as SSDs.

You can either connect to the DC and PuTTY to the ESXi, or open the ESXi console, and run these commands.

esxcli storage nmp satp rule add -s VMW_SATP_LOCAL -d mpx.vmhba1:C0:T1:L0 -o enable_ssd
esxcli storage nmp satp rule add -s VMW_SATP_LOCAL -d mpx.vmhba1:C0:T2:L0 -o enable_ssd
esxcli storage nmp satp rule add -s VMW_SATP_LOCAL -d mpx.vmhba1:C0:T3:L0 -o enable_ssd
esxcli storage core claiming reclaim -d mpx.vmhba1:C0:T1:L0
esxcli storage core claiming reclaim -d mpx.vmhba1:C0:T2:L0
esxcli storage core claiming reclaim -d mpx.vmhba1:C0:T3:L0

Once done, run ‘esxcli storage core device list’ and verify that the devices now show as SSD instead of HDD.
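With several disks per host, the six commands above can be generated from a device list instead of being typed one by one. A small sketch — the device IDs are the ones used above, and the function name is mine:

```shell
# Sketch: print the mark-as-SSD commands for each disk device.
# The output is meant to be pasted into (or piped to) the ESXi shell.
emit_ssd_claim() {
  for d in "$@"; do
    echo "esxcli storage nmp satp rule add -s VMW_SATP_LOCAL -d ${d} -o enable_ssd"
    echo "esxcli storage core claiming reclaim -d ${d}"
  done
}

# Devices from this lab's nested hosts:
emit_ssd_claim mpx.vmhba1:C0:T1:L0 mpx.vmhba1:C0:T2:L0 mpx.vmhba1:C0:T3:L0
```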

Well, that completes all the prerequisites for the target ESXi hosts.

So far, we have completed the configuration of the domain controller, the VyOS router, the 4 nested target ESXi hosts, and the Cloud Builder OVA deployment. The following VMs now exist on my physical ESXi host.

I will see you in the next post, where we will talk about the “Deployment Parameters” Excel sheet in detail.

Thank you.
