VMware vCloud Foundation 4.2.1 Step by Step Phase3 – Deployment

Welcome back. We have covered the background work as well as deployment parameter sheet in earlier posts. If you missed it, you can find it here…

VMware vCloud Foundation 4.2.1 Step by Step Phase1 – Preparation
VMware vCloud Foundation 4.2.1 Step by Step Phase2 – Cloud Builder & Deployment Parameters

It’s time to start the actual deployment. We will resolve the issues as we go.
Let’s upload the “Deployment Parameter” sheet to Cloud Builder and begin the deployment.

Upload the file and click Next. I got an error here.

Bad Request: Invalid input
DNS Domain must match

It turned out to be an extra space in the DNS Zone Name field of the sheet.

I corrected it, uploaded the updated sheet, and clicked Next.
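A stray space like this is hard to spot by eye. Here is a minimal Python sketch that flags values with leading or trailing whitespace. It assumes you have already copied the relevant cells into a dictionary; the function name and the sample dictionary are hypothetical and not part of Cloud Builder.

```python
def find_whitespace_issues(params):
    """Return the names of parameters whose values have leading or trailing whitespace."""
    return [
        name
        for name, value in params.items()
        if isinstance(value, str) and value != value.strip()
    ]


# Hypothetical values; the real ones come from your own parameter sheet.
sheet = {
    "DNS Zone Name": "virtualrove.local ",  # trailing space, like the one behind the error above
    "NTP Server": "172.16.31.110",
}

print(find_whitespace_issues(sheet))
```

Anything the function returns is worth retyping in the sheet before you upload it again.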

All good. The validation process started.

To understand and troubleshoot the failures you may hit while deploying VCF, keep an eye on the vcf-bringup.log file. It lives in ‘/opt/vmware/bringup/logs/’ on the Cloud Builder VM and gives you a live view of the deployment, including the errors that caused a step to fail. Use ‘tail -f vcf-bringup.log’ to follow it.
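If you copy the log off the appliance, a few lines of Python can pull out just the error lines. This is only a sketch: it assumes each log line carries a severity word such as ERROR or FATAL, and the sample lines below are made up for illustration, not taken from a real vcf-bringup.log.

```python
import re

# Assumption: severity appears as an upper-case word somewhere in the line.
ERROR_PATTERN = re.compile(r"\b(ERROR|FATAL)\b")


def error_lines(lines):
    """Return only the lines that carry an ERROR or FATAL severity."""
    return [line.rstrip("\n") for line in lines if ERROR_PATTERN.search(line)]


# Made-up sample; with a real copy of the log you would pass an open file object instead.
sample = [
    "2021-06-01T10:00:00 INFO  Validation started\n",
    "2021-06-01T10:00:05 ERROR Error connecting to ESXi host esxi01\n",
]

for line in error_lines(sample):
    print(line)
```

On Cloud Builder itself, ‘tail -f’ combined with grep gives you the same view live.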

Let’s continue with the deployment…

Next Error.

“Error connecting to ESXi host esxi01. SSL Certificate common name doesn’t match ESXi FQDN”

Look at the “vcf-bringup.log” file.

This happens because an ESXi host generates its certificate at install time, with the default name, and does not regenerate it when the hostname is changed later. You can check the name in the certificate: log in to the ESXi host > Manage > Security & Users > Certificates.

As you can see, even though the hostname at the top shows “esxi01.virtualrove.local”, the CN in the certificate is still “localhost.localdomain”. We must change this to continue.

SSH to the ESXi server and run the following commands to set the hostname and FQDN and to generate new certificates.

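# set the short hostname and the FQDN (use each host's own name)
esxcli system hostname set --host=esxi03
esxcli system hostname set --fqdn=esxi03.virtualrove.local
# regenerate the self-signed certificate so its CN matches the new FQDN
cd /etc/vmware/ssl
/sbin/generate-certificates
# restart the management agents, then reboot the host
/etc/init.d/hostd restart && /etc/init.d/vpxa restart
reboot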

You need to do this on every host, replacing the hostname in the commands for each ESXi server.
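To avoid typos when repeating this on four hosts, you can generate the command list per host. The sketch below only prints text to paste into each SSH session; it does not connect anywhere. It uses the long option names ‘--host’ and ‘--fqdn’ of ‘esxcli system hostname set’, and it assumes the hosts are named esxi01 to esxi04 in the virtualrove.local domain. The function name is hypothetical.

```python
def cert_regen_commands(short_name, domain):
    """Build the commands that rename one ESXi host and regenerate its certificate."""
    fqdn = f"{short_name}.{domain}"
    return [
        f"esxcli system hostname set --host={short_name}",
        f"esxcli system hostname set --fqdn={fqdn}",
        "cd /etc/vmware/ssl",
        "/sbin/generate-certificates",
        "/etc/init.d/hostd restart && /etc/init.d/vpxa restart",
        "reboot",
    ]


# Assumption: four hosts named esxi01..esxi04 in the lab domain.
for n in range(1, 5):
    host = f"esxi{n:02d}"
    print(f"# {host}")
    print("\n".join(cert_regen_commands(host, "virtualrove.local")))
    print()
```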

Verify the hostname in the certificate once the server boots up.

Next, hit Retry on Cloud Builder, and we should be good.

I am not sure why this one showed up, since I was able to reach these IPs from Cloud Builder.

Anyway, this was only a warning, and it can be ignored.

The next one was about the host TEP and edge TEP.

VM Kernel ping from IP ‘172.27.13.2’ (‘NSXT_EDGE_TEP’) from host ‘esxi01.virtualrove.local’ to IP ” (‘NSXT_HOST_OVERLAY’) on host ‘esxi02.virtualrove.local’ failed
VM Kernel ping from IP ” (‘NSXT_HOST_OVERLAY’) from host ‘esxi01.virtualrove.local’ to IP ‘172.27.13.3’ (‘NSXT_EDGE_TEP’) on host ‘esxi02.virtualrove.local’ failed
VM Kernel ping from IP ” (‘NSXT_HOST_OVERLAY’) from host ‘esxi02.virtualrove.local’ to IP ‘172.27.13.2’ (‘NSXT_EDGE_TEP’) on host ‘esxi01.virtualrove.local’ failed
VM Kernel ping from IP ‘172.27.13.3’ (‘NSXT_EDGE_TEP’) from host ‘esxi02.virtualrove.local’ to IP ” (‘NSXT_HOST_OVERLAY’) on host ‘esxi01.virtualrove.local’ failed

VM Kernel ping from IP ‘172.27.13.2’ (‘NSXT_EDGE_TEP’) from host ‘esxi01.virtualrove.local’ to IP ‘169.254.50.254’ (‘NSXT_HOST_OVERLAY’) on host ‘esxi03.virtualrove.local’ failed
VM Kernel ping from IP ” (‘NSXT_HOST_OVERLAY’) from host ‘esxi01.virtualrove.local’ to IP ‘172.27.13.4’ (‘NSXT_EDGE_TEP’) on host ‘esxi03.virtualrove.local’ failed
VM Kernel ping from IP ‘169.254.50.254’ (‘NSXT_HOST_OVERLAY’) from host ‘esxi03.virtualrove.local’ to IP ‘172.27.13.2’ (‘NSXT_EDGE_TEP’) on host ‘esxi01.virtualrove.local’ failed
VM Kernel ping from IP ‘172.27.13.4’ (‘NSXT_EDGE_TEP’) from host ‘esxi03.virtualrove.local’ to IP ” (‘NSXT_HOST_OVERLAY’) on host ‘esxi01.virtualrove.local’ failed

First of all, I could not understand why an APIPA address (169.254.X.X) showed up. We had specified VLAN 1634 for the Host TEP, so it should have picked an IP address from 172.16.34.X. This VLAN was already in place on the TOR, and I was able to ping its gateway from CB. I took a chance and ignored it, since it was only a warning.
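If you want to tell at a glance whether an address in such a message is APIPA or a real TEP address, Python’s standard ipaddress module can classify it. This is a small sketch; the /24 mask for the VLAN 1634 subnet is my assumption based on the 172.16.34.X range, and the function name is hypothetical.

```python
import ipaddress

# Assumption: VLAN 1634 uses 172.16.34.0/24 for the Host TEPs.
EXPECTED_TEP_SUBNET = ipaddress.ip_network("172.16.34.0/24")


def classify_tep(ip):
    """Classify a TEP address as 'apipa', 'ok' (in the expected subnet) or 'unexpected'."""
    addr = ipaddress.ip_address(ip)
    if addr.is_link_local:  # 169.254.0.0/16
        return "apipa"
    if addr in EXPECTED_TEP_SUBNET:
        return "ok"
    return "unexpected"


for ip in ("169.254.50.254", "172.16.34.10", "172.27.13.2"):
    print(ip, classify_tep(ip))
```

Here 172.27.13.2 comes back as ‘unexpected’ only because it is an edge TEP address, which lives in a different subnet from the host TEPs.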

Next, I got warnings for NTP.

Host cb.virtaulrove.local is not currently synchronising time with NTP Server dc.virtaulrove.local
NTP Server 172.16.31.110 and host cb.virtaulrove.local time drift is not below 30 seconds
Host esxi01.virtaulrove.local is not currently synchronising time with NTP Server dc.virtaulrove.local

For ESXi, restarting the ntpd service resolved the issue.
For CB, I had to sync the time manually.

Steps to manually sync NTP…

ntpq -p
systemctl stop ntpd.service
ntpdate 172.16.31.110
# wait for a minute, then run it again
ntpdate 172.16.31.110
systemctl start ntpd.service
systemctl restart ntpd.service
ntpq -p

Verify the offset again. It should be close to 0.
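To read the offset without squinting at the columns, here is a sketch that parses ‘ntpq -p’ output. ntpq reports delay, offset and jitter in milliseconds, so the 30-second drift limit from the warning is 30000 ms. The sample output is made up (the refid and numbers are illustrative), and the function names are hypothetical.

```python
def ntpq_offsets(output):
    """Map each peer listed by `ntpq -p` to its offset in milliseconds."""
    offsets = {}
    for line in output.splitlines():
        # Skip blank lines, the column header and the ==== separator.
        if not line.strip() or line.lstrip().startswith("remote") or line.startswith("="):
            continue
        fields = line[1:].split()  # the first character is the tally code (*, +, -, space)
        if len(fields) >= 10:
            offsets[fields[0]] = float(fields[8])  # 9th column is the offset
    return offsets


def within_drift(offset_ms, limit_s=30):
    """True if the offset is inside the drift limit (30 seconds by default)."""
    return abs(offset_ms) < limit_s * 1000


# Illustrative sample, not captured from a real system.
SAMPLE = """\
     remote           refid      st t when poll reach   delay   offset  jitter
==============================================================================
*172.16.31.110   .LOCL.           1 u   12   64  377    0.512   -3.204   0.871
"""

for peer, offset in ntpq_offsets(SAMPLE).items():
    print(peer, offset, "ok" if within_drift(offset) else "drift too high")
```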

Next, I locked out the root account of the Cloud Builder VM after multiple failed logons. 😊

This happens easily, since the passwords are complex, you sometimes have to type them manually on the console, and on top of that, Linux does not show what you are typing.
Anyway, resetting the root password on Photon OS is a standard process, and the same applies to vCenter. Check the small writeup on it at the link below.

Next, back in CB, click “Acknowledge” if you want to ignore the warning.

Once you have resolved all errors, you will get this window.

Click on “Deploy SDDC”.

Important Note: Once you click “Deploy SDDC”, the bring-up process first builds vSAN on the first ESXi server in the list and then deploys vCenter on that host. If bring-up fails for any reason and you find that one of the parameters in the Excel sheet is incorrect, changing a parameter that has already been uploaded to CB is a tedious job. You have to use jsongenerator commands to replace the existing Excel sheet in CB. I have not come across such a scenario yet; however, there is a good writeup on it from a good friend of mine.

Retry Failed Bringup with Modified Input Spec in VCF

So, make sure to fill in all the details correctly in the “Deployment Parameter” sheet. 😊

Let the game begin…

Again, keep an eye on vcf-bringup.log file. The location of the file is ‘/opt/vmware/bringup/logs/’ in cloud builder. Use ‘tail -f vcf-bringup.log’ to get the latest update on deployment.

The installation starts. Good luck, and be prepared to see unexpected errors. Don’t lose hope, as there may be several errors before the deployment completes. Mine took a week to deploy the first time I did it.

Bring-up process started. All looks good here. Status as “Success”. Let’s keep watching.

All looks good here. By this point I had vCenter in place, and bring-up was deploying the first NSX-T OVA.

Looks great.

Glance at the NSX-T env.

Note that the TEP IPs for the hosts are from VLAN 1634, even though the CB validation stage was picking up APIPA addresses.

NSX-T was fine. Bring-up moved on to the SDDC Manager deployment.

Woo, bring-up moved on to the post-deployment tasks.

It moved on to AVN (Application Virtual Networking). I was expecting some errors here.

Failed.

“A problem has occurred on the server. Please retry or contact the service provider and provide the reference token. Unable to create logical tier-1 gateway (0)”

This was an easy one. vcf-bringup.log showed that it was due to a missing DNS record for the edge VM. I created the DNS record and hit Retry.
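A quick pre-check for missing records can save a retry cycle. The sketch below reports which names fail to resolve. By default it uses socket.gethostbyname, so run it from a machine that uses the lab DNS server. The demo uses a fake resolver so that it runs anywhere; the host names and the IP in it are hypothetical.

```python
import socket


def missing_dns_records(fqdns, resolve=socket.gethostbyname):
    """Return the names that the resolver cannot resolve."""
    missing = []
    for name in fqdns:
        try:
            resolve(name)
        except OSError:  # socket.gaierror is a subclass of OSError
            missing.append(name)
    return missing


# Demo with a fake resolver; names and address are hypothetical.
known = {"esxi01.virtualrove.local": "172.16.31.101"}


def fake_resolve(name):
    if name not in known:
        raise socket.gaierror(f"no record for {name}")
    return known[name]


names = ["esxi01.virtualrove.local", "edge01.virtualrove.local"]
print(missing_dns_records(names, resolve=fake_resolve))
```

In the lab you would drop the ‘resolve’ argument and pass every FQDN from the parameter sheet, including the edge VMs.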

Next one,

“Failed to validate BGP Neighbor Perring Status for edge node 172.16.31.125”

Let’s look at the log file.

Time to check NSX-T env.

The Tier-0 gateway interfaces look good, as per our deployment parameters.

However, the BGP neighbors are down.

This was expected, since we haven’t done the BGP configuration on the TOR (VyOS) yet. Let’s get into VyOS and run some commands.

set protocols bgp 65001 parameters router-id 172.27.11.253
This command specifies the router ID. If the router ID is not specified, VyOS uses the highest interface IP address.

set protocols bgp 65001 neighbor 172.27.11.2 update-source eth4
This specifies the source to use for the BGP session to this neighbor. It can be given either as an IPv4 address or as an interface name.

set protocols bgp 65001 neighbor 172.27.11.2 remote-as '65003'
This command creates a new neighbor whose remote AS is 65003. The neighbor can be identified by an IPv4 address, an IPv6 address, or an interface to use for the connection. The command applies to both peers and peer groups.

set protocols bgp 65001 neighbor 172.27.11.3 remote-as '65003'
set protocols bgp 65001 neighbor 172.27.11.2 password VMw@re1!
set protocols bgp 65001 neighbor 172.27.11.3 password VMw@re1!

commit
save
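Since the same lines repeat for every neighbor, here is a sketch that generates them. It only builds text to paste into VyOS configuration mode. Unlike the commands above, it applies update-source to every neighbor when one is given, which is a choice I made for the sketch. The function name is hypothetical and the password in the demo is a placeholder.

```python
def vyos_bgp_neighbor_commands(local_as, remote_as, neighbors, password, update_source=None):
    """Build VyOS 'set protocols bgp' lines for a list of neighbor IPs."""
    base = f"set protocols bgp {local_as} neighbor"
    commands = []
    for ip in neighbors:
        commands.append(f"{base} {ip} remote-as '{remote_as}'")
        if update_source:
            commands.append(f"{base} {ip} update-source {update_source}")
        commands.append(f"{base} {ip} password '{password}'")
    return commands


# Values from the first uplink VLAN; replace the placeholder password with your own.
for line in vyos_bgp_neighbor_commands(
    65001, 65003, ["172.27.11.2", "172.27.11.3"], "CHANGE-ME", update_source="eth4"
):
    print(line)
```

Remember to run commit and save after pasting the output.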

The TOR configuration is done for VLAN 2711. Let’s refresh and check the BGP status in NSX-T.

Looks good.

The same configuration has to be performed for the second VLAN. I am using the same VyOS for both VLANs since it’s a lab environment. Usually, you will have two TORs, with each BGP peer VLAN configured on its own TOR for redundancy. Note that a BGP instance has only one router ID, so on a single VyOS the router-id line below replaces the value set earlier; leave it out if you want to keep 172.27.11.253.

set protocols bgp 65001 parameters router-id 172.27.12.253
set protocols bgp 65001 neighbor 172.27.12.2 update-source eth5
set protocols bgp 65001 neighbor 172.27.12.2 remote-as '65003'
set protocols bgp 65001 neighbor 172.27.12.3 remote-as '65003'
set protocols bgp 65001 neighbor 172.27.12.2 password VMw@re1!
set protocols bgp 65001 neighbor 172.27.12.3 password VMw@re1!

Both BGP neighbors are now established.

Hit Retry on CB and it should pass that phase.

Next Error on Cloud Builder: ‘Failed to validate BGP route distribution.’

Log File.

At this stage, routing has been configured in your NSX-T environment, both edges have been deployed, and BGP peering is up. If you check the BGP peer information on the edge as well as on the VyOS router, it shows ‘established’, and routes from the NSX-T environment even appear on the VyOS router. That means route redistribution from NSX-T to VyOS works fine, and this error means that no routes are being advertised from VyOS (TOR) to the NSX-T environment. Let’s get into VyOS and run some commands.

set protocols bgp 65001 address-family ipv4-unicast network 172.16.31.0/24
set protocols bgp 65001 address-family ipv4-unicast network 172.16.32.0/24
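If you advertise more than a couple of networks, a small generator helps catch a mistyped prefix before VyOS does. The sketch below validates each prefix with Python’s ipaddress module, which rejects a prefix that has host bits set, and then builds the same ‘network’ lines as above. The function name is hypothetical.

```python
import ipaddress


def vyos_network_commands(local_as, prefixes):
    """Build VyOS lines that advertise the given IPv4 prefixes into BGP."""
    commands = []
    for prefix in prefixes:
        network = ipaddress.ip_network(prefix)  # raises ValueError if host bits are set
        commands.append(
            f"set protocols bgp {local_as} address-family ipv4-unicast network {network}"
        )
    return commands


for line in vyos_network_commands(65001, ["172.16.31.0/24", "172.16.32.0/24"]):
    print(line)
```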

Retry on CB and you should be good.

Everything went smoothly after this. SDDC was deployed successfully.

That was fun. We have successfully deployed vCloud Foundation version 4.2.1 including AVN (Application Virtual Networking).

Time to verify and check the components that have been installed.

SDDC Manager.

Segments in NSX-T, as specified in the deployment parameters sheet.

Check on the TOR (VyOS) whether you see these segments as BGP-published networks.

I added a test segment called “virtaulrove_overlay_172.16.50.0” in NSX-T to check whether a newly created network gets published to the TOR.

All looks good. I see the new segment’s subnet populated on the TOR.

Let’s do some testing. As you can see above, the new segment subnets are being learned from 172.27.11.2. This interface is configured on the edge01 VM. Check it here.

We will take down the edge01 VM to see whether route learning switches to edge02.

Go to the nodes in NSX-T and select “Enter NSX Maintenance Mode” for the edge01 VM.

Edge01: tunnels and status are down.

Notice that the gateway address has failed over to 172.27.11.3.

All Fine, All Good. 😊

There are multiple tests that can be performed to check if the deployed environment is redundant at every level.

Additionally, you can use the command ‘systemctl restart vcf-bringup’ to pause the deployment when required.

For example, in my case the NSX-T Manager was taking a long time to deploy, and because of a timeout on Cloud Builder, the deployment kept getting cancelled on the assumption that something had failed. So I paused the deployment after the NSX-T OVA job was triggered from CB and hit ‘Retry’ once NSX-T had been deployed successfully in vCenter. It picked up from that point and moved on.

I hope you enjoyed reading the post. It’s time for you to get started and deploy VCF. See you in future posts. Feel free to comment below if you face any issues when you deploy the VCF environment.


VMware vCloud Foundation 4.2.1 Step by Step Phase2 – Cloud Builder & Deployment Parameters

We have prepared the environment for the VCF deployment. It’s time to move to CB and discuss the “Deployment Parameters” Excel sheet in detail. You can find my earlier blog here.

Log in to the Cloud Builder VM and start the deployment process.

Select “vCloud Foundation” here.

The other option, “Dell EMC VxRail”, is for when your physical hardware vendor is Dell.

VxRail is a hyper-converged appliance: a single device that includes compute, storage, networking, and virtualization resources. It comes with a pre-configured vCenter and ESXi servers. There is then a manual process to convert this embedded vCenter into a user-managed vCenter, and that’s when we use this option. If possible, I will write a small blog on it too.

Read all prereqs on this page and make sure to fulfill them before you proceed.

Click on “Download” here to get the “Deployment Parameter” excel sheet.

Let’s dig into this sheet and talk in detail about all the parameters here.

Start with the “Prerequisites Checklist” sheet of the deployment parameter workbook. Check the line items one by one and select “Verified” in the status column. This does not affect anything; it is just for your reference.

“Management Workloads” sheet.

Place your license keys here.

This sheet also has a compute resource calculator for the management workload domain. Have a look and try to fit your requirements accordingly.

“Users and Groups”: Define all passwords here. Pay attention to the NSX-T passwords, as validation fails if they do not match the password policy.

Moving on to the next sheet, “Hosts and Networks”.

A couple of things to discuss here.

DHCP for the NSX-T Host TEP is optional now. The addresses can be defined with a static IP pool here instead. However, if you select No for the static pool, the DHCP option still applies.

Moving on to the “vSphere Distributed Switch Profile” in this sheet. It has 3 profiles. Earlier VCF versions could only be deployed with 2 pNICs. Due to high customer demand for deployments with 4 pNICs, these profiles were introduced. Let’s talk about them.

Profile-1

This profile deploys a single vDS with 2 or 4 uplinks. All network traffic flows through the NICs assigned to this vDS. Define the name and pNICs in rows 17 and 18 respectively.

Profile-2

This one deploys 2 vDS. You can see that the first vDS carries management traffic and the other one is for NSX. Each vDS can have 2 or 4 pNICs.

Profile-3

This one also deploys 2 vDS, except that it is the vSAN traffic that is segregated, instead of NSX as in the previous profile.

Select the profile as per your business requirement and move to the next step.

Next – “Deploy Parameters”

Define all parameters here carefully. If something is wrong, the cell turns red. I have selected the small VCSA size since we are only testing the product.

Move to the NSX-T section and have a look at AVN (Application Virtual Networking). If you select Yes here, you must specify the BGP peering information and the uplink configuration. If you select No, BGP peering is not configured.

The TOR1 and TOR2 IPs are interfaces configured on your VyOS. Make sure to create those interfaces. We will look at this in detail when we reach that stage in the deployment phase.

We are all set to upload this “Deployment Parameter” sheet to Cloud Builder and begin the deployment. That is all for this blog. We will do the actual deployment in the next blog.


VMware vCloud Foundation 4.2.1 Step by Step Phase1 – Preparation

Finally, after a year and a half, I got a chance to deploy the latest version of vCloud Foundation, 4.2.1. It has been successfully deployed and tested. I have written a couple of blogs on an earlier version (4.0); you can find them here.

https://virtualrove.com/vcf/

Let’s have a look at the Cloud Foundation 4.2.1 Bill of Materials (BOM).

Software Component | Version | Date | Build Number
Cloud Builder VM | 4.2.1 | 25-May-21 | 18016307
SDDC Manager | 4.2.1 | 25-May-21 | 18016307
VMware vCenter Server Appliance | 7.0.1.00301 | 25-May-21 | 17956102
VMware ESXi | 7.0 Update 1d | 4-Feb-21 | 17551050*
VMware NSX-T Data Center | 3.1.2 | 17-Apr-21 | 17883596
VMware vRealize Suite Lifecycle Manager | 8.2 Patch 2 | 4-Feb-21 | 17513665
Workspace ONE Access | 3.3.4 | 4-Feb-21 | 17498518
vRealize Automation | 8.2 | 6-Oct-20 | 16980951
vRealize Log Insight | 8.2 | 6-Oct-20 | 16957702
vRealize Operations Manager | 8.2 | 6-Oct-20 | 16949153

It’s always a good idea to check the release notes of the product before you design and deploy. You can find the release notes here: https://docs.vmware.com/en/VMware-Cloud-Foundation/4.2.1/rn/VMware-Cloud-Foundation-421-Release-Notes.html

Let’s discuss and understand the installation flow.

Configure the TOR for the networks that are used by VCF. In our case, we have a VyOS router.
Deploy a Cloud Builder VM on a standalone source ESXi or vCenter.
Install and configure 4 ESXi servers as per the prerequisites.
Fill in the Deployment Parameters Excel sheet carefully.
Upload the “Deployment Parameter” Excel sheet to Cloud Builder.
Resolve the issues and warnings shown on the validation page of CB.
Start the deployment.
Post deployment, you will have a vCenter, 4 ESXi servers, an NSX-T environment, and SDDC Manager deployed.
Additionally, you can deploy a VI workload domain using SDDC Manager. This will allow you to deploy a Kubernetes cluster.
Also, vRealize Suite and Workspace ONE can be deployed using SDDC Manager.

You definitely need a huge amount of compute resources to deploy this solution.
This entire solution was installed on a single ESXi server. Following is the configuration of the server.

Dell PowerEdge R630
2 X Intel(R) Xeon(R) CPU E5-2650 v3 @ 2.30GHz
256 GB Memory
4 TB SSD

Let’s prepare the infra for VMware vCloud Foundation.

I will call my physical ESXi server the base ESXi in this blog.
So, here is my base ESXi and the VMs installed on it.

dc.virtaulrove.local – This is the Domain Controller and DNS server in the environment.
VyOS – This virtual router acts as the TOR for the VCF environment.
jumpbox.virtaulrove.local – To connect to the environment.
ESXi01 to ESXi04 – These will be the target ESXi hosts for our VCF deployment.
cb.virtaulrove.local – Cloud Builder VM to deploy VCF.

Here is a look at the TOR and interfaces configured…

Follow my blog here to configure the VyOS TOR.

Network Requirements: The management domain networks must be in place on the physical switch (TOR). Jumbo frames (MTU 9000) are recommended on all VLANs, with a minimum MTU of 1600.
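Once the hosts are up, you can test the MTU end to end with vmkping and the don’t-fragment flag. The largest ICMP payload that fits is the MTU minus 28 bytes (20 for the IPv4 header and 8 for the ICMP header). The sketch below does the arithmetic and prints the matching command. The target 172.16.31.110 and the interface vmk0 are assumptions for the demo, and the function names are hypothetical.

```python
IPV4_HEADER_BYTES = 20
ICMP_HEADER_BYTES = 8


def max_ping_payload(mtu):
    """Largest ICMP payload that fits in one unfragmented IPv4 packet of this MTU."""
    return mtu - IPV4_HEADER_BYTES - ICMP_HEADER_BYTES


def vmkping_command(mtu, target, vmk):
    """Build a vmkping line: -I picks the vmk interface, -d sets don't-fragment, -s sets the size."""
    return f"vmkping -I {vmk} -d -s {max_ping_payload(mtu)} {target}"


# Assumed target and interface; use the vmk and peer IP of the VLAN you want to test.
for mtu in (9000, 1600):
    print(vmkping_command(mtu, "172.16.31.110", "vmk0"))
```

If the 8972-byte ping fails while a small ping works, some hop on that VLAN is not passing jumbo frames.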

There is also VLAN 1634 for the Host TEPs, which is already configured on the TOR at eth3.

The following DNS records must be in place before we start the installation.

With all these things in place, our first step is to deploy the 4 target ESXi servers. Download the correct supported ESXi ISO from VMware downloads.

VMware ESXi | 7.0 Update 1d | 4-Feb-21 | 17551050*

If you check the VMware downloads page, this version is not available for download.

The release notes say to create a custom image for the deployment. However, there is another way to get this version of the ESXi image. Let’s download the Cloud Builder image from the VMware portal and install it. We will keep the ESXi installation on hold for now.

We start the Cloud Builder deployment once this 19 GB OVA file is downloaded.

Cloud Builder Deployment:

Cloud Builder is an appliance provided by VMware to build the VCF environment on the target ESXi hosts. It is a one-time-use VM and can be powered off after the successful deployment of the VCF management domain. After deployment, we will use SDDC Manager to manage additional VI domains. I will be deploying this appliance in VLAN 1631, so that it has access to the DC and all our target ESXi servers.

The deployment is straightforward, like any other OVA deployment. Make sure you choose the right passwords while deploying the OVA. The admin and root passwords must be a minimum of 8 characters and include at least one uppercase letter, one lowercase letter, one digit, and one special character. If they do not meet these requirements, the deployment fails and you have to redeploy the OVA.
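Since a bad password costs you a full redeploy, it is worth checking it against the rules first. The sketch below tests only the four rules listed above. Treating every character in Python’s string.punctuation as ‘special’ is my assumption; Cloud Builder may accept a narrower set, so check the OVA deployment wizard for the exact list. The function name is hypothetical.

```python
import string


def password_problems(password):
    """Return the list of policy rules that the password breaks (empty means it passes)."""
    problems = []
    if len(password) < 8:
        problems.append("shorter than 8 characters")
    if not any(c.isupper() for c in password):
        problems.append("no uppercase letter")
    if not any(c.islower() for c in password):
        problems.append("no lowercase letter")
    if not any(c.isdigit() for c in password):
        problems.append("no digit")
    # Assumption: any ASCII punctuation character counts as a special character.
    if not any(c in string.punctuation for c in password):
        problems.append("no special character")
    return problems


print(password_problems("VMw@re1!"))
print(password_problems("vmware"))
```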

Once the deployment is complete, connect to CB using WinSCP and navigate to…

/mnt/iso/sddc-foundation-bundle-4.2.1.0-18016307/esx_iso/

You should see an ESXi image at this path.

Click Download to use this image to deploy our 4 target ESXi servers.

The next step is to create 4 new VMs on the base physical ESXi. These will be the nested ESXi hosts on which our VCF environment gets installed. All ESXi hosts should have an identical configuration. I have the following configuration in my lab.

vCPU: 12
2 Sockets, 6 cores each.
CPU hot plug: Enabled
Hardware Virtualization: Enabled

Memory: 56 GB

HDD1: Thick: ESXi OS installation
HDD2: Thin: vSAN Cache Tier
HDD3: Thin: vSAN Capacity Tier
HDD4: Thin: vSAN Capacity Tier

And 2 network cards attached to Trun_4095. This allows each ESXi host to communicate with all networks on the TOR.

Map the ISO to CD drive and start the installation.

I am not going to show the ESXi installation steps, since most of you know them already. Let’s look at the custom settings after the installation.

The VLAN in the DCUI should be set to 1631.

Cross-check the DNS and IP settings on the ESXi host.

And finally, make sure that ‘Test Management Network’ in the DCUI shows OK for all tests.

Repeat this for all 4 ESXi hosts.

I have all 4 of my target ESXi servers ready. Let’s look at the ESXi configuration that has to be in place before we can use them for the VCF deployment.

All ESXi hosts must have ‘VM Network’ and ‘Management Network’ configured with VLAN ID 1631.
The NTP server address should be in place on all ESXi hosts.
The SSH and NTP services must be enabled, with the policy set to ‘Start and stop with host’.
All additional disks must be presented to ESXi as SSDs and be ready for vSAN configuration. You can check it here.

If your base ESXi has HDDs and not SSDs, you can use the following commands to mark those disks as SSD.

You can either connect to the DC and use PuTTY to reach the ESXi host, or open the ESXi console, and run these commands.

esxcli storage nmp satp rule add -s VMW_SATP_LOCAL -d mpx.vmhba1:C0:T1:L0 -o enable_ssd
esxcli storage nmp satp rule add -s VMW_SATP_LOCAL -d mpx.vmhba1:C0:T2:L0 -o enable_ssd
esxcli storage nmp satp rule add -s VMW_SATP_LOCAL -d mpx.vmhba1:C0:T3:L0 -o enable_ssd
esxcli storage core claiming reclaim -d mpx.vmhba1:C0:T1:L0
esxcli storage core claiming reclaim -d mpx.vmhba1:C0:T2:L0
esxcli storage core claiming reclaim -d mpx.vmhba1:C0:T3:L0

Once done, run the ‘esxcli storage core device list’ command and verify that the disks now show as SSD instead of HDD.
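If your device names differ from mine, you can generate the same rule and reclaim lines from a list of devices. The sketch below only builds the command text; it assumes the three vSAN disks sit at targets T1 to T3 on vmhba1, as in my lab. Check the output of ‘esxcli storage core device list’ for your own names. The function name is hypothetical.

```python
def mark_ssd_commands(devices):
    """Build the SATP rule and reclaim commands that tag local disks as SSD."""
    rules = [
        f"esxcli storage nmp satp rule add -s VMW_SATP_LOCAL -d {device} -o enable_ssd"
        for device in devices
    ]
    reclaims = [f"esxcli storage core claiming reclaim -d {device}" for device in devices]
    return rules + reclaims


# Assumption: the vSAN disks are targets 1..3 on vmhba1.
devices = [f"mpx.vmhba1:C0:T{target}:L0" for target in (1, 2, 3)]

for line in mark_ssd_commands(devices):
    print(line)
```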

Well, that completes all the prerequisites for the target ESXi hosts.

So far, we have completed the configuration of the domain controller, the VyOS router, and the 4 nested target ESXi hosts, as well as the Cloud Builder OVA deployment. The following VMs have been created on my physical ESXi host.

I will see you in the next post, where we talk about the “Deployment Parameters” Excel sheet in detail.

Thank you.
