It’s time to check the VCF environment and run some post-deployment checks. Here is SDDC Manager after the deployment,
Hosts & Clusters view,
VMs & Templates,
Datastores,
And Networking,
Let’s look at the NSX environment,
All management hosts have been prepared for NSX,
Host configuration on one of the hosts in this cluster: “vcf-vds01” is configured for NSX. The transport zone, uplink profile, and IP pool have already been created and configured.
vCenter virtual switch view on one of the hosts,
NSX already has a backup configured, and the last backup was successful.
If you look at the backup config, SDDC Manager is configured as the backup server,
Let’s have a look at the SDDC Manager dashboard,
The Hosts view in SDDC Manager shows everything as expected,
Workload Domain view shows our management domain,
Click on the management domain name to check details,
The Hosts tab under the management domain shows the host details again,
Edge Clusters is empty; you get an option to deploy an edge cluster for the management domain. I will write a separate blog on it,
The Password Management option allows you to create or edit passwords for all SDDC components in one place. You can also schedule password rotation for all components.
As discussed in the first blog of this series, here is the option to subscribe to licenses,
Like other VMware products, you get an option to integrate AD,
And an option to deploy vRealize Suite from SDDC Manager,
Well, that’s all for this post. Keep following for upcoming blogs on VCF 5.X.
Log in to the Cloud Builder VM and start the deployment process.
Select “VMware Cloud Foundation” here,
The other option, “Dell EMC VxRail”, is to be used when your physical hardware vendor is Dell.
VxRail is a hyper-converged appliance: a single device that includes compute, storage, networking, and virtualization resources. It comes with a pre-configured vCenter and ESXi servers. There is then a manual process to convert this embedded vCenter into a user-managed vCenter, and that’s when we use this option.
Read all prereqs on this page and make sure to fulfill them before you proceed.
Scroll down to check remaining prereqs,
Click next here.
Earlier versions of VCF gave you the option to download the “Deployment Parameter” Excel workbook on this page.
You must now download this workbook from the same place where you downloaded the VCF OVA.
It’s time to start the actual deployment; we will resolve issues as we move along. Let’s upload the “Deployment Parameter” workbook to Cloud Builder and begin the deployment.
Upload the file and click Next. In this step, Cloud Builder validates everything that is required for the complete deployment.
To understand and troubleshoot any issues or failures we might face while deploying VCF, keep an eye on the vcf-bringup.log file, located at /opt/vmware/bringup/logs/ on Cloud Builder. This file gives you a live update of the deployment and shows any errors that caused it to fail. Use ‘tail -f vcf-bringup.log’ to follow the latest updates; see below.
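For example, to follow the log live and then skim it for problems afterwards (standard tail and grep, nothing VCF-specific):

tail -f /opt/vmware/bringup/logs/vcf-bringup.log
grep -iE "error|fail" /opt/vmware/bringup/logs/vcf-bringup.log | tail -n 20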
Let’s continue with the deployment…
“Error connecting to ESXi host. SSL Certificate common name doesn’t match ESXi FQDN”
Look at the “vcf-bringup.log” file.
This happens because the ESXi certificate is generated at install time with the default hostname, and it is not regenerated when we rename the host. You can check the hostname in the certificate: log in to an ESXi host > Manage > Security & Users > Certificates
You can see here that even though the hostname at the top shows “vcf157.virtualrove.local”, the CN in the certificate is still “localhost.localdomain”. We must change this to continue.
SSH to the ESXi server and run the following commands to change the hostname and FQDN and to generate new certificates.
esxcli system hostname set -H=vcf157
esxcli system hostname set -f=vcf157.virtualrove.local
cd /etc/vmware/ssl
/sbin/generate-certificates
/etc/init.d/hostd restart && /etc/init.d/vpxa restart
reboot
You need to do this on every host, replacing the hostname in the commands for each ESXi respectively; a scripted version is sketched below.
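If you have several hosts, a small loop saves typing. This is a minimal sketch, assuming SSH is enabled everywhere; only vcf157 appears above, so the other hostnames are hypothetical, adjust the list to your lab:

# Hypothetical host list; replace with your own ESXi names.
# You will be prompted for each root password unless SSH keys are in place.
for h in vcf157 vcf158 vcf159 vcf160; do
  ssh root@"$h.virtualrove.local" "esxcli system hostname set -H=$h && \
    esxcli system hostname set -f=$h.virtualrove.local && \
    cd /etc/vmware/ssl && /sbin/generate-certificates && \
    /etc/init.d/hostd restart && /etc/init.d/vpxa restart && reboot"
done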
Verify the hostname in the certificate once the server boots up; you can also check it from the shell, as below.
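A quick check from the ESXi shell (rui.crt is the host’s default certificate, and openssl ships with ESXi):

openssl x509 -in /etc/vmware/ssl/rui.crt -noout -subject
# The subject CN should now show vcf157.virtualrove.local, not localhost.localdomain.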
Next, hit Retry in Cloud Builder, and we should be good.
Next, a warning for vSAN disk availability: “Validate ESXi host has at least one valid boot disk.”
Not sure about this one. I double-checked and confirmed that all disks are available on the ESXi hosts, so I will simply ignore it.
Next, warnings for NTP: “Host cb.virtualrove.local is not currently synchronising time with NTP Server dc.virtualrove.local (172.16.31.110)” and “host cb.virtualrove.local time drift is not below 30 seconds”.
For ESXi, a restart of the ntpd service resolved the issue. For Cloud Builder, I had to sync the time manually.
Steps to manually sync NTP:
ntpq -p
systemctl stop ntpd.service
ntpdate 172.16.31.110
(wait for a minute, then run it again)
ntpdate 172.16.31.110
systemctl start ntpd.service
systemctl restart ntpd.service
ntpq -p
Verify the offset again; it must be close to 0. Next, I locked out the root account of the Cloud Builder VM due to multiple logon failures. 😊
This is common, since the passwords are complex and you sometimes have to type them manually on the console; on top of that, in Linux you don’t even see what you are typing.
Anyway, resetting the root account password is a standard process for Photon OS, and the same applies to vCenter. Check the short writeup on it at the link below.
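In short, the procedure looks like this (a sketch of the usual Photon OS recovery steps; double-check against the writeup for your exact version):

# 1. Reboot the VM and press ‘e’ at the GRUB menu to edit the Photon entry.
# 2. Append the following to the end of the line that starts with ‘linux’:
rw init=/bin/bash
# 3. Boot with F10 (or Ctrl-X); you land in a root shell without a password.
passwd      # set the new root password
umount /
reboot -f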
Next, back in Cloud Builder, click “Acknowledge” if you want to ignore the warnings.
You will get this window once you resolve all errors. Click “Deploy SDDC”.
Important note: once you click “Deploy SDDC”, the bring-up process first builds vSAN on the first ESXi server in the list and then deploys vCenter on that host. If bring-up fails for any reason and you figure out that one of the parameters in the Excel sheet is incorrect, it is a tedious job to change a parameter that has already been uploaded to CB; you have to use the jsongenerator commands to replace the existing sheet in CB. I have not come across such a scenario yet, but there is a good writeup on it from a good friend of mine.
So, make sure to fill in all the correct details in the “Deployment Parameter” sheet. 😊
Let the game begin…
Again, keep an eye on the vcf-bringup.log file at /opt/vmware/bringup/logs/ on Cloud Builder. Use ‘tail -f vcf-bringup.log’ to follow the latest updates on the deployment.
The installation starts. Good luck, and be prepared to see unexpected errors. Don’t lose hope, as there may be several errors before the deployment completes. Mine took a week to deploy the first time I did it.
The bring-up process has started. All looks good here; the status shows “Success”. Let’s keep watching.
It has started the vCenter deployment on the first vSAN-enabled host.
You can also log in to the first ESXi host and check the progress of the vCenter deployment.
The vCenter installation finished, and it moved on to the NSX deployment.
It failed at the NSX deployment stage,
“Failed to join NSX managers to form a management cluster. Failed to detach NSX managers from the NSX management cluster.”
I logged in to all three NSX Managers and found that one of them was showing “Management UI: DOWN” on the console. I restarted the affected NSX Manager and it was all good.
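If you prefer checking from the CLI instead of the console, the NSX Manager admin shell can show cluster health (SSH as ‘admin’, assuming SSH is enabled on the managers):

get cluster status
# Every service group should report UP; a DEGRADED group points at the affected node.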
Retry in Cloud Builder did not show that error again. And finally, it finished all tasks.
Click Finish, and it launches another box.
That was fun. We have successfully deployed VMware Cloud Foundation version 5.0.
There are multiple tests that can be performed to check whether the deployed environment is redundant at every level. It’s time to verify and run some post-deployment checks; I will cover that in the next post.
Additionally, use this command ‘systemctl restart vcf-bringup’ to pause the deployment when required.
For example, in my case the NSX-T Manager was taking a long time to deploy, and due to a timeout interval on Cloud Builder, it would cancel the deployment assuming a failure. So I paused the deployment after the NSX-T OVA job was triggered from CB, and hit ‘Retry’ after NSX was deployed successfully in vCenter. It picked up from that point and moved on.
Hope you enjoyed reading the post. It’s time for you to get started and deploy VCF. Feel free to comment below if you face any issues.
In this post, we will perform a step-by-step installation of VMware Cloud Foundation 4.0. It has been a couple of weeks since this version was released. I have been working with VCF and VVD for a couple of years and have deployed them multiple times, hence this blog.
Before we start with VCF 4.0, please check the network configuration in my VyOS Virtual Router blog here.
Introduction:
VMware Cloud Foundation is a private as well as public cloud solution: a unified platform that gives you the entire SDDC stack. VCF 4.0 includes vSphere 7.0, vSAN 7.0, NSX-T 3.0, and vRA 8.1, as well as SDDC Manager to manage your virtual infrastructure domains. One more big change in VCF 4.0 is Kubernetes cluster deployment through SDDC Manager after a successful deployment of the management domain.
Bill of materials (image copied from the VMware site)
Check out VMware’s official site for all new features & release notes here…
VMware Cloud Foundation requires multiple networks to be in place before we start the deployment. Let’s discuss the network requirements for a successful deployment.
Network Requirements: the following management domain networks must be in place on the physical (ToR) switch. Jumbo frames (MTU 9000) are recommended on all VLANs, with a minimum of 1600 MTU; you can verify this end to end with vmkping, as shown below. Check the port requirements on the VMware site: https://ports.vmware.com/home/VMware-Cloud-Foundation
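A quick way to prove jumbo frames from an ESXi host: -d disables fragmentation, and 8972 bytes is 9000 minus the IP/ICMP headers. vmk0 and the target address are examples, so adjust both to your environment:

vmkping -I vmk0 -d -s 8972 172.16.31.253
# Example peer IP; use another host’s management vmkernel address.
# If this fails but ‘-s 1472’ works, there is an MTU mismatch in the path.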
Follow my previous blog for network configuration here.
Physical Hardware: a minimum of 4 physical servers with the VMware ESXi 7.0 hypervisor preinstalled, for the vSAN cluster.
AD & DNS Requirements: an Active Directory domain controller must be in place. In our case, the DC is connected to VLAN 1631 on VyOS. The following DNS records must be in place before we start the installation; you can spot-check them as shown below.
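A quick spot check with nslookup (vcf157 is one of the lab hosts; the reverse-lookup IP below is just an example, use your own addresses):

nslookup vcf157.virtualrove.local     # forward record
nslookup 172.16.31.157                # reverse record should return the FQDN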
Pre-installed ESXi Configuration:
All ESXi hosts must have the ‘VM Network’ and ‘Management Network’ configured with VLAN ID 1631. The NTP server address should be in place on all ESXi hosts. The SSH and NTP services must be enabled, with the policy set to ‘Start and stop with host’. All additional disks must be present on each ESXi host for the vSAN configuration. The service settings can also be done from the ESXi shell, as sketched below.
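A minimal sketch, assuming ESXi 7.0 (the ‘esxcli system ntp’ namespace appeared in 7.0) and this lab’s NTP server address:

vim-cmd hostsvc/enable_ssh      # enables SSH and sets the start/stop-with-host policy
vim-cmd hostsvc/start_ssh
esxcli system ntp set --server=172.16.31.110 --enabled=true
/etc/init.d/ntpd restart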
Let’s begin with the nested ESXi configuration for our lab.
Create 4 new VMs on the physical ESXi host. These will be our nested ESXi hosts, where the VCF environment will be installed. All of them should have an identical configuration. I have the following configuration in my lab.
CPU: 16
CPU hot plug: Enabled
Hardware virtualization: Enabled
Once done, run the ‘esxcli storage core device list’ command and verify that the devices show as SSD instead of HDD; if not, see the workaround below.
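If a disk still shows as HDD in the nested lab, a common workaround (not from the original post) is to tag it as SSD with a SATP claim rule. The device identifier below is an example; take yours from the device list output:

esxcli storage nmp satp rule add -s VMW_SATP_LOCAL -d mpx.vmhba0:C0:T1:L0 -o enable_ssd
esxcli storage core claiming reclaim -d mpx.vmhba0:C0:T1:L0
esxcli storage core device list -d mpx.vmhba0:C0:T1:L0 | grep "Is SSD"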
This completes our ESXi configuration.
Cloud Builder:
Cloud Builder is an appliance provided by VMware to build the VCF environment on the target ESXi hosts. It is a one-time-use VM and can be powered off after the successful deployment of the VCF management domain. After deployment, we will use SDDC Manager to manage additional VI domains. I will deploy this appliance in VLAN 1631 so that it has access to the DC and all our ESXi servers. Download the CB appliance from VMware Downloads.
Deployment is straightforward, like any other OVA deployment. Make sure you choose the right passwords while deploying the OVA: the admin and root passwords must be a minimum of 8 characters and include at least one uppercase letter, one lowercase letter, one digit, and one special character. If this requirement is not met, the deployment will fail, which means re-deploying the OVA.
So far, we have completed the configuration of the domain controller, the VyOS router, and the nested ESXi hosts, plus the Cloud Builder OVA deployment. The following VMs have been created on my physical ESXi host.
Log in to Cloud Builder using the configured FQDN and click Next on this screen.
Check if all prereqs are in place and click Next.
Download the ‘Deployment Parameter Workbook’ on this page.
Deployment Parameter Workbook:
It is an Excel workbook that needs to be filled in accurately without breaking its format. Be careful while filling it in, as it provides all the input parameters for our VCF deployment. Let’s have a look at the sheet.
Prerequisite Checklist: cross-check your environment against the prereqs.
Management Workloads: All license information needs to go in here.
Users and Groups: you need to specify all passwords here. Watch the NSX-T passwords, as validation fails if they do not match the password policy.
Hosts and Networks: edit the network information as per your environment and update the ESXi information accordingly.
Deploy Parameters: fill out all the information as per your environment. If you miss something, the cell turns red, which causes a validation failure.
After you complete the sheet, upload it to Cloud Builder on this page.
Next is validation of the workbook and the preinstalled ESXi hosts.
Resolve any errors or warnings that show up here.
The status should show ‘Success’ for all validation items. Click Next, then click Deploy SDDC.
All SDDC components get installed on the nested ESXi hosts, and you see this message.
SDDC Deployment Complete.
Check the SDDC Manager and vCenter.
It was definitely not that easy for me the first time. This was my third deployment, and the first to succeed in a single run; that run took around 4 hours to complete. I have written this blog after resolving the errors I got, so that you don’t waste time troubleshooting. If you miss any steps in this post, you will surely end up with errors.
Here are some suggestions.
Keep checking vcf-bringup.log on Cloud Builder for any errors in the deployment. The file is located at /opt/vmware/bringup/logs/; it gives you a live update of the deployment and shows any errors that caused it to fail. Use ‘tail -f vcf-bringup.log’ to follow the latest updates.
Another error, ‘The manifest is present but user flag causing to skip it.’, caused my deployment to fail.
To resolve this, I changed the NSX-T deployment model from ‘Medium’ to ‘Small’. It looked like a compute resource issue.
Also, keep checking the NTP sync on Cloud Builder. Mine did not sync with NTP for some reason, and I had to sync it manually.
Steps to manually sync NTP:
ntpq -p
systemctl stop ntpd.service
ntpdate 172.16.31.110
(wait for a minute, then run it again)
ntpdate 172.16.31.110
systemctl start ntpd.service
systemctl restart ntpd.service
ntpq -p
Verify the offset again. It must be close to 0.
NSX-T Deployment error.
The NSX-T OVF wasn’t getting installed, and I could only see a generic error in vCenter. A reboot of the entire environment fixed the issue for me.
Also, use this command ‘systemctl restart vcf-bringup’ to pause the deployment when required.
For example, my NSX-T Manager was taking a long time to deploy, and due to a timeout interval on Cloud Builder, it would cancel the deployment assuming a failure. So I paused the deployment after the NSX-T OVA job was triggered from CB, and hit ‘Retry’ after NSX was deployed successfully in vCenter. It picked up from that point and moved on.
That’s it for this post. I will come up with some more posts on VCF 4.0. Next up is deploying an additional workload domain and application networks for it.