VMware vCloud Foundation 4.2.1 Step by Step Phase3 – Deployment

Welcome back. We have covered the background work as well as the deployment parameter sheet in earlier posts. If you missed them, you can find them here…

VMware vCloud Foundation 4.2.1 Step by Step Phase1 – Preparation
VMware vCloud Foundation 4.2.1 Step by Step Phase2 – Cloud Builder & Deployment Parameters

It's time to start the actual deployment. We will resolve the issues as we move on.
Let's upload the "Deployment Parameter" sheet to Cloud Builder and begin the deployment.

Upload the file and click Next. I got an error here.

Bad Request: Invalid input
DNS Domain must match

It turned out to be an extra space in the DNS Zone Name here.

I corrected this, updated the sheet, and clicked Next.

All good. Validation process started.

To understand and troubleshoot the issues and failures you might hit while deploying VCF, keep an eye on the vcf-bringup.log file. Its location on Cloud Builder is '/opt/vmware/bringup/logs/'. This file gives you a live update of the deployment and shows any errors that caused it to fail. Use 'tail -f vcf-bringup.log' to follow the latest updates on the deployment.
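If the live tail scrolls too fast, you can also pull out just the recent failures with standard Linux tools from an SSH session to the Cloud Builder appliance, for example:

cd /opt/vmware/bringup/logs/
grep -iE "error|fail" vcf-bringup.log | tail -n 20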

Let’s continue with the deployment…

Next Error.

“Error connecting to ESXi host esxi01. SSL Certificate common name doesn’t match ESXi FQDN”

Look at the “vcf-bringup.log” file.

This happens because the ESXi certificate is generated at install time with the default hostname and is not regenerated when we rename the host. You can check the hostname in the certificate: log in to an ESXi host > Manage > Security & Users > Certificates.

You can see here that even though the hostname at the top shows "esxi01.virtualrove.local", the CN in the certificate is still "localhost.localdomain". We must change this to continue.

SSH to the ESXi server and run the following commands to change the hostname and FQDN and to generate new certificates.

esxcli system hostname set --host=esxi03
esxcli system hostname set --fqdn=esxi03.virtualrove.local
cd /etc/vmware/ssl
/sbin/generate-certificates
/etc/init.d/hostd restart && /etc/init.d/vpxa restart
reboot

You need to do this on every host, replacing the hostname in the commands for each ESXi accordingly.

Verify the hostname in the certificate once the server boots up.
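If you prefer the command line over the UI, you can read the CN straight from the ESXi shell; the regenerated certificate lives at the default path /etc/vmware/ssl/rui.crt:

openssl x509 -in /etc/vmware/ssl/rui.crt -noout -subject

The subject should now show the host's FQDN as the CN instead of localhost.localdomain.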

Next, hit Retry in Cloud Builder, and we should be good.

I am not sure why this showed up. I was able to reach these IPs from Cloud Builder.

Anyway, this was a warning, and it can be ignored.

The next one was with the host TEP and edge TEP.

VM Kernel ping from IP ‘172.27.13.2’ (‘NSXT_EDGE_TEP’) from host ‘esxi01.virtualrove.local’ to IP ” (‘NSXT_HOST_OVERLAY’) on host ‘esxi02.virtualrove.local’ failed
VM Kernel ping from IP ” (‘NSXT_HOST_OVERLAY’) from host ‘esxi01.virtualrove.local’ to IP ‘172.27.13.3’ (‘NSXT_EDGE_TEP’) on host ‘esxi02.virtualrove.local’ failed
VM Kernel ping from IP ” (‘NSXT_HOST_OVERLAY’) from host ‘esxi02.virtualrove.local’ to IP ‘172.27.13.2’ (‘NSXT_EDGE_TEP’) on host ‘esxi01.virtualrove.local’ failed
VM Kernel ping from IP ‘172.27.13.3’ (‘NSXT_EDGE_TEP’) from host ‘esxi02.virtualrove.local’ to IP ” (‘NSXT_HOST_OVERLAY’) on host ‘esxi01.virtualrove.local’ failed

VM Kernel ping from IP ‘172.27.13.2’ (‘NSXT_EDGE_TEP’) from host ‘esxi01.virtualrove.local’ to IP ‘169.254.50.254’ (‘NSXT_HOST_OVERLAY’) on host ‘esxi03.virtualrove.local’ failed
VM Kernel ping from IP ” (‘NSXT_HOST_OVERLAY’) from host ‘esxi01.virtualrove.local’ to IP ‘172.27.13.4’ (‘NSXT_EDGE_TEP’) on host ‘esxi03.virtualrove.local’ failed
VM Kernel ping from IP ‘169.254.50.254’ (‘NSXT_HOST_OVERLAY’) from host ‘esxi03.virtualrove.local’ to IP ‘172.27.13.2’ (‘NSXT_EDGE_TEP’) on host ‘esxi01.virtualrove.local’ failed
VM Kernel ping from IP ‘172.27.13.4’ (‘NSXT_EDGE_TEP’) from host ‘esxi03.virtualrove.local’ to IP ” (‘NSXT_HOST_OVERLAY’) on host ‘esxi01.virtualrove.local’ failed

First of all, I could not understand the APIPA 169.254.x.x addresses. We had specified VLAN 1634 for the Host TEP, so it should have picked up an IP address in 172.16.34.x. This VLAN was already in place on the TOR, and I was able to ping its gateway from Cloud Builder. Since this was only a warning, I took a chance and ignored it.

Next, I got warnings for NTP.

Host cb.virtualrove.local is not currently synchronising time with NTP Server dc.virtualrove.local
NTP Server 172.16.31.110 and host cb.virtualrove.local time drift is not below 30 seconds
Host esxi01.virtualrove.local is not currently synchronising time with NTP Server dc.virtualrove.local

For the ESXi hosts, restarting the NTP daemon resolved the issue.
For Cloud Builder, I had to sync the time manually.
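On an ESXi host the NTP daemon can be restarted from the host shell (or via the Host Client under Manage > Services); for example:

/etc/init.d/ntpd restart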

Steps to manually sync NTP on Cloud Builder…

ntpq -p
systemctl stop ntpd.service
ntpdate 172.16.31.110
Wait for a minute and run this again:
ntpdate 172.16.31.110
systemctl start ntpd.service
systemctl restart ntpd.service
ntpq -p

Verify the offset again; it should be close to 0.

Next, I locked out the root account of the Cloud Builder VM due to multiple logon failures. 😊

This is common since the passwords are complex, you sometimes have to type them manually on the console, and on top of that you don't even see (in Linux) what you are typing.
Anyway, resetting the root account password is a standard process for Photon OS; the same applies to vCenter. Check the short write-up on it at the link below.

Next, back in Cloud Builder, click "Acknowledge" if you want to ignore the warnings.

Next, you will get this window once you resolve all errors.

Click on “Deploy SDDC”.

Important Note: Once you click "Deploy SDDC", the bring-up process first builds vSAN on the first ESXi server in the list and then deploys vCenter on that host. If bring-up fails for any reason and you find that one of the parameters in the Excel sheet is incorrect, it is a tedious job to change a parameter that has already been uploaded to CB; you have to use the jsongenerator commands to replace the existing spec in CB. I have not come across such a scenario yet, but there is a good write-up on it from a good friend of mine.

Retry Failed Bringup with Modified Input Spec in VCF

So, make sure to fill in all the details correctly in the "Deployment Parameter" sheet. 😊

Let the game begin…

Again, keep an eye on the vcf-bringup.log file. Its location on Cloud Builder is '/opt/vmware/bringup/logs/'. Use 'tail -f vcf-bringup.log' to follow the latest updates on the deployment.

Installation starts. Good luck, and be prepared to see unexpected errors. Don't lose hope, as there might be several errors before the deployment completes. Mine took a week to deploy the first time I did it.

The bring-up process has started. All looks good here, with the status showing "Success". Let's keep watching.

All looks good here. Up to this point I had vCenter in place, and it was deploying the first NSX-T OVA.

Looks great.

A glance at the NSX-T environment.

Note that the host TEP IPs are from VLAN 1634. However, the CB validation stage was picking up APIPA addresses.

NSX-T was fine, and the bring-up moved further with the SDDC deployment.

Woo, bring-up moved on to the post-deployment tasks.

It moved to AVN (Application Virtual Networking). I am expecting some errors here.

Failed.

“A problem has occurred on the server. Please retry or contact the service provider and provide the reference token. Unable to create logical tier-1 gateway (0)”

This was an easy one. vcf-bringup.log showed that it was due to a missing DNS record for the edge VM. I created the DNS record and hit Retry.

Next one,

"Failed to validate BGP Neighbor Peering Status for edge node 172.16.31.125"

Let’s look at the log file.

Time to check the NSX-T environment.

The Tier-0 gateway interfaces look good as per our deployment parameters.

However, BGP Neighbors are down.

This was expected since we haven't done the BGP configuration on the TOR (VyOS) yet. Let's get into VyOS and run some commands.

set protocols bgp 65001 parameters router-id 172.27.11.253
This command specifies the router-ID. If router ID is not specified it will use the highest interface IP address.

set protocols bgp 65001 neighbor 172.27.11.2 update-source eth4
Specify the IPv4 source address to use for the BGP session to this neighbor, may be specified as either an IPv4 address directly or as an interface name.

set protocols bgp 65001 neighbor 172.27.11.2 remote-as '65003'
This command creates a new neighbor whose remote-as is <asn>. The neighbor address can be an IPv4 address, an IPv6 address, or an interface to use for the connection. The command is applicable to peers and peer groups.

set protocols bgp 65001 neighbor 172.27.11.3 remote-as '65003'
set protocols bgp 65001 neighbor 172.27.11.2 password VMw@re1!
set protocols bgp 65001 neighbor 172.27.11.3 password VMw@re1!

commit
save
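Before leaving VyOS, you can check the session state from the router itself; from configuration mode, prefix the operational command with run, for example:

run show ip bgp summary

The two edge neighbors (172.27.11.2 and 172.27.11.3) should show a received-prefix count once the sessions are established.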

TOR configuration is done for VLAN 2711. Let's refresh and check the BGP status in NSX-T.

Looks good.

The same configuration is to be performed for the second VLAN. I am using the same VyOS for both VLANs since it's a lab environment. Usually you will have two TORs, with each BGP peering VLAN configured on its respective TOR for redundancy.

set protocols bgp 65001 parameters router-id 172.27.12.253
set protocols bgp 65001 neighbor 172.27.12.2 update-source eth5
set protocols bgp 65001 neighbor 172.27.12.2 remote-as '65003'
set protocols bgp 65001 neighbor 172.27.12.3 remote-as '65003'
set protocols bgp 65001 neighbor 172.27.12.2 password VMw@re1!
set protocols bgp 65001 neighbor 172.27.12.3 password VMw@re1!

Both BGP Neighbors are successful.

Hit Retry on CB and it should pass that phase.

Next Error on Cloud Builder: ‘Failed to validate BGP route distribution.’

Log File.

At this stage, routing has been configured in your NSX-T environment, both edges have been deployed, and BGP peering has been done. If you check the BGP peer information on the edge as well as on the VyOS router, it will show 'Established', and routes from the NSX-T environment even appear on your VyOS router. That means route redistribution from NSX to VyOS works fine, and this error means that no routes are being advertised from VyOS (TOR) to the NSX environment. Let's get into VyOS and run some commands.

set protocols bgp 65001 address-family ipv4-unicast network 172.16.31.0/24
set protocols bgp 65001 address-family ipv4-unicast network 172.16.32.0/24
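As before, commit and save the change. If you want to confirm what the TOR is now advertising toward an edge, an operational command along these lines should list the two networks (neighbor IP as configured earlier):

commit
save
run show ip bgp neighbors 172.27.11.2 advertised-routes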

Retry on CB and you should be good.

Everything went smoothly after this. SDDC was deployed successfully.

That was fun. We have successfully deployed vCloud Foundation version 4.2.1 including AVN (Application Virtual Networking).

Time to verify and check the components that have been installed.

SDDC Manager.

Segments in NSX-T that were specified in the deployment parameter sheet.

Verify on the TOR (VyOS) that you see these segments as BGP-published networks.

I added a test segment called "virtaulrove_overlay_172.16.50.0" in NSX-T to check whether the newly created network gets published to the TOR.

All looks good. I see the new segment's subnet populated on the TOR.

Let's do some testing. As you can see above, the new segment subnet is being learned from 172.27.11.2; this interface is configured on the edge01 VM. Check it here.

We will take down the edge01 VM to see if route learning switches to edge02.

Go to the edge nodes in NSX-T and select "Enter NSX Maintenance Mode" for the edge01 VM.

Edge01 tunnels and status are down.

Notice that route learning has failed over to 172.27.11.3.
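You can confirm the failover from the TOR as well; for example, looking up the test segment's route in VyOS op mode should now show 172.27.11.3 as the next hop:

show ip route 172.16.50.0/24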

All Fine, All Good. 😊

There are multiple tests that can be performed to check if the deployed environment is redundant at every level.

Additionally, you can use 'systemctl restart vcf-bringup' on Cloud Builder to pause (stop) the running deployment when required.

For example, in my case the NSX-T Manager was taking a long time to deploy, and due to a timeout on Cloud Builder, it would cancel the deployment assuming a failure. So I paused the deployment after the NSX-T OVA job was triggered from CB and hit 'Retry' after NSX was deployed successfully in vCenter. It picked up from that point and moved on.

I hope you enjoyed reading the post. It's time for you to get started and deploy VCF. See you in future posts. Feel free to comment below if you face any issues while deploying your VCF environment.

Are you looking for a lab to practice VMware products? If yes, click here to know more about our Lab-as-a-Service (LaaS).

Leave your email address in the box below to receive notifications about my new blogs.

NSX-T 3.0 Series: Part10-Testing NSX-T Environment

Hello friends, we have completed all 9 parts, and by now you should have your entire NSX-T 3.0 environment up and running. This post will focus specifically on testing the environment that we have deployed in this series.

NSX-T 3.0 Series: Part1-NSX-T Manager Installation
NSX-T 3.0 Series: Part2-Add additional NSX-T Manager & Configure VIP
NSX-T 3.0 Series: Part3-Add a Compute Manager (vCenter Server)
NSX-T 3.0 Series: Part4-Create Transport Zones & Uplink Profiles
NSX-T 3.0 Series: Part5-Configure NSX on Host Transport Nodes
NSX-T 3.0 Series: Part6-Deploy Edge Transport Nodes & Create Edge Clusters
NSX-T 3.0 Series: Part7-Add a Tier-0 gateway and configure BGP routing
NSX-T 3.0 Series: Part8-Add a Tier-1 gateway
NSX-T 3.0 Series: Part9-Create Segments & attach to T1 gateway
NSX-T 3.0 Series: Part10-Testing NSX-T Environment

This is how our logical topology looks after the deployment.

All topologies in the NSX-T environment can be found in the NSX Manager UI.

Log in to the NSX Manager VIP > Networking > Networking Topology

You can filter to check a specific object; for example, I have filtered it for the HR segment.
Export it to have a closer look.

Let's verify north-south routing in the environment. We need to verify that the HR segment network shows up as a BGP-learned route from 172.27.11.10 and 172.27.12.10 on the respective TOR (VyOS) switches.
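On VyOS, the BGP-learned routes can be listed from op mode, for example:

show ip route bgp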

VyOS1

The '10.10.70.0' network is learned from '172.27.11.10', which is our Edge uplink 1.

VyOS2

The '10.10.70.0' network is learned from '172.27.12.10', which is our Edge uplink 2.

All good. We see the network on our TOR, which means our routing is working fine. Now, any network that gets added to the NSX-T environment will show up on the TOR and should be reachable from it. Let's check the connectivity from the TOR.

Voilà, we are able to ping the gateway of the HR segment from both TORs. End-to-end (north-south) routing is working as expected.
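For reference, the check from VyOS is a plain ping to the segment gateway (assuming 10.10.70.1 is the gateway IP of the HR segment):

ping 10.10.70.1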

If you don't see the newly created HR segment network on the TOR, you have to check whether the route is reaching your Tier-0 router.

Log in to edge03.dtaglab.local via PuTTY.

Enable SSH from the console if you are not able to connect.

‘get logical-router’

We need to connect to the Service Router of the Tier-0 to check further details. Note that the VRF ID for the Tier-0 Service Router is '1'.

‘vrf 1’

‘get route’

We see the '10.10.70.0/24' network as t1c (Tier-1 Connected). That means the route is reaching the Edge. If it's not, you know what to troubleshoot.

Next, if the route is on the Edge but not on the TOR, you need to check the BGP neighborship.

‘get bgp neighbor’

I see BGP state = Established for both BGP neighbors (172.27.11.1 & 172.27.12.1). If not, you need to recheck your BGP neighbor settings in NSX Manager. Use the 'traceroute' command from the VRFs and the edge to trace the packet.

That's it for this series. I hope you enjoyed reading the blogs in this series.

Happy Learning. 😊

Are you looking for a lab to practice VMware products? If yes, click here to know more about our Lab-as-a-Service (LaaS).

Subscribe here to receive emails for new posts on this website.

NSX-T 3.0 Series: Part7-Add a Tier-0 gateway and configure BGP routing

We have completed 6 parts of this series. Check my earlier posts before moving on to the Tier-0 & Tier-1 gateways.

NSX-T 3.0 Series: Part1-NSX-T Manager Installation
NSX-T 3.0 Series: Part2-Add additional NSX-T Manager & Configure VIP
NSX-T 3.0 Series: Part3-Add a Compute Manager (vCenter Server)
NSX-T 3.0 Series: Part4-Create Transport Zones & Uplink Profiles
NSX-T 3.0 Series: Part5-Configure NSX on Host Transport Nodes
NSX-T 3.0 Series: Part6-Deploy Edge Transport Nodes & Create Edge Clusters
NSX-T 3.0 Series: Part7-Add a Tier-0 gateway and configure BGP routing
NSX-T 3.0 Series: Part8-Add a Tier-1 gateway
NSX-T 3.0 Series: Part9-Create Segments & attach to T1 gateway
NSX-T 3.0 Series: Part10-Testing NSX-T Environment

Tier-0 Gateway:

This gateway processes traffic between logical segments and the physical network (TOR) using a routing protocol or static routes. Here is the logical topology of the Tier-0 & Tier-1 routers.

Tier-0 & Tier-1 are logical routers, and each logical router has a Service Router (SR) and a Distributed Router (DR). The Service Router is required for services that cannot be distributed, such as NAT, BGP, load balancing and firewalling; it runs as a service on the Edge node. The DR, on the other hand, runs as a kernel module in all hypervisors (also known as transport nodes) and provides east-west routing.

With that, let's get started creating the Tier-0 router.

While creating the Tier-0 gateway, we will configure uplink interfaces to the TOR to form the BGP neighborship. To connect your uplinks to the TOR, we need VLAN-based logical switches in place; a Tier-0 router must be connected to a VLAN-based logical switch, and the VLAN ID of the logical switch must match the TOR port used for the edge uplink. Here is the topology.

All components except the TOR will be in the same VLAN transport zone.

Log in to the NSX-T Manager VIP and navigate to Networking > Segments > Segments > ADD SEGMENT

Segment Name: Give an appropriate name.
Transport Zone: ‘Horizon-Edge-VLAN-TZ’

VLAN ID: 2711

Follow the same process to create one more segment for VLAN ID 2712.

We now move on to creating the Tier-0 Gateway.

Log in to the NSX-T Manager VIP and navigate to Networking > Tier-0 Gateways > ADD GATEWAY > Tier-0

Tier-0 Gateway Name: Horizon
HA Mode: Active-Active (default mode).

In Active-Active mode, traffic is load balanced across all members, whereas 'Active-Standby' elects an active member for traffic flow. NAT, load balancing, firewall & VPN are only supported in 'Active-Standby' mode.

Edge Cluster: ‘HorizonEdgeClust’

Scroll down to configure additional settings.
Click on ‘SET’ under ‘Interfaces’

Add Interface

Name: Give an appropriate name.
Type: External
IP Address: 172.27.11.10/24
Connected To: Select the segment for VLAN ID 2711
Edge Node: Edge03 (Since each edge will have different uplink)
MTU: 9000

Leave the rest of the parameters at their defaults. Click Save.

Follow the same process to add a 2nd uplink interface (172.27.12.10/24) for VLAN 2712.

The status for both interfaces will show as 'Uninitialized' for a few seconds. Click Refresh and it should show 'SUCCESS'.

These two IP addresses will be configured on our TOR (VyOS) as BGP neighbors.

Move to BGP section of Tier-0 Gateway to configure it further.

Local AS: 65004
InterSR iBGP: Enabled (an iBGP peering is established between both SRs, using a subnet (169.254.0.0/25) managed by NSX).
ECMP: Enabled
Graceful Restart: Graceful Restart & Helper.
By default, the Graceful Restart mode is set to Helper Only. Helper mode is useful for eliminating and/or reducing the disruption of traffic associated with routes learned from a neighbor capable of Graceful Restart. The neighbor must be able to preserve its forwarding table while it undergoes a restart.

BGP Neighbor: Click on Set.
IP Address: 172.27.11.1 (we have configured this as an interface IP on the TOR (VyOS))
Remote AS: 65001 (Configured on TOR)
Source IP: 172.27.11.10 (Uplink IP)

Follow the same process for IP address ‘172.27.12.1’

Both neighbors will show a status of 'Down' until you configure BGP on your TOR.
I ran the following commands on my TOR to form the neighborship.

VyOS1

set protocols bgp 65001 neighbor 172.27.11.10 update-source eth4
set protocols bgp 65001 neighbor 172.27.11.10 remote-as '65004'

VyOS2

set protocols bgp 65001 neighbor 172.27.12.10 update-source eth0
set protocols bgp 65001 neighbor 172.27.12.10 remote-as '65004'
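Commit and save on both routers, and the neighbor state can then be confirmed from VyOS itself, for example:

commit
save
run show ip bgp summary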

Click Refresh and it should show 'Success'.

We have successfully deployed a Tier-0 Gateway and BGP has been established with TOR.

That’s it for this post. I hope you enjoyed reading. Comments are Welcome. 😊

Are you looking for a lab to practice VMware products? If yes, click here to know more about our Lab-as-a-Service (LaaS).

Subscribe here to receive emails for my new posts on this website.