I know it’s been a while since my last post, so I’m very happy to be back on the road.
Last week I had some issues in a VMware Cloud Director environment that ended with me opening a support request with VMware. While working the case, the support team asked me to create a log bundle, but I wasn’t even able to generate it, so that’s where this story begins. Please note that the modifications I’ve made might not be supported by VMware.
For those who are not familiar with how to generate a VCD log bundle, here are the steps: https://kb.vmware.com/s/article/1026312
The error I was getting was the following:
The first step in troubleshooting this situation was to identify which cells had a FAIL status. The CELLID can be obtained with the following command:
grep cell.uuid /opt/vmware/vcloud-director/etc/global.properties
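If you only need the identifier itself, for example to compare it against the cell IDs reported in the error, a small helper can strip the key and print just the value. This is a minimal sketch: the function name `get_cell_uuid` is hypothetical, and it assumes the property is stored as a single `key = value` line.

```shell
# Hypothetical helper: print only the value of the first property
# whose key contains "cell.uuid" (assumes a "key = value" layout).
get_cell_uuid() {
    local props_file
    props_file="${1:-/opt/vmware/vcloud-director/etc/global.properties}"
    # Split each line on "=", match the key, trim whitespace around the value.
    awk -F'=' '$1 ~ /cell\.uuid/ {
        gsub(/^[ \t]+|[ \t]+$/, "", $2)
        print $2
        exit
    }' "$props_file"
}
```

Calling `get_cell_uuid` with no arguments reads the default path used above; passing another path as the first argument lets you try it on a copy of the file.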
To troubleshoot the bundle generation process, there are two log files located under /opt/vmware/vcloud-director/logs.
The first log file shows whether the cell detects the marker file that tells it to start generating a log bundle. In a normal situation it should look like the following:
From that side everything looked fine, so I moved on to the other file to understand what was happening.
Analyzing the logs, I found two different errors. The first one was that a log collection process was still running. I fixed this by shutting down the cell services and starting them again.
The second error was just a timeout. I wondered where that timeout was configured so I could extend it.
Once I realized that, I started digging deeper into the support bundle generation script.
I’m not going to cover the complete script because it’s not the central topic of this post. In that file I found some variables that pointed me to the multi-cell log collection script, so it was time to check that file as well.
I opened the file with the following command and found that there was a timeout variable I could modify.
So I decided to modify that value and try again. The script does not sit and wait for the whole timeout to expire; the value is just an upper limit. If the cell finishes generating its bundle earlier, the script continues right away.
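To illustrate what a "maximum timeout" means in practice, here is a generic sketch of the pattern. It is not taken from the VMware script: the function name `wait_for_file` and the idea of polling for a file are assumptions made purely for illustration.

```shell
# Hypothetical helper: wait until a file exists, giving up after max_wait seconds.
# Returns 0 as soon as the file appears, 1 if the limit is reached first.
wait_for_file() {
    local target max_wait interval waited
    target="$1"
    max_wait="$2"
    interval="${3:-1}"
    waited=0
    while [ ! -e "$target" ]; do
        if [ "$waited" -ge "$max_wait" ]; then
            return 1   # limit reached, stop waiting
        fi
        sleep "$interval"
        waited=$((waited + interval))
    done
    return 0           # file showed up before the limit
}
```

With this shape, raising the limit costs nothing when the work finishes quickly, because the loop exits as soon as the condition is met. That is why extending the timeout is a low-risk change.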
After modifying that timeout everything worked again, so I went back to vmware-vcd-log-publisher.log to check how long the process was taking, just out of curiosity. It was taking between 8 and 9 minutes, slightly longer than the default timeout.
I ended up doing a cleanup of old logs, and the process was smooth again in less than 7 minutes.
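A minimal sketch of how such a cleanup could start, assuming the old files live in the same logs directory mentioned earlier. The function name `list_old_logs` and the 30-day default are hypothetical. It only lists candidates and deletes nothing, so you can review the output before removing anything.

```shell
# Hypothetical helper: list regular files under a directory that were
# last modified more than N days ago. Nothing is deleted.
list_old_logs() {
    local log_dir days
    log_dir="${1:-/opt/vmware/vcloud-director/logs}"
    days="${2:-30}"
    # -mtime +N matches files whose age in whole days is greater than N.
    find "$log_dir" -type f -mtime +"$days" -print
}
```

For example, `list_old_logs /opt/vmware/vcloud-director/logs 60` would print only the files that have not been modified in more than 60 days.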
There is another reason why this process might fail, and it’s detailed in this KB: https://kb.vmware.com/s/article/71349
If you are interested in a specific topic, let me know in the comments section below and I’ll be happy to write a new post about it.