Virtual Machine Disk Latency...

Davek0974

Free Member
Mar 7, 2008
2,633
312
Hertfordshire
Hi all

One of my VMs has started seeing very intermittent disk latency of up to 25 seconds.

All the other VMs appear OK. It happens maybe once or twice a day. The VM is running W2003 SBS and is our PDC and SQL data server.

There are other VMs in the same datastore and they appear OK, so it points me towards this particular VM.

I have tried running esxtop but, due to the intermittent nature, I'm not learning anything there. The data I have is coming from the vSphere client performance chart, which I have maximised on my secondary desktop so I can monitor it while working.

I have had two spikes of over 12 seconds today so far.

Not sure what to check for this one, but it may explain why some users are saying the system is unresponsive, and maybe also why I am seeing some network failures while running large SQL queries?

Dave
 

Davek0974

Yes, there are 5 VMs in total, but this is the only one on this datastore; it also has a virtual disk configured on the other datastore.

I have not updated any drivers or firmware since installation a year ago, and I'm pretty certain this behaviour is a recent event.

None of the other VMs appear to be affected; they are running XP (x3) and W2008 R2.
 

yorkukhosting

Free Member
Jul 24, 2011
31
9
Rugby, UK
Have you tried running a perfmon trace on the Windows 2003 SBS VM? Does this show anything around the time you see the poor performance, and what do the event logs say? I assume this VM has the VMware tools installed?

Are there any jobs configured on the SQL server around the time the slowdown occurs? For example, a backup or index rebuild could impact disk performance.

Out of interest, what type of RAID card are you using? Is it an H310 or H710?
 

Davek0974

It's got a PERC H200 RAID setup.

I will have a look at perfmon on the VM today. What would be the best counters to monitor?

VMware Tools is running on all VMs.

Backups are scheduled for the evening, so daytime is fully used for file serving.

Dave
 

Davek0974

Not getting very far as I really don't know what I'm supposed to be looking for. But this server VM emails me a report every day and lately it's been showing some wild results; a cut/paste is shown here...

Performance Summary

Performance Counters      Today         Last Month    Rate of Growth
Memory in use             5,216 MB      4,954 MB      5 %
Free disk space (C:)      21,428 MB     21,367 MB     0 %
Free disk space (D:)      208,583 MB    208,636 MB    0 %
Free disk space (F:)      30,264 MB     31,875 MB     -5 %
Free disk space (G:)      138,542 MB    138,757 MB    0 %
Free disk space (H:)      33,360 MB     33,227 MB     0 %
Busy disk time (0 C:)     1 %           1 %           35 %
Busy disk time (1 D:)     3 %           2 %           61 %
Busy disk time (2 G:)     0 %           0 %           2,642 %
Busy disk time (3 F:)     0 %           0 %           109 %
Busy disk time (4 H:)     0 %           0 %           107 %
CPU Use (0)               1 %           1 %           6 %
CPU Use (1)               2 %           3 %           -42 %
CPU Use (2)               1 %           1 %           -6 %
CPU Use (3)               3 %           2 %           86 %


Top 5 Processes by Memory Usage

Process Name - ID Memory Usage
sqlservr - 11392 1,777 MB
sqlservr - 1904 346 MB
store - 4188 230 MB
vmtoolsd - 576 170 MB
services - 424 88 MB


Top 5 Processes by CPU Usage

Process Name - ID CPU Time
sqlservr - 11392 2.1 %
NTRtScan - 9644 1.4 %
System - 4 0.7 %
svchost - 936 0.6 %
OfcService - 2568 0.4 %

As you can see, Disk G: is showing zero change but massive busy time.

Anyone know what this means?
 
The large figure for busy time is in the "Rate of Growth" field, not the busy time itself. It looks as if it is showing 0% busy time. This is a report from the guest VM itself. Do you have any way to monitor ESXi itself? Where is the latency figure coming from: ESXi or the guest OS?
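
To illustrate why a huge "Rate of Growth" can sit next to a 0 % busy time, here is a minimal sketch. It assumes the report computes growth as the relative change from last month's value to today's, using unrounded figures; that formula is not documented in the thread, but it reproduces the memory and free-space rows of the report. The busy-time values in the second half are hypothetical, chosen only to show the effect.

```python
def rate_of_growth(today: float, last_month: float) -> float:
    """Relative change from last month's value to today's, in percent."""
    return (today - last_month) / last_month * 100.0


# Rows taken from the report: the assumed formula matches the printed growth.
print(round(rate_of_growth(5216, 4954)))    # Memory in use: report shows 5 %
print(round(rate_of_growth(30264, 31875)))  # Free disk space on F: report shows -5 %

# Hypothetical unrounded busy-time values (NOT from the report).
# Both would be displayed as "0 %" after rounding, yet the growth is enormous
# because the baseline is so close to zero.
today, last_month = 0.2742, 0.0100
print(round(today), round(last_month), rate_of_growth(today, last_month))
```

If that assumption holds, a growth figure in the thousands of percent on an idle disk says very little: it only means a tiny number became a slightly less tiny number.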
 

Davek0974

The figure that triggered my attention was from the vSphere client performance chart for that particular VM, I am watching the "Highest Latency" counter for the whole VM.

I have successfully launched esxtop from an SSH login, but the figures mean very little to me and, due to the intermittent nature of the fault and the rapid refresh of esxtop data, the chances of witnessing it are slim.
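
One way around having to watch the screen is esxtop's batch mode, which writes samples to a CSV file that can be searched afterwards, for example `esxtop -b -d 5 -n 720 > capture.csv` (`-b` batch, `-d` delay in seconds, `-n` number of samples). The sketch below scans such a file for values above a threshold. The function name, the sample data, and the column names are made up for illustration; the real column headers in an esxtop capture are different and should be checked in the first line of the file before choosing a filter string.

```python
import csv
import io


def find_spikes(csv_file, column_filter, threshold):
    """Return (timestamp, column name, value) for every sample above threshold."""
    reader = csv.reader(csv_file)
    header = next(reader)
    # Column 0 is assumed to hold the sample timestamp.
    wanted = [i for i, name in enumerate(header) if i > 0 and column_filter in name]
    spikes = []
    for row in reader:
        for i in wanted:
            try:
                value = float(row[i])
            except (ValueError, IndexError):
                continue  # skip blank or non-numeric cells
            if value > threshold:
                spikes.append((row[0], header[i], value))
    return spikes


# Made-up sample: a timestamp column, then one column per counter.
SAMPLE = (
    '"time","sbs-vm latency (ms)","xp-vm latency (ms)"\n'
    '"10:00:00","4.0","3.5"\n'
    '"10:00:20","12500.0","2.9"\n'
    '"10:00:40","5.1","3.1"\n'
)

for spike in find_spikes(io.StringIO(SAMPLE), "latency", 1000.0):
    print(spike)
```

For a real capture, replace the `io.StringIO(SAMPLE)` argument with a file opened via `open("capture.csv", newline="")`, and pass a filter string that matches the latency columns of interest.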
 

Davek0974

Just noticed a small pattern maybe.

Had two spikes within the hour and both are exactly the same duration - 127,131,031ms.

Now that's just over 2 minutes BUT the graph period was back to zero within 20 seconds (default refresh period), so something does not add up here surely?

It just shows as a spike on the graph.
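
A quick unit check on that figure may help explain the mismatch. Read literally as milliseconds, 127,131,031 is far longer than two minutes; it only comes to "just over 2 minutes" if the value is actually in microseconds. Which unit the chart really uses is not confirmed anywhere in the thread, so treat this purely as arithmetic on the two readings:

```python
from datetime import timedelta

reading = 127_131_031  # the figure quoted above, unit uncertain

# Interpreted as milliseconds: roughly a day and a half.
print(timedelta(milliseconds=reading))  # 1 day, 11:18:51.031000

# Interpreted as microseconds: just over two minutes.
print(timedelta(microseconds=reading))  # 0:02:07.131031
```

Neither reading fits a spike that clears within one 20-second refresh, which is consistent with the suspicion that the number is a reporting artefact rather than a real duration.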
 

yorkukhosting

Hi Dave,

Based on your first post, it seems that users are actually experiencing performance issues. Your last post looks more like a bug in the reporting than a performance issue, but either way it still implies you have a problem.

This may be of use:

http://communities.vmware.com/thread/289505?start=0&tstart=0

For RAID you really need something like the H710 with battery backed cache for optimal performance.

Regards
 

Davek0974

This came from the W2003 SBS system report last night...

Busy disk time (0 C)      5 %           0 %           1,290 %
Busy disk time (1 D)      7 %           1 %           803 %
Busy disk time (2 G)      4 %           0 %           183,086 %
Busy disk time (3 F)      4 %           0 %           363,947 %
Busy disk time (4 H)      5 %           0 %           2,621 %
CPU Use (0)               5 %           1 %           464 %
CPU Use (1)               6 %           1 %           299 %
CPU Use (2)               5 %           0 %           1,155 %
CPU Use (3)               6 %           2 %           298 %

Seems the current (mid column) details are showing big changes for some unknown reason.

Changing the RAID controller is not really an option, and I was under the impression that this controller was battery backed; the HDDs have the large cache option as well.

The report above was generally showing zeroes up until a few months ago, nothing has changed on the server VM, certainly no software has been installed as it is locked down pretty well.
 

Davek0974

I have seen something similar recently which turned out to be a failing RAID card. Once replaced it was fine again (this required a rebuild - thank goodness there was a second host to migrate VMs to)


Only one server here, so that does not sound good at all. Why does the new card not pick up the existing disk data as it was?
 