Wednesday, March 7, 2012

Help with Performance question regarding IO Size

Hi,
I was wondering if anyone had any idea what is going on with the
following problem or if there is anything that could be checked on to
resolve it.
Basically been an ongoing performance problem with a sql 2005 64bit
itanium box which has been running slower than equivalent sql 2000
box.
The setup of this particular problem is that the server is attached to
a san with one large striped set of disks using a 2gb card. The table
i'm using is in a filegroup that is split across 4 files. To take out
of the equation any O/S fragmentation i've de-fragmented the 4 files.
I've also rebuilt the indexes to remove any fragmentation.
If i run a dbcc showcontig on the file i noticed that the latency of
transfer looks fine (under 20ms) for about 45-60secs and then shoots
up in one large jump to over 1 sec which kills performance. on further
investigation the IO size on the operation starts off at an 8k block
and then at the point latency jumps to over 1sec the block size goes
to 64k.
Why would this be? Wouldn't the IO size be constant during the
operation? or could it be the read size somehow be different between
one of the files in the filegroup. Not sure how this works or how you
would check it out. Anyone have any ideas?
ps. i've run SQLIO on the SAN and 8k sequential reads run in the
milliseconds and 64k IO's are running on average at about 0.025 to
0.050 with the odd spike upto 0.3 but nothing like when the SQL Server
seems to run at 64k IO sizes.
Anyone have any ideas?
Thanks
Ian.I don't think the fact that SQL Server was issuing 64K requests was
necessarily the root of the problem. Many SQL Server operations such as
read-ahead and checkpoints may issue larger than 8K I/O requests (or even
larger than 64K requests).
If I have to take a wild guess, I'd investigate whether your I/O performance
was killed by checkpoints. First of all, instead of looking at the I/O
transfer counters, I'd look at the more specific I/O read and write counters
.
And, I'd check if the performance degradation coincides with checkpoints.
Include the counter SQLServer:Buffer Manager\Checkpoint pages/sec in your
perfmon tracking, and see if it's not zero when you experience the said
performance degradation.
Linchi
"ianwr" wrote:

> Hi,
> I was wondering if anyone had any idea what is going on with the
> following problem or if there is anything that could be checked on to
> resolve it.
> Basically been an ongoing performance problem with a sql 2005 64bit
> itanium box which has been running slower than equivalent sql 2000
> box.
> The setup of this particular problem is that the server is attached to
> a san with one large striped set of disks using a 2gb card. The table
> i'm using is in a filegroup that is split across 4 files. To take out
> of the equation any O/S fragmentation i've de-fragmented the 4 files.
> I've also rebuilt the indexes to remove any fragmentation.
> If i run a dbcc showcontig on the file i noticed that the latency of
> transfer looks fine (under 20ms) for about 45-60secs and then shoots
> up in one large jump to over 1 sec which kills performance. on further
> investigation the IO size on the operation starts off at an 8k block
> and then at the point latency jumps to over 1sec the block size goes
> to 64k.
> Why would this be? Wouldn't the IO size be constant during the
> operation? or could it be the read size somehow be different between
> one of the files in the filegroup. Not sure how this works or how you
> would check it out. Anyone have any ideas?
> ps. i've run SQLIO on the SAN and 8k sequential reads run in the
> milliseconds and 64k IO's are running on average at about 0.025 to
> 0.050 with the odd spike upto 0.3 but nothing like when the SQL Server
> seems to run at 64k IO sizes.
> Anyone have any ideas?
> Thanks
> Ian.
>|||Hi Linchi,
Thanks for the suggestion, i checked out what you said but
unfortunately there were no checkpoint operations during the
showcontig operation. Latency was 8ms with the 8k blocks until about 1
minute through the routine when i/o size went to 64k and latency
jumped to 1.2 secs.
Pulling my hair out now with this one. Performance can be so
unpredictable. I recently re-indexed 2 tables on the box with no-one
else on the system the 1st one had 120million rows and took 20 mins to
re-index, the 2nd with an identical structure but 135 million rows
(which i was hoping would take about 30-40 mins took 2.5 hours.
Looking at the counters latency was terrible during this period.
Just not sure what i can check now, I can get 20ms performance on the
sqlio tests at 64k but this dbcc operation causes latency to jump to
1.2 secs.
Any ideas?
Thanks
Ian.|||I don't know the internals of how DBCC SHOWCONTIG issues its I/O requests. S
o
I tried it on one of my servers with a drive presented from a SAN. I did
notice what you described, i.e. DBCC SHOWCONTIG initially issues 8K reads,
and then bumped the I/O request size up to ~64K. In my case, the Avg Disk
Sec/Read for the 64K reads was ~160ms, far better than what you reported.
In my little test, I noticed that SQL Server was quite aggressive in
maintaining a long I/O queue length. The Current Disk Queue length for the
drive was >32 and reached into >100 for the duration of DBCC SHOWCONTIG.
In your case, I'd re-run sqlio.exe with the -o parameter set to a large
number (say 64) to see how the drive behaves with 64K reads (both sequential
and random).
Linchi
"ianwr" wrote:

> Hi Linchi,
> Thanks for the suggestion, i checked out what you said but
> unfortunately there were no checkpoint operations during the
> showcontig operation. Latency was 8ms with the 8k blocks until about 1
> minute through the routine when i/o size went to 64k and latency
> jumped to 1.2 secs.
> Pulling my hair out now with this one. Performance can be so
> unpredictable. I recently re-indexed 2 tables on the box with no-one
> else on the system the 1st one had 120million rows and took 20 mins to
> re-index, the 2nd with an identical structure but 135 million rows
> (which i was hoping would take about 30-40 mins took 2.5 hours.
> Looking at the counters latency was terrible during this period.
> Just not sure what i can check now, I can get 20ms performance on the
> sqlio tests at 64k but this dbcc operation causes latency to jump to
> 1.2 secs.
> Any ideas?
> Thanks
> Ian.
>|||By the way, when you ran your sqlio tests with 64K reads, what was the size
of the test file you used? The file size can make a huge difference in what
results you may get. See
http://www.sqlteam.com/article/benc...ce-size-matters
for more info.
Linchi
"Linchi Shea" wrote:
[vbcol=seagreen]
> I don't know the internals of how DBCC SHOWCONTIG issues its I/O requests.
So
> I tried it on one of my servers with a drive presented from a SAN. I did
> notice what you described, i.e. DBCC SHOWCONTIG initially issues 8K reads,
> and then bumped the I/O request size up to ~64K. In my case, the Avg Disk
> Sec/Read for the 64K reads was ~160ms, far better than what you reported.
> In my little test, I noticed that SQL Server was quite aggressive in
> maintaining a long I/O queue length. The Current Disk Queue length for the
> drive was >32 and reached into >100 for the duration of DBCC SHOWCONTIG.
> In your case, I'd re-run sqlio.exe with the -o parameter set to a large
> number (say 64) to see how the drive behaves with 64K reads (both sequenti
al
> and random).
> Linchi
> "ianwr" wrote:
>|||What is the RAID set stripe size, NFTS allocation unit size for the formatt
of the partition and also did you sector align the partition using DISKPART?
Kevin G. Boles
Indicium Resources, Inc.
SQL Server MVP
kgboles a earthlink dt net
"ianwr" <ianwrigglesworth@.yahoo.co.uk> wrote in message
news:b6a8aa4a-381d-401d-8b24-f1da081564a1@.d4g2000prg.googlegroups.com...
> Hi Linchi,
> Thanks for the suggestion, i checked out what you said but
> unfortunately there were no checkpoint operations during the
> showcontig operation. Latency was 8ms with the 8k blocks until about 1
> minute through the routine when i/o size went to 64k and latency
> jumped to 1.2 secs.
> Pulling my hair out now with this one. Performance can be so
> unpredictable. I recently re-indexed 2 tables on the box with no-one
> else on the system the 1st one had 120million rows and took 20 mins to
> re-index, the 2nd with an identical structure but 135 million rows
> (which i was hoping would take about 30-40 mins took 2.5 hours.
> Looking at the counters latency was terrible during this period.
> Just not sure what i can check now, I can get 20ms performance on the
> sqlio tests at 64k but this dbcc operation causes latency to jump to
> 1.2 secs.
> Any ideas?
> Thanks
> Ian.|||Guys,
Thanks for your responses. I'm going to run the sqlio test again this
morning as i think i ran them initially with the default 100mb file
when using the 64k size.
I don't think the partition was sector aligned, only because i
specifically read the manufacturers documentation on this and they had
said that there was no need to with the current firmware used.
I will re-run the tests and find out the remainder of the information
you asked for this morning.
Thanks for all your help guys. Much appreciated.
Ian.|||Just ran the SQLIO tests and got the following for 64K reads with an
outstanding number of 64 :-
Sequential Read file size 256mb I/O Sec 2611 MB/Sec 163
min lat 4ms avg lat 48 max lat 124
Random Read file size 256mb I/O Sec 2743 MB/Sec 171
min lat 4ms avg lat 46 max lat 83
Sequential Read file size 2gb I/O Sec 2260 MB/Sec
141 min lat 4ms avg lat 56 max lat 427
Random Read file size 2gb I/O Sec 405 MB/Sec
25 min lat 4 avg lat 313 max lat 776
Sequential Read file size 8gb I/O Sec 974 MB/Sec
60 min lat 4ms avg lat 130 max lat 735
Random Read file Size 8gb I/O Sec 347 MB/Sec
21 min lat 21 avg lat 367 max lat 807
The san and server are used for a DW application and i would say that
the average file size for the databases is about 64gb but dont know if
this is relevant.
i've asked the SAN/Windows guy to get me the other parameters about
the stripe size, allocation unit size etc and will get back to you
with these. Just for future reference is there an easy way to pull off
these figures as i'm no windows guru ;-)|||1) Your use of 256MB for a test file against a SAN is not appropriate. Many
SANs (heck, even HBAs) have caches larger than that. Even 2GB could be less
than a SAN cache, although from the numbers it appears that you are at least
starting to hit physical I/O. You really should test with a file (or files)
that are a reasonable fraction of the actual database sizes you anticipate.
2) Most SAN's come with software (usually a web-based) interface. It
usually provides both viewable and settable configuration stuff for a wide
variety of parameters. Perhaps your admins can hook you up with some
read-only version of this?
3) That is a pretty poorly performing SAN.
Kevin G. Boles
Indicium Resources, Inc.
SQL Server MVP
kgboles a earthlink dt net
"ianwr" <ianwrigglesworth@.yahoo.co.uk> wrote in message
news:db1fb7b5-2e75-40a3-adec-33d48df2db75@.f3g2000hsg.googlegroups.com...
> Just ran the SQLIO tests and got the following for 64K reads with an
> outstanding number of 64 :-
> Sequential Read file size 256mb I/O Sec 2611 MB/Sec 163
> min lat 4ms avg lat 48 max lat 124
> Random Read file size 256mb I/O Sec 2743 MB/Sec 171
> min lat 4ms avg lat 46 max lat 83
> Sequential Read file size 2gb I/O Sec 2260 MB/Sec
> 141 min lat 4ms avg lat 56 max lat 427
> Random Read file size 2gb I/O Sec 405 MB/Sec
> 25 min lat 4 avg lat 313 max lat 776
> Sequential Read file size 8gb I/O Sec 974 MB/Sec
> 60 min lat 4ms avg lat 130 max lat 735
> Random Read file Size 8gb I/O Sec 347 MB/Sec
> 21 min lat 21 avg lat 367 max lat 807
> The san and server are used for a DW application and i would say that
> the average file size for the databases is about 64gb but dont know if
> this is relevant.
> i've asked the SAN/Windows guy to get me the other parameters about
> the stripe size, allocation unit size etc and will get back to you
> with these. Just for future reference is there an easy way to pull off
> these figures as i'm no windows guru ;-)
>
>|||Thanks Kevin for the response.
Unfortunately for me, the main guy who looks after the san now claims
he's no san expert and I suspect that they just bought in a ready made
solution from a 3rd party supplier. No-one on site seems to know
anything about the san and when asking what the raid stripe size or if
the partition had been sector aligned i just got a number of blank
looks lol ;-(
Fortunately, my contract only runs for a few more weeks here, so i'll
pass the information onto them and maybe they suggest getting a
specialist san consultant onboard to review and configure accordingly.
I know a new SAN will be ordered within the next few months so i guess
they will have to put up with these performance problems until then.
Thanks all your input and thoughts.
Ian.

No comments:

Post a Comment