Monday, March 26, 2012

Help With Slow SQL Query

This query takes 1 minute to execute...
SELECT SUM([ProcessCount])
FROM [ProcessTable]
WHERE [FKBatchID] = 1
The table contains about 5,000,000 records.
The total record count that matches the WHERE clause is 50,000 records.
The table has a clustered index on the primary key.
The foreign key in the WHERE clause has a non-clustered, non-unique index.
The execution plan shows a 100% cost on a clustered index scan against the p
rimary key of the table, with a WHERE clause for the
[FKBatchID] column.
I had a similar problem with another query which was far more complex. I sol
ved it by altering the joins and rearranging predicates
in the WHERE clause. This cause the plan to no longer use the clustered inde
x scan at 100% cost and the execution time on that query
went from 2 minutes to 5 seconds. However, the preceding query is so simple
I don't know what to do.
Please help.
ChrisGTwo questions/suggestions:
First of all, why are you clustering on the primary key? I don't know about
your data or needs, but in my general experience I've found that indexes on
PKs are used more often for random data retrieval (give me this one row
identified by this one PK). Indexes on FKs, on the other hand, are used for
more range-related activity (give me these rows identified by this FK).
Indeed, for this query, having a clustered index on FKBatchID would allow
the data to be read contiguously from the disc, greatly increasing your
performance. So you might consider switching that.
Second, you could try covering the non-clustered index so that it includes
ProcessCount (something like, CREATE INDEX MyIndex ON
ProcessTable(FKBatchID, ProcessCount)) ... This way, the query shouldn't
have to go into the leaves of the cluster to get the data it needs... Worth
a try, at any rate...
"Chris Gallucci" <chris@.gallucci.com> wrote in message
news:emlZ6$rAEHA.2768@.tk2msftngp13.phx.gbl...
> This query takes 1 minute to execute...
> SELECT SUM([ProcessCount])
> FROM [ProcessTable]
> WHERE [FKBatchID] = 1
> The table contains about 5,000,000 records.
> The total record count that matches the WHERE clause is 50,000 records.
> The table has a clustered index on the primary key.
> The foreign key in the WHERE clause has a non-clustered, non-unique index.
> The execution plan shows a 100% cost on a clustered index scan against the
primary key of the table, with a WHERE clause for the
> [FKBatchID] column.
> I had a similar problem with another query which was far more complex. I
solved it by altering the joins and rearranging predicates
> in the WHERE clause. This cause the plan to no longer use the clustered
index scan at 100% cost and the execution time on that query
> went from 2 minutes to 5 seconds. However, the preceding query is so
simple I don't know what to do.
> Please help.
> ChrisG
>|||WOW! You're the man! (see *** inline )
"Adam Machanic" <amachanic@.air-worldwide.nospamallowed.com> wrote in message
news:eYLXOPsAEHA.4080@.TK2MSFTNGP09.phx.gbl...
| Two questions/suggestions:
|
| First of all, why are you clustering on the primary key? I don't know abo
ut
| your data or needs, but in my general experience I've found that indexes o
n
| PKs are used more often for random data retrieval (give me this one row
| identified by this one PK). Indexes on FKs, on the other hand, are used f
or
| more range-related activity (give me these rows identified by this FK).
| Indeed, for this query, having a clustered index on FKBatchID would allow
| the data to be read contiguously from the disc, greatly increasing your
| performance. So you might consider switching that.
*** I'll look into this but I'm almost sure we have a couple of critical que
ries that require this.
| Second, you could try covering the non-clustered index so that it includes
| ProcessCount (something like, CREATE INDEX MyIndex ON
| ProcessTable(FKBatchID, ProcessCount)) ... This way, the query shouldn't
| have to go into the leaves of the cluster to get the data it needs... Wort
h
| a try, at any rate...
|
***Bang! That smokes. The query is instantaneous.
Thanks so much.
ChrisG|||Hi Adam,
I am a aspirant of RDBMS design and learning things. I
would to request you to give some practical/technical
information on the Below suggestion by you as It is
interesting.
"Indexes on PKs are used more often for random data
retrieval (Eg: give me this one row identified by this one
PK). Indexes on FKs, on the other hand, are used for more
range-related activity (Eg: give me these rows identified
by this FK)".
Thanks in Advance
Chip

>--Original Message--
>Two questions/suggestions:
>First of all, why are you clustering on the primary key?
I don't know about
>your data or needs, but in my general experience I've
found that indexes on
>PKs are used more often for random data retrieval (give
me this one row
>identified by this one PK). Indexes on FKs, on the other
hand, are used for
>more range-related activity (give me these rows
identified by this FK).
>Indeed, for this query, having a clustered index on
FKBatchID would allow
>the data to be read contiguously from the disc, greatly
increasing your
>performance. So you might consider switching that.
>Second, you could try covering the non-clustered index so
that it includes
>ProcessCount (something like, CREATE INDEX MyIndex ON
>ProcessTable(FKBatchID, ProcessCount)) ... This way, the
query shouldn't
>have to go into the leaves of the cluster to get the data
it needs... Worth
>a try, at any rate...
>
>"Chris Gallucci" <chris@.gallucci.com> wrote in message
>news:emlZ6$rAEHA.2768@.tk2msftngp13.phx.gbl...
50,000 records.
clustered, non-unique index.
index scan against the
>primary key of the table, with a WHERE clause for the
far more complex. I
>solved it by altering the joins and rearranging predicates
use the clustered
>index scan at 100% cost and the execution time on that
query
preceding query is so
>simple I don't know what to do.
>
>.
>|||Okay, let's pretend that we're modelling data for an HR management company
that does HR for lots of companies... They might have some tables like:
CREATE TABLE Companies(CompanyID INT NOT NULL, CompanyName VARCHAR(20) NOT
NULL)
GO
ALTER TABLE Companies ADD CONSTRAINT PK_Companies PRIMARY KEY (CompanyID)
GO
CREATE TABLE Employees(EmployeeID INT NOT NULL, CompanyID INT NOT NULL,
EmployeeName VARCHAR(20) NOT NULL)
GO
ALTER TABLE Employees ADD CONSTRAINT PK_Employees PRIMARY KEY (EmployeeID)
GO
ALTER TABLE Employees ADD CONSTRAINT FK_Companies FOREIGN KEY (CompanyID)
REFERENCES Companies (CompanyID)
GO
Now we might want to think about what kinds of questions they would ask:
What employees are in company XYZ? How many employees are in company XYZ?
What company is employee XYZ in?
For the first two questions, we might request the data via CompanyName. The
server would search the table for the row containing that name, at which
point the primary key would be obtained and used to filter the Employees
table via the foreign key, FK_Companies. The search on the company table,
via the CompanyName column, would not use a clustered index on the PK.
Neither would the filtration on the Employees table use any index on
EmployeeID. A clustered index on the FK, CompanyID, would be quite helpful,
as the server could then retrieve data contiguously (as I said in my
original post).
For the third question, we might request the data via an EmployeeID; but
this question will not require a clustered index because the EmployeeID is
only pointing to a single row. So we need to find that single row as
quickly as possible and use it to answer the question. No ordering,
grouping, or contiguous data access will be necessary on the Employees
table.
There may be other cases where this doesn't hold true (which is why we have
jobs; if there were cookbook answers to all questions we wouldn't be
needed), but I think this methodology tends to work for the majority of
cases.
"Chip" <anonymous@.discussions.microsoft.com> wrote in message
news:798b01c402d3$37d88bf0$a301280a@.phx.gbl...
> Hi Adam,
> I am a aspirant of RDBMS design and learning things. I
> would to request you to give some practical/technical
> information on the Below suggestion by you as It is
> interesting.
> "Indexes on PKs are used more often for random data
> retrieval (Eg: give me this one row identified by this one
> PK). Indexes on FKs, on the other hand, are used for more
> range-related activity (Eg: give me these rows identified
> by this FK)".
> Thanks in Advance
> Chip
>
> I don't know about
> found that indexes on
> me this one row
> hand, are used for
> identified by this FK).
> FKBatchID would allow
> increasing your
> that it includes
> query shouldn't
> it needs... Worth
> 50,000 records.
> clustered, non-unique index.
> index scan against the
> far more complex. I
> use the clustered
> query
> preceding query is so|||Try putting a nonclustered index on processcount and fkbatchID if this is
something you do often, if fkbatchid is your clustered index you only have
to put a nonclustered index on processcount as the clustered key is included
in all NCI's. Regarding clustered indexes on a PK (if it is an IDENT
propertied column) if you are heavy inserts this is a good idea as it
creates hot spots on the disk as your inserts will fall to bottom of the
leaf level (reducing page splits and affording you not to have to mess with
fill factor) If you are query intensive and light inserts, test out Adam's
solution.
HTH
--
Ray Higdon MCSE, MCDBA, CCNA
--
"Chris Gallucci" <chris@.gallucci.com> wrote in message
news:emlZ6$rAEHA.2768@.tk2msftngp13.phx.gbl...
> This query takes 1 minute to execute...
> SELECT SUM([ProcessCount])
> FROM [ProcessTable]
> WHERE [FKBatchID] = 1
> The table contains about 5,000,000 records.
> The total record count that matches the WHERE clause is 50,000 records.
> The table has a clustered index on the primary key.
> The foreign key in the WHERE clause has a non-clustered, non-unique index.
> The execution plan shows a 100% cost on a clustered index scan against the
primary key of the table, with a WHERE clause for the
> [FKBatchID] column.
> I had a similar problem with another query which was far more complex. I
solved it by altering the joins and rearranging predicates
> in the WHERE clause. This cause the plan to no longer use the clustered
index scan at 100% cost and the execution time on that query
> went from 2 minutes to 5 seconds. However, the preceding query is so
simple I don't know what to do.
> Please help.
> ChrisG
>|||Thanks for the clarification, Ray; most of my experience is with large
datawarehouse type applications so light on the insert (during production
hours) and very heavy querying is the direction I'm coming from.
"Ray Higdon" <sqlhigdon@.nospam.yahoo.com> wrote in message
news:es0OpkwAEHA.2348@.TK2MSFTNGP09.phx.gbl...
> Try putting a nonclustered index on processcount and fkbatchID if this is
> something you do often, if fkbatchid is your clustered index you only have
> to put a nonclustered index on processcount as the clustered key is
included
> in all NCI's. Regarding clustered indexes on a PK (if it is an IDENT
> propertied column) if you are heavy inserts this is a good idea as it
> creates hot spots on the disk as your inserts will fall to bottom of the
> leaf level (reducing page splits and affording you not to have to mess
with
> fill factor) If you are query intensive and light inserts, test out Adam's
> solution.
> HTH
> --
> Ray Higdon MCSE, MCDBA, CCNA
> --
> "Chris Gallucci" <chris@.gallucci.com> wrote in message
> news:emlZ6$rAEHA.2768@.tk2msftngp13.phx.gbl...
index.
the
> primary key of the table, with a WHERE clause for the
> solved it by altering the joins and rearranging predicates
> index scan at 100% cost and the execution time on that query
> simple I don't know what to do.
>

No comments:

Post a Comment