Monday, February 27, 2012

Help with most efficient column sorting technique

I have a SQL 2005 database with 4 million+ rows. One table in
particular has about 35 columns. I have implemented a paged data grid
results view for them. They want to be able to sort on the majority of
the columns. When they sort they want all the results sorted not just
the visible result set, but it's not practical for me to index every
column either. There has to be a way to achieve my sorting goals. Has
anyone dealt with this issue and solved it reasonably well.
Thanks,
Matt
MJB wrote:
> I have a SQL 2005 database with 4 million+ rows. One table in
> particular has about 35 columns. I have implemented a paged data grid
> results view for them. They want to be able to sort on the majority of
> the columns. When they sort they want all the results sorted not just
> the visible result set, but it's not practical for me to index every
> column either. There has to be a way to achieve my sorting goals. Has
> anyone dealt with this issue and solved it reasonably well.
Huh?
Have u tried the ORDER BY clause?
MGFoster:::mgf00 <at> earthlink <decimal-point> net
Oakland, CA (USA)
|||MJB skrev:

> I have a SQL 2005 database with 4 million+ rows. One table in
> particular has about 35 columns. I have implemented a paged data grid
> results view for them. They want to be able to sort on the majority of
> the columns. When they sort they want all the results sorted not just
> the visible result set, but it's not practical for me to index every
> column either. There has to be a way to achieve my sorting goals. Has
> anyone dealt with this issue and solved it reasonably well.
> Thanks,
> Matt
I don't think there is a silver bullet for this, if the table is
updated also I guess you have to prioritize between the columns and add
indexes on the ones that must be sorted fast.
Hopefully someone else knows better.
/impslayer, aka Birger Johansson
|||Have you ever tried an ORDER BY on a non-index column that has 4 million
plus rows... I can tell you it ain't fast...
MGFoster wrote:
> MJB wrote:
> Huh?
> Have u tried the ORDER BY clause?
|||Just so I understand, you're exposing 4 million + rows of data in a
paged data grid, and your users want t obe able to sort this data?
My first question is: do they really need to see all 4 million rows of
data? How in the hell is that practical?
Stu
|||Well, it's a database that stores Ethernet traffic information. Kinda
what you might get out of Snort or something similar. In most cases the
data will be filtered a bit, but in some cases they may want to "see all
the data" (in the sense that it's paged and globally sortable). The
problem is they want to be able to sort on the ip col, the port col,
date time stamps etc (like i said there are 35+ cols in this table).
Currently only the primary key col is indexed, but it doesn't make sense
to index all of the others. Was wondering if anyone had dealt with this
before - doesn't sound like it.
Stu wrote:
> Just so I understand, you're exposing 4 million + rows of data in a
> paged data grid, and your users want t obe able to sort this data?
> My first question is: do they really need to see all 4 million rows of
> data? How in the hell is that practical?
> Stu
>
MJB wrote:
> I have a SQL 2005 database with 4 million+ rows. One table in
particular has about 35 columns. I have implemented a paged data grid
results view for them. They want to be able to sort on the majority of
the columns. When they sort they want all the results sorted not just
the visible result set, but it's not practical for me to index every
column either. There has to be a way to achieve my sorting goals. Has
anyone dealt with this issue and solved it reasonably well.
|||Actually, the company I work for manages network appliances for our
customers; the way we address the issue is bring the data off the
server onto the client pc, and let them sort it on their own pc (using
ADO.NET dataasets). We warn them if the data is going to be larger
than a few thousand rows, so they can decline, and only sample the
appropriate amount.
The largest collection of related events processed so far has been
about 200,000; it took about 3 minutes to load to the client's pc, and
then they had to deal with sorting issues. At least that way, the
entire server was not bogged down for one incident.
HTH,
Stu
|||Standard SQL-92 does not allow you to use a function or expression in
an ORDER BY clause. The ORDER BY clause is part of a cursor and it can
only see the column names that appear in the SELECT clause list that
was used to build the result set. BP will now chime in that SQL-99
(officially called "a standard in progress" and not recognized by the
U.S. Government for actual use) does allow this.
But aside from this, there is the good programming practice of showing
the fields that are used for the sort to the user, usually on the left
side of each line since we read left to right.
The standard trick for picking a sorting order at run time is to use a
flag in CASE expression. If you want to sort on more than one column
and allow all possible combinations of sorting use one CASE per column:
SELECT
CASE @.flag_1
WHEN 'a' THEN CAST (a AS CHAR(n))
WHEN 'b' THEN CAST (b AS CHAR(n))
WHEN 'c' THEN CAST (c AS CHAR(n))
ELSE NULL END AS sort_1,
CASE @.flag_2
WHEN 'x' THEN CAST (x AS CHAR(n))
WHEN 'y' THEN CAST (y AS CHAR(n))
WHEN 'z' THEN CAST (z AS CHAR(n))
ELSE NULL END AS sort_2,
...
CASE @.flag_n
WHEN 'n1' THEN CAST (n1 AS CHAR(n))
WHEN 'n2' THEN CAST (n2 AS CHAR(n))
WHEN 'n3' THEN CAST (n3 AS CHAR(n))
ELSE NULL END AS sort_2,
FROM Foobar
WHERE ...
ORDER BY sort_1, sort_2, ...
More than one sort column and only a limited set of combinations then
use concatenation.
CASE @.flag_1
WHEN 'ab'
THEN CAST(a AS CHAR(n)) ||' ' || CAST(b AS CHAR(n))
WHEN 'ba'
THEN CAST(b AS CHAR(n)) ||' ' || CAST(a AS CHAR(n))
ELSE NULL END AS sort_1,
If you need ASC and DESC options, then use a combination of CASE and
ORDER BY
CASE @.flag_1
WHEN @.flag_1 = 'a' AND @.flag_1_ad = 'ASC'
THEN CAST (a AS CHAR(n))
WHEN @.flag_1 = 'b' AND @.flag_1_ad = 'ASC'
THEN CAST (b AS CHAR(n))
WHEN @.flag_1 = 'c' AND @.flag_1_ad = 'ASC'
THEN CAST (c AS CHAR(n))
ELSE NULL END AS sort_1_a,
CASE @.flag_1
WHEN @.flag_1 = 'a' AND @.flag_1_ad = 'DESC'
THEN CAST (a AS CHAR(n))
WHEN @.flag_1 = 'b' AND @.flag_1_ad = 'DESC'
THEN CAST (b AS CHAR(n))
WHEN @.flag_1 = 'c' AND @.flag_1_ad = 'DESC'
THEN CAST (c AS CHAR(n))
ELSE NULL END AS sort_1_d
.. ORDER BY sort_1_a ASC, sort_1_d DESC
I have shown explicit CAST(<exp> AS CHAR(n)), but if the datatypes of
the THEN clause expressions were already the same, there would be no
reason to force the conversions.
You change the ELSE NULL clause to any constant of the appropriate
datatype, but it should be something useful to the reader.
A neater way of doing this is to use one column for each sorting option
so you do not have worry about CAST() operations.
SELECT ...
CASE WHEN @.flag = 'a' THEN a ELSE NULL END AS sort1,
CASE WHEN @.flag = 'b' THEN b ELSE NULL END AS sort2,
CASE WHEN @.flag = 'c' THEN c ELSE NULL END AS sort3
FROM Foobar
WHERE ...
ORDER BY sort1, sort2, sort3;

No comments:

Post a Comment