Monday, February 27, 2012

Help with nearest neighbour problem

Hi,

I need help with creating an sql statement that determines the nearest neighbour in a lookup table to my dataset by its date column. My attempts minimizing the date difference in a cross join are lacking performance.

I have two tables:

Table 1 (Data; 13000 datasets):

ID, date
==========
1, 12.12.2006
2, 28.12.2006
3, 05.01.2007

and Table 2 (Lookup; 4000 datasets):

date, margin
==========
05.12.2006, 2.80
27.12.2006, 2.86
01.01.2007, 3.01
10.01.2007, 2.99

Expected result:

ID, date, margin
==========
1, 12.12.2006, 2.80
2, 28.12.2006, 2.86
3, 05.01.2007, 3.01

Any help is much desired

Kaicould you explain please how to calculate "nearest"|||could you explain please how to calculate "nearest"

For a given date in the data table I'm looking for the closest date in the lookup table, e.g. MIN(Datediff(dd,date(data),date(lookup))).|||that datediff might produce negative numbers, and MIN will take the largest negative number

do you perhaps mean MIN(ABS(...)) ?|||that datediff might produce negative numbers, and MIN will take the largest negative number

do you perhaps mean MIN(ABS(...)) ?

You're right, it's MIN(ABS(...)). But how do I integrate this into a view to do effective lookups?|||effective? i would imagine this to depend on the existence of appropriate indexes

the following works (i tested it on your data) but i dunno how slow it's gonna be for your large tables...with X
( ID
, TDate
, a
, LDate
, margin
)
as (
select T.ID
, T.Date as TDate
, abs(datediff(dd,T.Date,L.Date)) as a
, L.Date as LDate
, L.margin
from table1 as T
cross
join lookup as L
)
select ID
, TDate
, a
, LDate
, margin
from X as D1
where a =
( select min(a)
from X
where ID = D1.ID )|||That's quite some nifty code. Unfortunately I had to rewrite the code to not use the "WITH" statement as MS SQL Server 2000 apparently doesn't support this. Anyway I've came down to 1 min. processing time from 57 min. without touching the indexes, so thanks a lot.

This is my final code:

Select ID
, TDate
, a
, LDate
, margin
from
(
select T.ID
, T.Date as TDate
, abs(datediff(dd,T.Date,L.Date)) as a
, L.Date as LDate
, L.margin
from table1 as T
cross
join lookup as L
) X
where a =
(
select min(abs(datediff(dd,T.Date,L.Date))) as amin
from table1 as T
cross
join lookup as L
where T.ID = X.ID
)

No comments:

Post a Comment