Sunday, February 19, 2012

identity and clustered indexes

If we have a monotonically increase value( like an identity column), then
which is better: clustered or non-clustered index. The table will be a heavy
insert table.
thanks.
There is no silver bullet answer. If you use non-clustered, do you have a
good candidate for an alternate clustered index? If so, does it increase in
the same order as the identity would (e.g. a datetime column)? How will the
data be queried? What is your definition of "heavy"?
http://www.aspfaq.com/
(Reverse address to reply.)
"SQLServer DBA" <sqlsdba@.bigfoot.com> wrote in message
news:2o1kp7F5hu8mU1@.uni-berlin.de...
> If we have a monotonically increase value( like an identity column), then
> which is better: clustered or non-clustered index. The table will be a
heavy
> insert table.
> thanks.
>
|||The table does have few other columns as SARG arguments which will be non-clustered
index. There is no surrogate primary key. Hence only identity column is used as a
primary key. The table is populated by values in the fields of a form entered
by users on the web, mainly for statistical purpose. We expect that it will be
getting around 50,000 inserts a day, in due course of time.
As I understand, having the identity column declared as clustered index will mean
that at insert time , the database engine has to insert the row at appropriate
page, which can include page splitting too. If it is a non-clustered index it
can just add pages at the end and then update the key value. Shouldn't that be
faster.
thanks again.
"Aaron [SQL Server MVP]" <ten.xoc@.dnartreb.noraa> wrote in message
news:etdHxyIgEHA.644@.tk2msftngp13.phx.gbl...
> There is no silver bullet answer. If you use non-clustered, do you have a
> good candidate for an alternate clustered index? If so, does it increase in
> the same order as the identity would (e.g. a datetime column)? How will the
> data be queried? What is your definition of "heavy"?
> --
> http://www.aspfaq.com/
> (Reverse address to reply.)
>
>
> "SQLServer DBA" <sqlsdba@.bigfoot.com> wrote in message
> news:2o1kp7F5hu8mU1@.uni-berlin.de...
> heavy
>
|||"SQLServer DBA" <sqlsdba@.bigfoot.com> wrote in message
news:2o1mebF5v0ppU1@.uni-berlin.de...
> The table does have few other columns as SARG arguments which will be
non-clustered
> index. There is no surrogate primary key. Hence only identity column is
used as a
> primary key. The table is populated by values in the fields of a form
entered
> by users on the web, mainly for statistical purpose. We expect that it
will be
> getting around 50,000 inserts a day, in due course of time.
> As I understand, having the identity column declared as clustered index
will mean
> that at insert time , the database engine has to insert the row at
appropriate
> page, which can include page splitting too. If it is a non-clustered index
it
> can just add pages at the end and then update the key value. Shouldn't
that be
> faster.
>
But with an identity column the inserted values are always at the end, so
the appropriate page is always the last page, and page splits shouldn't
happen. Moreover in the non-clustered index, the database has to write the
row and write the index entry, where in the clustered case the row and the
index are written onto the same page.
But the bottom line is that it can depend, and you should test.
Try this: The results come out strongly in favor of the clustered primary
key index:
David
create table test_clustered
(
id int identity primary key clustered,
name varchar(50) not null,
address1 varchar(50) not null,
address2 varchar(50) not null,
address3 varchar(50) not null
)
create index ix_test_clustered
on test_clustered(name)
create table test_nonclustered
(
id int identity primary key nonclustered,
name varchar(50) not null,
address1 varchar(50) not null,
address2 varchar(50) not null,
address3 varchar(50) not null
)
create index ix_test_nonclustered
on test_nonclustered(name)
go
set nocount on
begin transaction
declare @.i int
declare @.begin datetime
declare @.msg varchar(255)
set @.i = 0
set @.begin = getDate()
set @.msg = 'begin nonclustered '
print @.msg
while @.i < 50000
begin
insert into test_nonclustered
(name, address1, address2, address3)
values
(char(97+@.i%26) + 'name', 'address1', 'address2', 'address3')
set @.i = @.i + 1
end
set @.msg = 'end nonclustered elapsed ' + str(datediff(ms,@.begin,getDate()))
print @.msg
set @.begin = getdate()
set @.msg = 'begin clustered '
print @.msg
set @.i = 0
while @.i < 50000
begin
insert into test_clustered
(name, address1, address2, address3)
values
(char(97+@.i%26) + 'name', 'address1', 'address2', 'address3')
set @.i = @.i + 1
end
set @.msg = 'end clustered elapsed ' + str(datediff(ms,@.begin,getDate()))
print @.msg
commit transaction
go
truncate table test_clustered
truncate table test_nonclustered
|||Non-Clustered indexes are implemented behind the scenes as tables with
Clustered indexes on them. So you always run the risk of page splits
whether it is a Ci or NCI. The big difference is that NCI's do not tag
along the rest of each row, just the column(s) in the index expression.
50,000 inserts a day is really not that much and I wouldn't be as concerned
with page splits since you can control that somewhat with a proper fill
factor. If you don't do range type queries on the table a clustered index
on the Identity column is usually a good choice.
Andrew J. Kelly SQL MVP
"SQLServer DBA" <sqlsdba@.bigfoot.com> wrote in message
news:2o1mebF5v0ppU1@.uni-berlin.de...
> The table does have few other columns as SARG arguments which will be
non-clustered
> index. There is no surrogate primary key. Hence only identity column is
used as a
> primary key. The table is populated by values in the fields of a form
entered
> by users on the web, mainly for statistical purpose. We expect that it
will be
> getting around 50,000 inserts a day, in due course of time.
> As I understand, having the identity column declared as clustered index
will mean
> that at insert time , the database engine has to insert the row at
appropriate
> page, which can include page splitting too. If it is a non-clustered index
it
> can just add pages at the end and then update the key value. Shouldn't
that be[vbcol=seagreen]
> faster.
> thanks again.
>
> "Aaron [SQL Server MVP]" <ten.xoc@.dnartreb.noraa> wrote in message
> news:etdHxyIgEHA.644@.tk2msftngp13.phx.gbl...
a[vbcol=seagreen]
increase in[vbcol=seagreen]
the[vbcol=seagreen]
then
>

No comments:

Post a Comment