Friday, February 24, 2012

IDENTITY column

Hi,
Do many people use the IDENTITY type for a column?
The advantage is that every time you add a row, that identity column is
automatically incremented, however, I have always wondered about the
following issues that I could try to test and find out, but my testing will
not necessarily reveil all there is to know that you might know off the top
of your heads.
1) can you manually insert an identity? for example, row 1, 2 and 3 already
exist and I manually INSERT a row with id 5, what will happen when identity
reaches 4 and wish to move to 5? will it skip 5 because it already exists,
overwrite it, or what?
2) What happens if identity reaches maximum? If INDENTITY is reset, while
rows already exist, what will happen to existing rows? See question 1.
3) identity keeps moving up.. if later rows are deleted, gaps are formed.
How can this be prevented? Is there a way to find most efficiently which
ID's are available?
So basically, imagine I have a table like this:
CREATE TABLE MyTable (
Id IDENTITY(1,1) PRIMARY KEY,
OtherData TEXT
)
GO
INSERT INTO MyTable(OtherData) VALUES ("test1")
INSERT INTO MyTable(OtherData) VALUES ("test2")
INSERT INTO MyTable(OtherData) VALUES ("test3")
INSERT INTO MyTable(OtherData) VALUES ("test4")
INSERT INTO MyTable(OtherData) VALUES ("test5")
DELETE FROM MyTable WHERE Id=2 OR Id=4
Now I wish to have a query that gives the available id's back (2 and 4 in
this case).
The problem is, how do you info about rows that do not exist?
Only way I can think of is get MAX(Id) and then loop through all values in
descending order until you find one that doesn't exist, testing each one.
And this can become very slow in case of many records, and can't be the way.
Any suggestions? How are IDENTITY columns used? Just keep letting it go up
until it runs out? I know this would take millions of records maybe and in
real life situations may never happen, but theoretically, it just doesn't
sound right.
rows will get deleted and added and at some point in the future, I will have
reached the ceiling because deleted id's are never reused.
Or is it much better to use a "FindFreeId" type of stored procedure or
whatever, wit or without an IDENTITY column?
LisaI just thought of a way that it might be done:
Find all consecutive rows where the difference between ID values is more
than 1.
Is there a better way?
"Lisa Pearlson" <no@.spam.plz> wrote in message
news:%23lmHXdnXFHA.2768@.tk2msftngp13.phx.gbl...
> Hi,
> Do many people use the IDENTITY type for a column?
> The advantage is that every time you add a row, that identity column is
> automatically incremented, however, I have always wondered about the
> following issues that I could try to test and find out, but my testing
> will not necessarily reveil all there is to know that you might know off
> the top of your heads.
> 1) can you manually insert an identity? for example, row 1, 2 and 3
> already exist and I manually INSERT a row with id 5, what will happen when
> identity reaches 4 and wish to move to 5? will it skip 5 because it
> already exists, overwrite it, or what?
> 2) What happens if identity reaches maximum? If INDENTITY is reset, while
> rows already exist, what will happen to existing rows? See question 1.
> 3) identity keeps moving up.. if later rows are deleted, gaps are formed.
> How can this be prevented? Is there a way to find most efficiently which
> ID's are available?
> So basically, imagine I have a table like this:
> CREATE TABLE MyTable (
> Id IDENTITY(1,1) PRIMARY KEY,
> OtherData TEXT
> )
> GO
> INSERT INTO MyTable(OtherData) VALUES ("test1")
> INSERT INTO MyTable(OtherData) VALUES ("test2")
> INSERT INTO MyTable(OtherData) VALUES ("test3")
> INSERT INTO MyTable(OtherData) VALUES ("test4")
> INSERT INTO MyTable(OtherData) VALUES ("test5")
> DELETE FROM MyTable WHERE Id=2 OR Id=4
> Now I wish to have a query that gives the available id's back (2 and 4 in
> this case).
> The problem is, how do you info about rows that do not exist?
> Only way I can think of is get MAX(Id) and then loop through all values in
> descending order until you find one that doesn't exist, testing each one.
> And this can become very slow in case of many records, and can't be the
> way.
> Any suggestions? How are IDENTITY columns used? Just keep letting it go up
> until it runs out? I know this would take millions of records maybe and in
> real life situations may never happen, but theoretically, it just doesn't
> sound right.
> rows will get deleted and added and at some point in the future, I will
> have reached the ceiling because deleted id's are never reused.
> Or is it much better to use a "FindFreeId" type of stored procedure or
> whatever, wit or without an IDENTITY column?
> Lisa
>|||>> Do many people use the IDENTITY type for a column? <<
Lots of them, usually newbies who have not had a course in RDBMS so
they would understand what a relatioanl key is and still think in terms
of sequential files and records. Google some of my rants about this
design flaw.
Or talk to an old programmer who grew up with magnetic tape files which
had record numbers instead of relational keys. We had all those
problems with 1950's technology. Dijkstra was right; we keep
re-invetning the same errors over and over in this trade.|||Many opinions:
http://www.dbpd.com/vault/9805xtra.htm
http://www.sqlteam.com/forums/topic...136&whichpage=1
Types" target="_blank">http://www.ssw.com.au/SSW/Standards...ey
Types
"Lisa Pearlson" <no@.spam.plz> wrote in message
news:%23lmHXdnXFHA.2768@.tk2msftngp13.phx.gbl...
> Hi,
> Do many people use the IDENTITY type for a column?
> The advantage is that every time you add a row, that identity column is
> automatically incremented, however, I have always wondered about the
> following issues that I could try to test and find out, but my testing
> will not necessarily reveil all there is to know that you might know off
> the top of your heads.
> 1) can you manually insert an identity? for example, row 1, 2 and 3
> already exist and I manually INSERT a row with id 5, what will happen when
> identity reaches 4 and wish to move to 5? will it skip 5 because it
> already exists, overwrite it, or what?
> 2) What happens if identity reaches maximum? If INDENTITY is reset, while
> rows already exist, what will happen to existing rows? See question 1.
> 3) identity keeps moving up.. if later rows are deleted, gaps are formed.
> How can this be prevented? Is there a way to find most efficiently which
> ID's are available?
> So basically, imagine I have a table like this:
> CREATE TABLE MyTable (
> Id IDENTITY(1,1) PRIMARY KEY,
> OtherData TEXT
> )
> GO
> INSERT INTO MyTable(OtherData) VALUES ("test1")
> INSERT INTO MyTable(OtherData) VALUES ("test2")
> INSERT INTO MyTable(OtherData) VALUES ("test3")
> INSERT INTO MyTable(OtherData) VALUES ("test4")
> INSERT INTO MyTable(OtherData) VALUES ("test5")
> DELETE FROM MyTable WHERE Id=2 OR Id=4
> Now I wish to have a query that gives the available id's back (2 and 4 in
> this case).
> The problem is, how do you info about rows that do not exist?
> Only way I can think of is get MAX(Id) and then loop through all values in
> descending order until you find one that doesn't exist, testing each one.
> And this can become very slow in case of many records, and can't be the
> way.
> Any suggestions? How are IDENTITY columns used? Just keep letting it go up
> until it runs out? I know this would take millions of records maybe and in
> real life situations may never happen, but theoretically, it just doesn't
> sound right.
> rows will get deleted and added and at some point in the future, I will
> have reached the ceiling because deleted id's are never reused.
> Or is it much better to use a "FindFreeId" type of stored procedure or
> whatever, wit or without an IDENTITY column?
> Lisa
>|||This is a bit of a holy issue.

> Do many people use the IDENTITY type for a column?
Yes much to the chagrin of purists.
<snip>

> 1) can you manually insert an identity? for example, row 1, 2 and 3 alread
y
> exist and I manually INSERT a row with id 5, what will happen when identit
y
> reaches 4 and wish to move to 5? will it skip 5 because it already exists,
> overwrite it, or what?
Yes. In SQL Server, you can use the Identity_Insert statement to temporarily
allow the forceful setting of an identity value. If the identity column is t
he
primary key, then even if Identity_Insert is enabled, duplicates will not be
allowed. Thus, this is one of the reasons that if you are making your identi
ty
column the primary key, you should explicitly do so.

> 2) What happens if identity reaches maximum? If INDENTITY is reset, while
rows
> already exist, what will happen to existing rows? See question 1.
It will throw an error. However, this can be easily anticipated and handled
by
adjusting the data type of the column to something larger. It is highly doub
tful
you will reach the cap of a BigInt.

> 3) identity keeps moving up.. if later rows are deleted, gaps are formed.
How
> can this be prevented? Is there a way to find most efficiently which ID's
are
> available?
Gaps will form and you shouldn't care about it. Identity columns are not mea
nt
to provide sequence with no gaps. For that, you should use a different, cust
om
and intentional mechanism. Identity columns are simply a cheap and easy way
of
providing a unique identifier for each row.

> Any suggestions? How are IDENTITY columns used? Just keep letting it go up
> until it runs out? I know this would take millions of records maybe and in
> real life situations may never happen, but theoretically, it just doesn't
> sound right.
> rows will get deleted and added and at some point in the future, I will ha
ve
> reached the ceiling because deleted id's are never reused.
Purists would agree and thus do not recommend their use. People like Mr. Cel
ko
(and officially Codd) argue that the primary key should be an assemblage of
actual data values not some system generated key. However, since it is highl
y
unlikely that you will run out of key values, this reason alone should not b
e a
deterrent. If you thought it was even remotely possible to hit 2 billion row
s
(the maximum for an int) you can use a BigInt.
Thomas|||"Thomas Coleman" <thomas@.newsgroup.nospam> wrote in message
news:Op23lUoXFHA.3864@.TK2MSFTNGP10.phx.gbl...
> This is a bit of a holy issue.
>
> Yes much to the chagrin of purists.
> <snip>
>
> Yes. In SQL Server, you can use the Identity_Insert statement to
> temporarily allow the forceful setting of an identity value. If the
> identity column is the primary key, then even if Identity_Insert is
> enabled, duplicates will not be allowed. Thus, this is one of the reasons
> that if you are making your identity column the primary key, you should
> explicitly do so.
Thomas,
Lisa might get the wrong impression from your answer to this part.
The answer to "Will it skip 5 because it already exists, overwrite it, or
what?" is this, I think:
If the value 5 is inserted "manually", and 5 is larger than the
next "automatic" identity value would have been, the identity
seed for this table will be reset as if 5 were an automatically
inserted value (6 will be inserted next, skipping any values
below 5 that were never generated). If 5 is smaller than
would otherwise have been inserted, the sequence of automatically
generated values will not be changed.
These rules apply whether or not the insert of the value 5 succeeds
or fails. It can fail because of a primary key or unique constraint
on the column if 5 is already a value in the column.
The Books Online topic SET IDENTITY_INSERT and the other
articles about the identity property explain this behavior.
SK
>
> It will throw an error. However, this can be easily anticipated and
> handled by adjusting the data type of the column to something larger. It
> is highly doubtful you will reach the cap of a BigInt.
>
> Gaps will form and you shouldn't care about it. Identity columns are not
> meant to provide sequence with no gaps. For that, you should use a
> different, custom and intentional mechanism. Identity columns are simply a
> cheap and easy way of providing a unique identifier for each row.
>
> Purists would agree and thus do not recommend their use. People like Mr.
> Celko (and officially Codd) argue that the primary key should be an
> assemblage of actual data values not some system generated key. However,
> since it is highly unlikely that you will run out of key values, this
> reason alone should not be a deterrent. If you thought it was even
> remotely possible to hit 2 billion rows (the maximum for an int) you can
> use a BigInt.
>
> Thomas
>|||Joe,
Do you know many "newbies" who have been doing this so long they "still
think in terms of sequential files and records?" ;-)
Lisa,
As you read more on this issue you'll see a debate between the "practical"
side and the "purist" side. Both views have valid points that should be
considered when making your design. When to use an Identity? That depends on
the requirements. But it's a NOT moral issue and it's neither always good,
or always bad. There is no perfect right answer. It just depends.
-Paul
"--CELKO--" <jcelko212@.earthlink.net> wrote in message
news:1116731472.050155.78780@.o13g2000cwo.googlegroups.com...
> Lots of them, usually newbies who have not had a course in RDBMS so
> they would understand what a relatioanl key is and still think in terms
> of sequential files and records. Google some of my rants about this
> design flaw.
> Or talk to an old programmer who grew up with magnetic tape files which
> had record numbers instead of relational keys. We had all those
> problems with 1950's technology. Dijkstra was right; we keep
> re-invetning the same errors over and over in this trade.
>|||> Lisa might get the wrong impression from your answer to this part.
> The answer to "Will it skip 5 because it already exists, overwrite it, or
> what?" is this, I think:
> If the value 5 is inserted "manually", and 5 is larger than the
> next "automatic" identity value would have been, the identity
> seed for this table will be reset as if 5 were an automatically
> inserted value (6 will be inserted next, skipping any values
> below 5 that were never generated). If 5 is smaller than
> would otherwise have been inserted, the sequence of automatically
> generated values will not be changed.
> These rules apply whether or not the insert of the value 5 succeeds
> or fails. It can fail because of a primary key or unique constraint
> on the column if 5 is already a value in the column.
I think you're right about the interpretation of her question. Although, as
I
mentioned, perfect sequence and the ability to predict new values should not
be
a consideration if using an identity column.
Thomas|||"Thomas Coleman" <thomas@.newsgroup.nospam> wrote in message
news:Op23lUoXFHA.3864@.TK2MSFTNGP10.phx.gbl...
> This is a bit of a holy issue.
Only for people who can't walk and chew gum at the same time.

> Yes much to the chagrin of purists.
> <snip>
>
already
identity
exists,
> Yes. In SQL Server, you can use the Identity_Insert statement to
temporarily
> allow the forceful setting of an identity value. If the identity column is
the
> primary key, then even if Identity_Insert is enabled, duplicates will not
be
> allowed. Thus, this is one of the reasons that if you are making your
identity
> column the primary key, you should explicitly do so.
>
while rows
> It will throw an error. However, this can be easily anticipated and
handled by
> adjusting the data type of the column to something larger. It is highly
doubtful
> you will reach the cap of a BigInt.
>
formed. How
ID's are
> Gaps will form and you shouldn't care about it. Identity columns are not
meant
> to provide sequence with no gaps. For that, you should use a different,
custom
> and intentional mechanism. Identity columns are simply a cheap and easy
way of
> providing a unique identifier for each row.
>
up
in
doesn't
have
> Purists would agree and thus do not recommend their use. People like Mr.
Celko
> (and officially Codd) argue that the primary key should be an assemblage
of
> actual data values not some system generated key. However, since it is
highly
> unlikely that you will run out of key values, this reason alone should not
be a
> deterrent. If you thought it was even remotely possible to hit 2 billion
rows
> (the maximum for an int) you can use a BigInt.
>
> Thomas
>

No comments:

Post a Comment