Identity column: Identity keys in a One-to-Zero-or-Many relationship

Considering the following:
CREATE TABLE CustomerTypes
(
PKCustomerType INT IDENTITY (1,1) NOT NULL,
CustomerTypeDesc VARCHAR(30)
)
GO
INSERT CustomerTypes (CustomerTypeDesc) VALUES ('Retail')
INSERT CustomerTypes (CustomerTypeDesc) VALUES ('Wholesale')
INSERT CustomerTypes (CustomerTypeDesc) VALUES ('Corporate')
CREATE TABLE Customers
(
PKCustomer INT IDENTITY (1,1) NOT NULL,
CustomerID VARCHAR(20),
FKCustomerType INT
)
GO
INSERT Customers (CustomerID, FKCustomerType) Values ('CUST001', 1)
INSERT Customers Values ('CUST002', 3)
INSERT Customers Values ('CUST003', 2)
How should I handle a case where I don't want to specify a customer
type for whatever reason? Perhaps it is unknown or not applicable in
some cases. I could add a 'None or N/A' record in the CustomerTypes
table for that purpose, but when validating the data in an INSERT
stored procedure, for example, I would like to have a value to use as
default in the case of a missing of invalid field value being passed
for FKCustomerType. If it weren't an identity field in the
CustomerTypes table, I could just create a CustomerType record with a
PKCustomerType value of 0 and always use that value in such cases. With
it being an identity field, however, I can't count on a specific value
for that purpose. It seems I have two options:
1) Create the CustomerTypes table without PKCustomerType being an
identity field...insert the 'None or N/A' record, assigning it a
PKCustomerType value of, say, 0...alter the table to make
PKCustomerType and identity field (1,1)
2) Merely use NULL in Customers where CustomerType isn't specified.
Is there a better way? If not, any insight on why one approach would be
better than the other? I'm leaning toward option 2 at this point.> 2) Merely use NULL in Customers where CustomerType isn't specified.
If CustomerType is not specified because it is unknown, then this may be a
valid approach.
However, you may want to give users the option of saying Unknown, and have
it be a value in the CustomerTypes table. Then if you later want to
differentiate between unknown and unspecified, you can add an option.
There is no reason to hard-code the default value in the parameter to the
stored procedure. Have the application generate its dropdown from the
CustomerTypes table, and you are done.
A|||>> Considering the following: <<
Let's fix what you posted so that it follows the most basic RDBMS
design principles. For example, why do you have no keys? Why did you
think that IDENTITY is every used? Why did you use PK- prefixes in
violation of ISO-11179 rules?
CREATE TABLE CustomerTypes
(customer_type INTEGER NOT NULL PRIMARY KEY,
customer_type_desc VARCHAR(30) NOT NULL);
I see from the use of IDENTITY that you did not bother with designing
an encoding scheme. You might want to learn how to do that
CREATE TABLE Customers
(customer_id VARCHAR(20) NOT NULL PRIMARY KEY,
(customer_type INTEGER NOT NULL
REFERENCES CustomerTypes (customer_type)
ON UPDATE CASCADE,
. );
VARCHAR(20) is a bit long, but by the definition of an identifier, this
has to be your key. Frankly, I would look for a DUNS number or some
indusrty standard code.
Stop putting prefixes that tell you **how** a data element is used in a
table. The name of a data element is supposed to tell you **what** it
is.
Then you need codes for those situations if they are logically
different. For example, the ICD codes for disease have '000.000' which
means "undiagnosed" and '999.999' which means "we did all the test and
still don't know!" -- very different kinds of missing data! VERY
IMPORTANT to distinguish them!! Matter of life and death, in fact.
] value being passed for FKCustomerType [sic]. <<
Now we are getting to your REAL problem. Let's get back to the basics
of an RDBMS. Rows are not records; fields are not columns; tables are
not files; there is no sequential access or ordering in an RDBMS, so
faking a record number or the lines on a piece of paper with a
proprietary IDENTITY property is dead wrong. You do not have the right
mindset, and all you are going to get on a Newsgroup is a few kludges
to help you fake it for a long period of time before the collapse.
a specific value for that purpose. <<
Surprise! Surprise! Surprise! See how non-relational, non-verifiable,
non-portable proprietary extensions screw up things?
Yes; do it right or kludge it :)
<<
Almost! Stop using IDENTITY. This is an RDBMS and not a sequential
file system.
Maybe; do you need to know anything about the missing values? Or just
that it is missing? Get a copy of SQL PROGRAMMING STYLE, SQL FOR
SMARTIES or DATA & DATABASES and then read the chapters on scales &
measurements, and how to design encoding schemes. The research pattern
is simple:
1) Look for industry standards (Google it!)
2) Look for company standards (see the accounting department for help)
3) In the remaining 5% of the cases where you have to invent something,
pick a type of encoding and follow the rules for good design. My guess
in this example is a hierachy or vector code because customers break
down into tax/tax exempt, domestic/foreign and so forth within
retailers and wholesalers.|||> design principles. For example, why do you have no keys? Why did you
> think that IDENTITY is every used? Why did you use PK- prefixes in
Why do you think SURROGATE KEYS are never used? It would appear you are one
of the only people in the industry who do not use them.
Perhaps its an implementation experience thing compared to theory.

> Almost! Stop using IDENTITY. This is an RDBMS and not a sequential
> file system.
Your bias and misunderstanding of how IDENTITY works is embarrasing - READ
THE MANUAL!

> that it is missing? Get a copy of SQL PROGRAMMING STYLE, SQL FOR
> SMARTIES or DATA & DATABASES and then read the chapters on scales &
Only if you want theory, for practical advice get a SQL Server specific
book.
Tony Rogerson
SQL Server MVP
http://sqlblogcasts.com/blogs/tonyrogerson - technical commentary from a SQL
Server Consultant
http://sqlserverfaq.com - free video tutorials
"--CELKO--" <jcelko212@.earthlink.net> wrote in message
news:1151601604.581696.66800@.75g2000cwc.googlegroups.com...
> Let's fix what you posted so that it follows the most basic RDBMS
> design principles. For example, why do you have no keys? Why did you
> think that IDENTITY is every used? Why did you use PK- prefixes in
> violation of ISO-11179 rules?
> CREATE TABLE CustomerTypes
> (customer_type INTEGER NOT NULL PRIMARY KEY,
> customer_type_desc VARCHAR(30) NOT NULL);
> I see from the use of IDENTITY that you did not bother with designing
> an encoding scheme. You might want to learn how to do that
> CREATE TABLE Customers
> (customer_id VARCHAR(20) NOT NULL PRIMARY KEY,
> (customer_type INTEGER NOT NULL
> REFERENCES CustomerTypes (customer_type)
> ON UPDATE CASCADE,
> .. );
> VARCHAR(20) is a bit long, but by the definition of an identifier, this
> has to be your key. Frankly, I would look for a DUNS number or some
> indusrty standard code.
> Stop putting prefixes that tell you **how** a data element is used in a
> table. The name of a data element is supposed to tell you **what** it
> is.
>
> Then you need codes for those situations if they are logically
> different. For example, the ICD codes for disease have '000.000' which
> means "undiagnosed" and '999.999' which means "we did all the test and
> still don't know!" -- very different kinds of missing data! VERY
> IMPORTANT to distinguish them!! Matter of life and death, in fact.
>
> Now we are getting to your REAL problem. Let's get back to the basics
> of an RDBMS. Rows are not records; fields are not columns; tables are
> not files; there is no sequential access or ordering in an RDBMS, so
> faking a record number or the lines on a piece of paper with a
> proprietary IDENTITY property is dead wrong. You do not have the right
> mindset, and all you are going to get on a Newsgroup is a few kludges
> to help you fake it for a long period of time before the collapse.
>
> Surprise! Surprise! Surprise! See how non-relational, non-verifiable,
> non-portable proprietary extensions screw up things?
>
> Yes; do it right or kludge it :)
>
> Almost! Stop using IDENTITY. This is an RDBMS and not a sequential
> file system.
>
> Maybe; do you need to know anything about the missing values? Or just
> that it is missing? Get a copy of SQL PROGRAMMING STYLE, SQL FOR
> SMARTIES or DATA & DATABASES and then read the chapters on scales &
> measurements, and how to design encoding schemes. The research pattern
> is simple:
> 1) Look for industry standards (Google it!)
> 2) Look for company standards (see the accounting department for help)
> 3) In the remaining 5% of the cases where you have to invent something,
> pick a type of encoding and follow the rules for good design. My guess
> in this example is a hierachy or vector code because customers break
> down into tax/tax exempt, domestic/foreign and so forth within
> retailers and wholesalers.
>|||> How should I handle a case where I don't want to specify a customer
> type for whatever reason? Perhaps it is unknown or not applicable in
It should be NULL because its not specified, if you want to give it a value
you really ought to have a CustomerType of 'Not Specified'.

> CREATE TABLE CustomerTypes
> (
> PKCustomerType INT IDENTITY (1,1) NOT NULL,
> CustomerTypeDesc VARCHAR(30)
> )
> GO
> INSERT CustomerTypes (CustomerTypeDesc) VALUES ('Retail')
> INSERT CustomerTypes (CustomerTypeDesc) VALUES ('Wholesale')
> INSERT CustomerTypes (CustomerTypeDesc) VALUES ('Corporate')
> CREATE TABLE Customers
> (
> PKCustomer INT IDENTITY (1,1) NOT NULL,
> CustomerID VARCHAR(20),
> FKCustomerType INT
> )
> GO
> INSERT Customers (CustomerID, FKCustomerType) Values ('CUST001', 1)
> INSERT Customers Values ('CUST002', 3)
> INSERT Customers Values ('CUST003', 2)
You probably need to have a slight rethink on the constraints....
CREATE TABLE CustomerTypes
(
id INT IDENTITY (1,1) NOT NULL constraint sk_customertypes unique
clustered,
CustomerTypeDesc VARCHAR(30) constraint pk_customertype primary key
nonclustered
)
INSERT CustomerTypes (CustomerTypeDesc) VALUES ('Retail')
INSERT CustomerTypes (CustomerTypeDesc) VALUES ('Wholesale')
INSERT CustomerTypes (CustomerTypeDesc) VALUES ('Corporate')
CREATE TABLE Customers
(
id INT IDENTITY (1,1) NOT NULL constraint sk_customer unique
clustered,
CustomerID VARCHAR(20) not null constraint pk_customer primary key
nonclustered,
CustomerType_id INT NULL references CustomerTypes( id )
)
INSERT Customers (CustomerID, CustomerType_id) Values ('CUST001', 1)
INSERT Customers (CustomerID, CustomerType_id) Values ('CUST003', NULL)
Tony.
Tony Rogerson
SQL Server MVP
http://sqlblogcasts.com/blogs/tonyrogerson - technical commentary from a SQL
Server Consultant
http://sqlserverfaq.com - free video tutorials
"Richard Carpenter" <rumbledor@.hotmail.com> wrote in message
news:1151594340.887702.165300@.b68g2000cwa.googlegroups.com...
> Considering the following:
> CREATE TABLE CustomerTypes
> (
> PKCustomerType INT IDENTITY (1,1) NOT NULL,
> CustomerTypeDesc VARCHAR(30)
> )
> GO
> INSERT CustomerTypes (CustomerTypeDesc) VALUES ('Retail')
> INSERT CustomerTypes (CustomerTypeDesc) VALUES ('Wholesale')
> INSERT CustomerTypes (CustomerTypeDesc) VALUES ('Corporate')
> CREATE TABLE Customers
> (
> PKCustomer INT IDENTITY (1,1) NOT NULL,
> CustomerID VARCHAR(20),
> FKCustomerType INT
> )
> GO
> INSERT Customers (CustomerID, FKCustomerType) Values ('CUST001', 1)
> INSERT Customers Values ('CUST002', 3)
> INSERT Customers Values ('CUST003', 2)
> How should I handle a case where I don't want to specify a customer
> type for whatever reason? Perhaps it is unknown or not applicable in
> some cases. I could add a 'None or N/A' record in the CustomerTypes
> table for that purpose, but when validating the data in an INSERT
> stored procedure, for example, I would like to have a value to use as
> default in the case of a missing of invalid field value being passed
> for FKCustomerType. If it weren't an identity field in the
> CustomerTypes table, I could just create a CustomerType record with a
> PKCustomerType value of 0 and always use that value in such cases. With
> it being an identity field, however, I can't count on a specific value
> for that purpose. It seems I have two options:
> 1) Create the CustomerTypes table without PKCustomerType being an
> identity field...insert the 'None or N/A' record, assigning it a
> PKCustomerType value of, say, 0...alter the table to make
> PKCustomerType and identity field (1,1)
> 2) Merely use NULL in Customers where CustomerType isn't specified.
> Is there a better way? If not, any insight on why one approach would be
> better than the other? I'm leaning toward option 2 at this point.
>|||Why not use a referential integrity constraint?
Unknown and Inapplicable are distinct reasons attributed for missing data.
And therefore, if your business demands that distinction, it makes sense to
use separate values to represent them in a table.
Anith|||Just to add, inapplicable attributes are best addressed using an entity
super-type/sub-type relationship. However, commonly people use to kludge
them with NULLs as an easy workaround.
Anith|||"Richard Carpenter" <rumbledor@.hotmail.com> wrote in message
news:1151594340.887702.165300@.b68g2000cwa.googlegroups.com...
> Considering the following:
> CREATE TABLE CustomerTypes
> (
> PKCustomerType INT IDENTITY (1,1) NOT NULL,
> CustomerTypeDesc VARCHAR(30)
> )
> GO
> INSERT CustomerTypes (CustomerTypeDesc) VALUES ('Retail')
> INSERT CustomerTypes (CustomerTypeDesc) VALUES ('Wholesale')
> INSERT CustomerTypes (CustomerTypeDesc) VALUES ('Corporate')
> CREATE TABLE Customers
> (
> PKCustomer INT IDENTITY (1,1) NOT NULL,
> CustomerID VARCHAR(20),
> FKCustomerType INT
> )
> GO
> INSERT Customers (CustomerID, FKCustomerType) Values ('CUST001', 1)
> INSERT Customers Values ('CUST002', 3)
> INSERT Customers Values ('CUST003', 2)
> How should I handle a case where I don't want to specify a customer
> type for whatever reason? Perhaps it is unknown or not applicable in
> some cases. I could add a 'None or N/A' record in the CustomerTypes
> table for that purpose, but when validating the data in an INSERT
> stored procedure, for example, I would like to have a value to use as
> default in the case of a missing of invalid field value being passed
> for FKCustomerType. If it weren't an identity field in the
> CustomerTypes table, I could just create a CustomerType record with a
> PKCustomerType value of 0 and always use that value in such cases. With
> it being an identity field, however, I can't count on a specific value
> for that purpose. It seems I have two options:
> 1) Create the CustomerTypes table without PKCustomerType being an
> identity field...insert the 'None or N/A' record, assigning it a
> PKCustomerType value of, say, 0...alter the table to make
> PKCustomerType and identity field (1,1)
> 2) Merely use NULL in Customers where CustomerType isn't specified.
> Is there a better way? If not, any insight on why one approach would be
> better than the other? I'm leaning toward option 2 at this point.
>
If you add some keys to the CustomerTypes table you will be able to identify
the "Unknown" value by its logical key instead of the surrogate.
The "Inapplicable" case would probably be better handled by decomposing the
Customers table.
CREATE TABLE CustomerTypes
(
PKCustomerType INT IDENTITY (1,1) NOT NULL
CONSTRAINT PK_CustomerTypes PRIMARY KEY,
CustomerTypeDesc VARCHAR(30) NOT NULL
CONSTRAINT AK1_CustomerTypes UNIQUE
);
GO
INSERT CustomerTypes (CustomerTypeDesc) VALUES ('Retail');
INSERT CustomerTypes (CustomerTypeDesc) VALUES ('Wholesale');
INSERT CustomerTypes (CustomerTypeDesc) VALUES ('Corporate');
INSERT CustomerTypes (CustomerTypeDesc) VALUES ('Unknown');
David Portas, SQL Server MVP
Whenever possible please post enough code to reproduce your problem.
Including CREATE TABLE and INSERT statements usually helps.
State what version of SQL Server you are using and specify the content
of any error messages.
SQL Server Books Online:
http://msdn2.microsoft.com/library/ms130214(en-US,SQL.90).aspx
--|||Good gravy! I'll try to respond to your individual points without
taking too much offense to your abrasive tone...
--CELKO-- wrote:
> Let's fix what you posted so that it follows the most basic RDBMS
> design principles.
Really, don't trouble yourself. It was a simple example I threw
together to try and illustrate my question. Nothing more. I wasn't
shooting for theoretical precision. I didn't (and still don't, as I
will explain in a moment) think it that necessary in this case.

> For example, why do you have no keys? Why did you
> think that IDENTITY is every used? Why did you use PK- prefixes in
> violation of ISO-11179 rules?
> CREATE TABLE CustomerTypes
> (customer_type INTEGER NOT NULL PRIMARY KEY,
> customer_type_desc VARCHAR(30) NOT NULL);
> I see from the use of IDENTITY that you did not bother with designing
> an encoding scheme. You might want to learn how to do that
> CREATE TABLE Customers
> (customer_id VARCHAR(20) NOT NULL PRIMARY KEY,
> (customer_type INTEGER NOT NULL
> REFERENCES CustomerTypes (customer_type)
> ON UPDATE CASCADE,
> .. );
> VARCHAR(20) is a bit long, but by the definition of an identifier, this
> has to be your key. Frankly, I would look for a DUNS number or some
> indusrty standard code.
It is what it is. If my particular (hypothetical, I might add) business
process requires the use of specific proprietary customer types, then
why complicate it?

> Stop putting prefixes that tell you **how** a data element is used in a
> table. The name of a data element is supposed to tell you **what** it
> is.
I really don't see the significance of the distinction. The fact that
the name of the field starts with PK *does* tell me what it is. I
really think this one is a matter of personal preference.

> Then you need codes for those situations if they are logically
> different. For example, the ICD codes for disease have '000.000' which
> means "undiagnosed" and '999.999' which means "we did all the test and
> still don't know!" -- very different kinds of missing data! VERY
> IMPORTANT to distinguish them!! Matter of life and death, in fact.
A bit dramatic, but I hear what you're saying. However, my question
really doesn't have anything to do with what values are available to
the user, but how to handle it when the user doesn't feel inclined to
specify a fact that is not strictly required. On the application's data
entry screen, the user may only need to worry about some of the
available fields. There is no reason to force them to touch every one
when not all are relevant in every case. My question is, how to best
handle that foreign key [column] value in that case.

ic] value being passed for FKCustomerType [sic]. <<
> Now we are getting to your REAL problem. Let's get back to the basics
> of an RDBMS. Rows are not records; fields are not columns; tables are
> not files; there is no sequential access or ordering in an RDBMS, so
> faking a record number or the lines on a piece of paper with a
> proprietary IDENTITY property is dead wrong. You do not have the right
> mindset, and all you are going to get on a Newsgroup is a few kludges
> to help you fake it for a long period of time before the collapse.
Yeah, I'm guilty of referring to them in the very manner that makes a
certain type of person grit their teeth. I think that since we both
know exactly what I mean, then it really isn't an issue worth allowing
to cloud the matter. Wouldn't you agree?

on a specific value for that purpose. <<
> Surprise! Surprise! Surprise! See how non-relational, non-verifiable,
> non-portable proprietary extensions screw up things?
>
Hence my original question. Again, I really don't understand why it has
to be this difficult. How would *you* go about it, if forcing the user
to touch every field on the data entry form is *not* acceptable?

> Yes; do it right or kludge it :)
>
1) <<
> Almost! Stop using IDENTITY. This is an RDBMS and not a sequential
> file system.
There is nothing sequential about the requirements I've established
here. I don't care if the key is sequentially numbered, random
alpha-numeric or caveman hieroglyphics. All I require is that it be
guaranteed to be unique and generated automatically.

> Maybe; do you need to know anything about the missing values? Or just
> that it is missing? Get a copy of SQL PROGRAMMING STYLE, SQL FOR
> SMARTIES or DATA & DATABASES and then read the chapters on scales &
> measurements, and how to design encoding schemes. The research pattern
> is simple:
> 1) Look for industry standards (Google it!)
> 2) Look for company standards (see the accounting department for help)
> 3) In the remaining 5% of the cases where you have to invent something,
> pick a type of encoding and follow the rules for good design. My guess
> in this example is a hierachy or vector code because customers break
> down into tax/tax exempt, domestic/foreign and so forth within
> retailers and wholesalers.
Honestly, that just seemed like a whole lot of unnecessary nit-picking
and tangent traversal than was really required here. I can understand
if you feel in your infinite wisdom that it is better to suggest to me
where else I might direct my search to find the answer than give it to
me directly, but to put so much effort into avoiding answering my
question while at the same time trying to educate me (or, arguably more
accurate, indoctrinate me according to your own personal opinions) in
related regards does make me wonder who's best interest you have at
heart here.|||Anith Sen wrote:
> Why not use a referential integrity constraint?
> Unknown and Inapplicable are distinct reasons attributed for missing data.
> And therefore, if your business demands that distinction, it makes sense t
o
> use separate values to represent them in a table.
>
That would be my typical thinking as well, but RI doesn't work if the
value is not required unless a default constraint is specified. With an
identity key, the default value for the foreign key column cannot be
specified in the table definition.
On second thought, a better illustration of my question would be a name
suffix (Jr., Sr. III, etc.). A person won't necessarily *have* a suffix
in their name, yet forcing the user to select "None" from a list is
perhaps a bit cumbersome from a UI standpoint. If that suffix column in
the Customers table is a foreign key referencing the primary key column
in a reference table, and that primary key column is an identity type
key, then it would seem that the only workable foreign key value would
be NULL.

Monday, March 12, 2012

Identity keys in a One-to-Zero-or-Many relationship

No comments:

Post a Comment

Identity column

Blog Archive

About Me