Sunday, February 19, 2012

Identity and primary key

I confuse about Identity and primary key, what is the different between them. One table can have no primary key. Right? Thanks.Yes a table doesn't need a primary key...but if it's for relational data you should.

Heap tables wouldn't have a PK.

And IDENTITY Column is a special property that enables a column to set an incremental value...so you can't have the same value twice.

Alot of people CONFUSE this as a Primary Key...they even go out of their way to make it one...so would you do?

CREATE TABLE State (
StateId int IDENTITY(1,1) PRIMARY KEY
, StateCd char(2)
, StateName varchar(50))

And then still have to create a unique contraint on statedCd?

Doesn't make sense does it.

Some people will argue that at some point a state will change their code, and then you could just update the state table and be done with it, and not have to update every other table that stores state code.

I find that argument amusing.|||Glad we could entertain you! :D|||definitely i'm with brett on this one, the surrogate key for state code is insane

;)|||I agree in principle, but as I've said before I have a code library of reusable functions, procedures, and subroutines that work off of GUIDs. So to me, the extra column is worth the time saved in development and maintenance.|||you use a GUID on a table of state codes and state names?

WTF OMG LOL!

no offence :)|||I can just picture the performance on a system built based on PK's made of GUID's! Are they all clustered too?|||Not to launch the debate for the millionth time...but I've never heard of anyone using a GUID for code tables....

Yo, blind dude...what does it buy you?|||I've never seen a performance hit. These are not Terabyte databases. And as far as a GUID on State codes, what does he cost me? There are only 50 records, so how much more disk space or processing time does it require?
What it buys me is a lot of flexibility importing data and a lot of scalability across the enterprise. I can have multiple copies of the database at separate locations and merge the data without having to worry about key conflicts. I can pre-assign IDs to staging data and load it without ever having to figure out what incremental value was assigned to the record, for later processing. I can set up a single table in the database, for tracking record modification for instance, and have referential integrity between it and every other table in the database based solely on the Primary Key (or unique index, which is what I normally assign to the GUID value). I've developed Rapid Application Development templates in Access that allow me to create a new form based upon a table and have it automatically included in the menu system and synchronized with all the related forms in about fifteen minutes.
I can't tell you the number of times a client has come to me with a modification requirement that I was able to implement quickly just because my schema was based upon GUID surrogate keys.
I've use natural keys, identity keys, and GUIDs, and I've just decided that I can do a lot of cool SQL with GUIDs.
As far as a performance hit, MSSQL's own replication relies upon GUIDs.
So I'm gonna keep usin' 'em. Nyah! :p|||And as far as a GUID on State codes, what does he cost me? There are only 50 records, so how much more disk space or processing time does it require?well, okay, now, either you use a GIUD on the state code table just for the hell of it (insanity) with the 2-char state code as the primary key, or else you're using the GIUD as the primary key, in which case your 2-million-row employee address database has 2 million 16-byte GUIDs where a 2-char code would normally be

so how much more disk space and processing does it require? 2 million times 26 bytes -- yeah, i know, that's not much space, but the processing might hurt a lot

and don't forget you also have to always join to the state code table to find out that

123 sesame street, hollywood, 3F2434E0-4F76-12F3-9A2C-0305E54C3301

is in florida and not california

insane, i tell you, insane|||Yes, it is a disadvantage to have to join to the State code table to look up the state for a GUID. But that is a drawback of any surrogate key. Honestly, I don't always use surrogate keys for lookup tables, but when I do I use a GUID.

Here is an excellent article on the pros and cons of GUIDs and INTs:
http://www.informit.com/articles/article.asp?p=25862&redir=1

...so I ran a test on two tables. One table used INT, the other GUID. 1,000,000 rows in each table. 4,500 searches on each table. No discernable difference. Script is attached.

For gig and up tables, maybe the size of the GUID would start to affect performance. I don't have time to test that. But I'm not convinced the performance hit would be at all significant. How many times to I have to save 1 millisecond on a search to justify even 15 more minutes writing code?

Go ahead and laugh, but I chuckle every time I see a post on this forum asking how to retrieve the IDs of a batch of records just inserted. :cool:|||OK, read the article...lots of caveats there....|||Yeah, the article is great, for "Intro to Database Design 101". Either the author does not want to talk about "real" caveats, or he doesn't know about them himself. Here's the SHOWCONTIG results for both tables from Lindman's example, so YOU (Lindman) tell me what's wrong with this picture:

...wait a minute, I can't give you a final result because Lindman's script is still running, after 45 minutes...Is it because of those NEWID() function calls or because of IO? Well, it's IO, according to Perfmon. Anyway, based on partial results (250,000 rows or so) here's the output:


DBCC SHOWCONTIG scanning 'TestIdentity' table...

Table: 'TestIdentity' (795149878); index ID: 1, database ID: 9

TABLE level scan performed.

- Pages Scanned........................: 1445

- Extents Scanned.......................: 0

- Extent Switches.......................: 0

- Avg. Pages per Extent..................: 0.0

- Scan Density [Best Count:Actual Count]......: 0.00% [0:0]

- Logical Scan Fragmentation ..............: 0.55%

- Extent Scan Fragmentation ...............: 0.00%

- Avg. Bytes Free per Page................: 0.0

- Avg. Page Density (full)................: 99.81%

DBCC execution completed. If DBCC printed error messages, contact your system administrator.

DBCC SHOWCONTIG scanning 'TestGUID' table...

Table: 'TestGUID' (779149821); index ID: 1, database ID: 9

TABLE level scan performed.

- Pages Scanned........................: 2581

- Extents Scanned.......................: 0

- Extent Switches.......................: 0

- Avg. Pages per Extent..................: 0.0

- Scan Density [Best Count:Actual Count]......: 0.00% [0:0]

- Logical Scan Fragmentation ..............: 99.11%

- Extent Scan Fragmentation ...............: 0.00%

- Avg. Bytes Free per Page................: 0.0

- Avg. Page Density (full)................: 69.56%
Pay attention to Logical Scan Fragmentation.|||Pay attention to the results, Czarjabarov. There was no discernable or consistent difference after running 4500 selects.

Outahere.|||Pay attention to the results, Czarjabarov. There was no discernable or consistent difference after running 4500 selects.

Outahere.You wouldn't be showing "online" if you were really "outahere", and please refrain yourself from chauvinistic comments,it's not gonna get you anywhere (at least not where you want to be) :rolleyes:

...And, for your information, not all apps are written to support the primitivism of the logic that you used in your example, so there, - outahere ;)|||boys will be boys...I guess...

But where does chauvinistic come from?

Damn hangover...|||rdjabarov. How did you get "0 extents scanned" in your DBCC SHOWCONTIG result?

Personally, I look at scan density myself, but that is for lack of seeing any Microsoft article that comprehensively explains any difference between the various densities.|||Brett left the country column out of his table definition. It doesn't make sense to me to try and track states without their country.

How do you deal with multiple states having the same code? Does AB refer to a state in Canada or Brazil?

How do you cope with states changing their names? Better yet, how do you cope with them changing their country? Worst of all, how do you deal with states changing their geographic boundaries?

If you deal with a small subset of states (say just the provinces of Canada), this problem seems simple. If you have to deal with the more generic idea of "states" in the abstract form, it gets really complicated.

-PatP|||so, pat, you are saying simple = integer key, complicated = guid key?

i don't buy it|||Pat goes mia, and then comes up with that...

ok..ok...I buy the thing where a State (actually a province) changes countries...when is quebec going to susceed?

I still subscribe to the theory that it is something new...what do you do with history if you decide to change a key?

Did the Entity just dissappear?|||No, not at all. The GUID versus INT issue is neither simple nor complex. If you buy into the idea of a surrogate key, how it is formatted is irrelevant, as long as the key values are unique.

What I meant to say is that if you only consider a small subset of politically stable states (such as those in the United States), the issues you face are quite simple. If you consider all states worldwide, then all of the issues get a lot more complex.

-PatP|||Pat goes mia, and then comes up with that...

ok..ok...I buy the thing where a State (actually a province) changes countries...when is quebec going to susceed?

I still subscribe to the theory that it is something new...what do you do with history if you decide to change a key?

Did the Entity just dissappear?Quebec succeeds about three times a day (sometimes from Canada, sometimes from other parts of Quebec), just visit any bar in the Province and you'll find someone in the process of declaring independance!

Trying to figure out how to track history is another issue entirely. Better than the issue of succession, how do you deal with the problems when a county (like Yugoslavia) splits, the residents can't decide on how many new countries are created (two, three, or more), or what the new states political alignments are, and the geographic boundaries are in dispute. That is a nightmare to model, and there are at least two other cases that may arise soon that make it look like child's play!

Once I get over the last ten days, I'll have some doozies to tell. Right now, I'm still just trying to get my feet back under me on a consistant basis.

-PatP|||No, not at all. The GUID versus INT issue is neither simple nor complex. If you buy into the idea of a surrogate key, how it is formatted is irrelevant, as long as the key values are unique.i am relieved to hear it

i personally think the 4-byte integer is a lot nicer of a surrogate than the 16-character guid, it's nice to get confirmation that the difference -- other than the obvious difference in table and index space -- is irrelevant

yes, that was a bit of a troll, because i do realize how difficult it is to migrate identity keys

:)|||by the way, the word you americans are having so much trouble spelling is secede

probably because it's been so long since somebody (*cough*the south*cough*) tried to do it down there

:)|||Yes, but remember that when the South tried it they were unsecessful.

My point in all this is not to claim that GUIDs are better than INTs for surrogates. They offer some additional functionality at an additional cost. It's a trade-off, like any other feature you would build into an app.

But claims that using GUIDs are going to make your application run like a slug are just gibberish.|||by the way, the word you americans are having so much trouble spelling is secedeThey are homonyms Rudy... That is a joke (taught to me years ago by a lady from Quebec, so nobody blow a gasket on it). I'll try to remember to include a smiley in the future.

-PatP|||Yes, but remember that when the South tried it they were unsecessful.

My point in all this is not to claim that GUIDs are better than INTs for surrogates. They offer some additional functionality at an additional cost. It's a trade-off, like any other feature you would build into an app.

But claims that using GUIDs are going to make your application run like a slug are just gibberish.Plus the fact that GUIDs are relatively randomly distributed fixes a number of the problems that Mr Celko and I wrangled over that are caused by the way that IDENTITY values are distributed. He finally conceeded that the nature of the surrogate key wasn't his primary problem, it was the way those keys were being used... GUIDs discourage that.

-PatP|||They are homonyms Rudythey may very well be, the way you pronounce them, but for everybody else, they aren't

:) :o :cool: ;) :rolleyes: :mad: :o :) :p :)|||Homonyms are still accepted in most of the U.S., though in many States is now illegal to use two of them in the same sentence.|||they may very well be, the way you pronounce them, but for everybody else, they aren't

:) :o :cool: ;) :rolleyes: :mad: :o :) :p :)The words never sounded exactly the same to me, but they obviously sounded the same (or at least similar enough) to her that she made the joke. She was quite "French" in many ways and very proud of it, so sometimes her humor was a bit difficult for me to follow.

-PatP|||i know that feeling, i have difficulty both understanding humour as well as conveying it

except if i'm telling a joke in person, then it's obvious

for example, this might not come across as funny at all when written down, but i guarantee you, if you heard and saw me tell it, you'd ROFL

cowboy goes into a bar and orders a drink
"say," he says to the bartender, "where is everybody? this bar is usually busier than this"
"oh," says the bartender, "they're all down at the public square, today's the day they're hanging brown paper pete"
"brown paper pete?" asks the cowboy, "who's he?"
"oh, you know," says the bartender, "his pants are made of brown paper, his vest is made of brown paper, even his hat is made of brown paper"
"oh," says the cowboy, "what're they hanging him for?"
"rustling"|||ah... you must be french rudy, I hear the french have no concept of humor eh.
insert emoticom here.|||Let's see, where is that recipe.....Ahh...add one can gasoline...light match...

So, how are things in Nunavut these days? (ducks)

The only rule of thumb I have been given, but which I admittedly do not get to use very often, is if the natural primary key value(s) is likely to change, then you should use a surrogate key. If they are going to be static, then you should use the natural key. The only exception to that is if the natural key is going to make for a poor index (by being way too wide). The only problem with these rules is that they are entirely subjective. Does half of the Northwest Territory re-naming itself count as a common enough change? None of us (in the US, anyway) saw it coming.

In case you are behind on news, Vermont debates about seceding (happy Rudy? ;-)) about once every 10 years or so, and there is a group in Northern California that wants to separate from Southern California. Then there was a story a few years back about a vote to separate a piece of LA into a new town.

On the other hand, Natural keys make life so much easier in highly normalized schemas..|||Hmmm...Seems I missed one, too...

http://zapatopi.net/cascadia.html|||http://zapatopi.net/cascadia.htmlthat's hilarious

thank you for that link

i love how well it's done

this is from the same guy who created the AFDB which i wrote about last year -- http://rudy.ca/afdb.html|||rdjabarov. How did you get "0 extents scanned" in your DBCC SHOWCONTIG result?

Personally, I look at scan density myself, but that is for lack of seeing any Microsoft article that comprehensively explains any difference between the various densities.I actually ran it on Yukon, and this value is not computed because...well, just take my word for it. For 7.0 and 2K it will show the number of extents used by a table or index.|||Ahh. Good thing the local users group is going to explain some of that:

http://www.nesql.org/default.aspx|||Where? I don't see anything except the main page|||Should be the next meeting. Dec 9.

No comments:

Post a Comment