Monday, March 19, 2012

Identity sequence of multi-threads insertion

Recently I'm working on a multi-thread solution based on SQL-Server, now I'm facing such a problem:

Suppose I have process No.1(with multi-threads) inserting data to Table A, which has its identity column auto generated. And process No.2(also with multi-threads) retrieving data from Table A ,generate some records and insert the result into Table B. Both of these two processes are doing batch processing(batch retrieving and batch writing), and they are running parallelly.

Now since process No.2 retrieve data sequencely by the identity of Table A, it found there exists missing results. This is due to that records with bigger identities are not necessarily commited earlier than those who have smaller identities.

One direct solution is add one flag field in Table A indicating whether this record has been processed by process No.2, and each time it was processed , the field will be set. But unfortunatelly the table structure is not supposed to be modified.

So is there any other good solutions for this problem? Thanks.

A solution may be that process 1 uses row-level locking when select/update and process 2 use the readpast hint when selecting the records:

READPAST specifies that locked rows be skipped during the read. READPAST only applies

to transactions operating at the default READ COMMITTED isolation level, and will

only read past row-level locks. READPAST can only be used in SELECT statements.

Normal blocking can be worked around by having transactions read past rows being

locked by other transactions.
See the following article: http://www.sql-server-performance.com/rd_table_hints.asp

Else I cannot see any other solution except adding some external data structure to keep track of processing status.|||

Hi carlop, I'm afraid locks won't resolve this issue.

Maybe I should explain my problem more detailly.

Suppose Process No.1 have 3 threads working , and here are the identities generated after the insertion.

thread 0 : 1 , 4, 7

thread 1: 2 , 5, 8

thread 2: 3, 6, 9

And at one time point, thread 0 commited the transaction with 3 rows inserted , whilst Process No.2 is retrieving data from Table A , so only these 3 records were retrieved and processed. So Process No.2 will record "7" as the biggist identity it has processed, and next time it will start processing from identities greater than "7" . So records with identities 2, 5,3,6 are lost if thread 1 and thread 2 commited later.

|||You have to add an external struct that keeps track of the processing jobs. I see no other way.|||You could add another table that is logged to on insert to table A. Process 2 reads these records and deletes them from the new tableonce processed. When it goes back for a second time it reads the nest set to process and then deletes from. You could use Service Broker if you wanted as it has a nice queue mechanism|||

Maybe this is the only solution I think.

Thank you all.

No comments:

Post a Comment