I'm processing a very large record set in a standalone Java I have written. For perfomance reasons, I commit changes only once every 1000 records to the Oracle Server.
While delaying commits is a common way of increasing performance in a large load, it is nearly always at the cost of the integrity of the application. What are you sacrificing? You are sacrificing defining what a transaction is to that application. Even if all you are doing is loading heaps of identical objects to a database, it could be said that each of those objects defines a transaction in this application and by not committing after each one, you are taking a risk that later data inserts will affect this one even though that later record is completely unrelated to this one.
As an example, say you successfully insert a record into the database. This record comprises a logical unit of data, for instance say it contains details of a transaction selling a particular green car. Because we aren't committing, this data is not permanently saved to the database yet. Now lets say several hundred successful inserts later a record about a completely separate car sale (a red car) contains a bad value in a field, and it fails to insert into the database. As a result, a rollback is triggered, and even though our original green car sale data was 100% correct, through no fault of its own it gets pulled out of the database. Bummer.
For this reason, I decided to try out Java Savepoints during this load. To cut a long story short, a savepoint is the commit you have when you don't have a commit. It defines a point where you can say a logical transaction begins. In the above situation, I would issue a savepoint after successfully inserting each transaction. A savepoint would go in after the green car sale, and each of the other several hundred successful inserts after than. When the red car sale record goes in and it fails, we can simply rollback to the most recent savepoint. This means that the green car record is not affected, and when we eventually commit, we have a better set of data than without using the savepoints.
So, given that we are trying for performance here, how well does the savepoint perform relative to a commit? Much better. Without having measured it precisely, having savepoints in the tight insert loop certainly slowed down the code, but nowhere near as dramatically as having commits in there. It is definitely worth considering if you worry about data integrity like I do, and you don't want the huge overhead of commits to maintain that integrity.
Unfortunately, my customer needed superb performance during the raw load, so no commits or savepoints for me.
Search java.sql.Savepoint and java.sql.rollback on Google for more info.
Subscribe to:
Post Comments (Atom)
No comments:
Post a Comment