Sunday, January 07, 2018

Understanding Tibco BW Checkpoint

The BW Checkpoint activity writes the BW JobData to disk or to a database. If the BW process is killed, the engine can read the file or database entry and resume the job from the saved JobData. When the BW process finishes, the checkpoint data is deleted. It is important to understand what this activity is for: it preserves volatile input data and prevents duplicates. Note that the pattern JMS Queue Receive + Send used with acknowledge mode = Client produces duplicates. The local transactional acknowledge mode works only in simple scenarios and cannot be applied to every possible combination of BW activities. Transactional JMS processing in BW is also possible with XA, but XA is buggy. So the simplest pattern that prevents duplicates in 99% of cases is the checkpoint.
When you process important data in BW and the target system is not idempotent, you cannot afford to pass duplicates.
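
To see where the duplicates come from, here is a minimal sketch in plain JMS (outside of BW; the ConnectionFactory and the queue names in.queue/out.queue are assumptions made for the example) of the Receive + Send pattern with acknowledge mode = Client. If the process dies between the send and the acknowledge, the broker redelivers the input message and the output gets sent a second time; a checkpoint taken right after the receive is what lets a restarted engine recognize the redelivered message instead of processing it again.

import javax.jms.*;

// Minimal sketch of the JMS Queue Receive + Send pattern with CLIENT_ACKNOWLEDGE.
// The ConnectionFactory and the queue names are illustrative only.
public class ReceiveSendClientAck {

    static void forwardOneMessage(ConnectionFactory factory) throws JMSException {
        Connection connection = factory.createConnection();
        try {
            connection.start();
            Session session = connection.createSession(false, Session.CLIENT_ACKNOWLEDGE);
            MessageConsumer consumer = session.createConsumer(session.createQueue("in.queue"));
            MessageProducer producer = session.createProducer(session.createQueue("out.queue"));

            Message in = consumer.receive();   // 1. receive the input message
            producer.send(in);                 // 2. send it on to the target queue
            // If the JVM is killed here, the broker never saw the acknowledge,
            // redelivers the input message, and step 2 runs again -> a duplicate.
            in.acknowledge();                  // 3. acknowledge only after the send
        } finally {
            connection.close();
        }
    }
}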

Let's get back to the Checkpoint activity. It can store its data on the filesystem or in a database. With a high volume of concurrent large messages, the database comes under heavy load from CLOB writes: "INSERT INTO $table(job_id, engine_name, job_data) VALUES (?,?,?)". However, to prevent duplicates in an active-active deployment the database is the only option, because file checkpoints would remain stranded on the failed BW instance.

What about the file implementation? It writes through a FileOutputStream and calls close(), but it does not request an OS-level fsync() via FileChannel.force(). Closing the stream only hands the data over to the operating system's page cache; it does not guarantee that the bytes have reached the physical device. Only force() gives a strong durability guarantee. Quoting the FileChannel.force() Javadoc: "If this channel's file resides on a local storage device then when this method returns it is guaranteed that all changes made to the file since this channel was created, or since this method was last invoked, will have been written to that device. This is useful for ensuring that critical information is not lost in the event of a system crash." There is an engine property, bw.engine.checkpoint.file.besteffortsync, which enables two-phase checkpointing (save to file and rename); a sketch of that idea follows the knowledge base excerpt below. Please check the Tibco Support knowledge base:

BW 5.3.3 hf14, BW 5.6.1

1-8H96WT
  An empty checkpoint data file was created if the system crashed or 
  shutdown abruptly while checkpoint data was being written to the
  file. It may result in messages being lost. This is fixed by 
  introducing a new engine property, 
  'bw.engine.checkpoint.file.besteffortsync'. By default, the property
  is set to false. Setting this property to true addresses the problem,
  but introduces a performance delay.
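
As an illustration only (this is not the BW engine's actual code, and the class and method names are made up), here is a minimal Java sketch of a durable, two-phase checkpoint write: write the JobData to a temporary file, force it to the device, then atomically rename it into place, so a crash can never leave an empty or half-written checkpoint file behind. The force(true) call is exactly where the "performance delay" mentioned in the KB entry comes from.

import java.io.FileOutputStream;
import java.io.IOException;
import java.nio.file.*;

// Sketch of a durable two-phase checkpoint write: temp file + fsync + atomic rename.
// Class and method names are illustrative, not taken from the BW engine.
public class DurableCheckpointWriter {

    static void writeCheckpoint(Path checkpointFile, byte[] jobData) throws IOException {
        Path tempFile = checkpointFile.resolveSibling(checkpointFile.getFileName() + ".tmp");

        try (FileOutputStream out = new FileOutputStream(tempFile.toFile())) {
            out.write(jobData);
            // force(true) == fsync: wait until data and metadata reach the device,
            // instead of leaving them in the OS page cache as a plain close() would.
            out.getChannel().force(true);
        }

        // On POSIX filesystems rename() atomically replaces the target, so readers see
        // either the old checkpoint or the complete new one, never a partial file.
        // (A fully crash-proof rename would also sync the parent directory, which
        // plain Java cannot easily request.)
        Files.move(tempFile, checkpointFile, StandardCopyOption.ATOMIC_MOVE);
    }
}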

On Linux, dirty file buffers in the page cache are flushed to disk periodically, by default every 5 seconds. Please consult https://www.kernel.org/doc/Documentation/sysctl/vm.txt:

dirty_writeback_centisecs

The kernel flusher threads will periodically wake up and write `old' data
out to disk.  This tunable expresses the interval between those wakeups, in
100'ths of a second.

Setting this to zero disables periodic writeback altogether.
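
To check the actual interval on a given host, the value can be read straight from procfs (on the command line, sysctl vm.dirty_writeback_centisecs does the same). A small sketch in Java, to stay in the language of the rest of the post:

import java.nio.file.Files;
import java.nio.file.Paths;

// Reads the kernel writeback interval from procfs (Linux only).
public class WritebackInterval {
    public static void main(String[] args) throws Exception {
        byte[] raw = Files.readAllBytes(Paths.get("/proc/sys/vm/dirty_writeback_centisecs"));
        int centisecs = Integer.parseInt(new String(raw).trim());
        // 500 centiseconds = the default 5 second flusher wakeup interval.
        System.out.println("dirty_writeback_centisecs = " + centisecs
                + " (" + (centisecs / 100.0) + " s)");
    }
}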

BW jobs that finish within 5 seconds of taking a checkpoint may never cause any physical disk write at all, because the checkpoint file is deleted before the flusher threads wake up. Long-running processes have their checkpoint files written out with the help of the Linux I/O scheduler optimizations (deadline, cfq). To summarize: the BW file checkpoint on Linux is not a serious performance problem when used wisely.
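
To get a feel for how expensive forcing checkpoint data to the device is on your own hardware, a rough timing sketch like the one below (file name, data size and iteration count are arbitrary) compares a write that stops at close() with one that also calls force(true):

import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;

// Rough timing sketch: page-cache-only writes (close) vs synced writes (force(true)).
// Sizes and iteration counts are illustrative; run on the target disk for real figures.
public class CheckpointWriteTiming {

    static long timeWrites(File file, byte[] data, int iterations, boolean sync) throws IOException {
        long start = System.nanoTime();
        for (int i = 0; i < iterations; i++) {
            try (FileOutputStream out = new FileOutputStream(file)) {
                out.write(data);
                if (sync) {
                    out.getChannel().force(true); // wait until the data is on the device
                }
            }
        }
        return (System.nanoTime() - start) / 1_000_000; // milliseconds
    }

    public static void main(String[] args) throws IOException {
        byte[] jobData = new byte[64 * 1024];          // pretend 64 KB of JobData
        File file = File.createTempFile("checkpoint", ".dat");
        file.deleteOnExit();

        System.out.println("close() only : " + timeWrites(file, jobData, 200, false) + " ms");
        System.out.println("with force() : " + timeWrites(file, jobData, 200, true) + " ms");
    }
}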
