środa, września 26, 2012

BW XA examined


To the queue test.fw1 we sent 128 messages with sequence numer as a body:
2012 Sep 26 10:11:55:974 GMT +2 BW.xa-test User [BW-User] - Job-10000 [testcase/trigger.process/Group/Log]: SENT ID:EMS-SERVER.CB75062B89D4:1 SEQ 1
2012 Sep 26 10:11:55:990 GMT +2 BW.xa-test User [BW-User] - Job-10000 [testcase/trigger.process/Group/Log]: SENT ID:EMS-SERVER.CB75062B89D4:2 SEQ 2
...

Then we've been moving them back nad forth, with rewriting preceding JMSMessageID (PREV) as Body of new message, using queues test.bw1+test.fw1:
2012 Sep 26 10:15:37:884 GMT +2 BW.xa-test User [BW-User] - Job-11023 [testcase/BW.process/Log]: FROM ID:EMS-SERVER.CB75062B89D26:135 TO ID:EMS-SERVER.CB75062B89D25:147 PREV 5
2012 Sep 26 10:15:37:884 GMT +2 BW.xa-test User [BW-User] - Job-11017 [testcase/FW.process/Log]: FROM ID:EMS-SERVER.CB75062B89D4:18 TO ID:EMS-SERVER.CB75062B89D24:146 PREV 18

During this moving active EMS server was killed.

user@WAR-LAP-510 /cygdrive/c/Users/user/Desktop
$ cat log_recv.txt | awk '{print $16}' | sort | wc -l
143

user@WAR-LAP-510 /cygdrive/c/Users/user/Desktop
$ cat log_recv.txt | awk '{print $16}' | sort | grep -v NaN | wc -l
120

why JMS body is NaN?

user@WAR-LAP-510 /cygdrive/c/Users/user/Desktop
$ cat log_recv.txt | awk '{print $16}' | sort -u | grep -v NaN | wc -l
111

duplicate ID:EMS-SERVER.CB75062B89D24:416
duplicate ID:EMS-SERVER.CB75062B89D25:406
duplicate ID:EMS-SERVER.CB75062B89D25:414

Let's see ID:EMS-SERVER.CB75062B89D24:416:

2012 Sep 26 10:15:44:354 GMT +2 BW.xa-test User [BW-User] BW-XATM-100005 Job-11316 [testcase/BW.process/Log-1]: ERR FROM ID:EMS-SERVER.CB75062B89D24:416 TO ID:EMS-SERVER.CB75062B89D26:443 PREV NaN
2012 Sep 26 10:15:44:775 GMT +2 BW.xa-test User [BW-User] - Job-11343 [testcase/BW.process/Log]: FROM ID:EMS-SERVER.CB75062B89D24:416 TO ID:EMS-SERVER.CB75062B89D23:468 PREV NaN
2012 Sep 26 10:15:45:009 GMT +2 BW.xa-test User [BW-User] - Job-11354 [testcase/BW.process/Log]: FROM ID:EMS-SERVER.CB75062B89D24:416 TO ID:EMS-SERVER.CB75062B89D20:484 PREV NaN
2012 Sep 26 10:19:28:633 GMT +2 BW.xa-test User [BW-User] - Job-12085 [testcase/cleanQueuesByConsuming/consumeFW.process/Log]: CURRENT ID:EMS-SERVER.CB75062B89D23:468 PREV ID:EMS-SERVER.CB75062B89D24:416
2012 Sep 26 10:19:28:773 GMT +2 BW.xa-test User [BW-User] - Job-12091 [testcase/cleanQueuesByConsuming/consumeFW.process/Log]: CURRENT ID:EMS-SERVER.CB75062B89D20:484 PREV ID:EMS-SERVER.CB75062B89D24:416

The same message ID:EMS-SERVER.CB75062B89D24:416 is delivered 3 times, twice in sucessful process!
Timestamps suggest that duplication is due to failed transaction.

ERR FROM ID:EMS-SERVER.CB75062B89D24:416 TO ID:EMS-SERVER.CB75062B89D26:443 PREV NaN

NaN in body may indicate that message was received as damaged due to s11601.dc server kill, then new was sent to s11660.dc2.
Next we see try to commit transaction using s11601.dc which is dead - not available XAER_NOTA:
2012 wrz 26 10:15:44:354 CEST BW.xa-test Warn [com.arjuna.ats.internal.jta.resources.arjunacore.rollbackxaerror] [com.arjuna.ats.internal.jta.resources.arjunacore.rollbackxaerror] XAResourceRecord.rollback - xa error XAException.XAER_NOTA

Handling of XA is not aware about failover of active-passive of EMS connection, and as a result we see that messages are lost or duplicated.

https://access.redhat.com/knowledge/docs/en-US/JBoss_Enterprise_Web_Platform/5/html/Transactions_JTA_Development_Guide/chap-Transactions_JTA_Programmers_Guide-Transaction_Recovery.html

"Because XAResource objects are not persistent across system failures, the Transaction Manager needs the ability to acquire the XAResource objects that represent the resource managers which might have participated in the transactions prior to a system failure. For example, a Transaction Manager might use the JNDI look-up mechanism to acquire a connection from each of the transactional resource factories, and then obtain the corresponding XAResource object for each connection. The Transaction Manager then invokes the XAResource.recover method to ask each resource manager to return the transactions that are currently in a prepared or heuristically completed state."

"When running XA recovery, you must tell JBoss Transaction Service which types of Xid it can recover. Each Xid that JBoss Transaction Service creates has a unique node identifier encoded within it, and JBoss Transaction Service only recovers transactions and states that match the requested node identifier. The node identifier to use should be provided to JBoss Transaction Service in a property that starts with the name com.arjuna.ats.jta.xaRecoveryNode. Multiple values are allowed. A value of * forces recovery, and possibly rollback, of all transactions, regardless of their node identifier. Use it with caution. "

Xid contains JMS connection ID, however Xid created on first EMS server can be accepted by second EMS server because information about connections is persisted to disk. During recovery phase Xid list is not rebuilt properly on second server therefore lack of transactions transition is observed.

0 komentarze: