czwartek, września 27, 2012

Duplicates with Tibco EMS XA in FT mode

2012-09-27 08:47:47.733 [MSG:252] sent to user='anonymous': connID=5 consID=7 msgID='ID:EMS-SERVER.15905063F50823:111' queue='test.bw1'
FT switch
2012-09-27 08:47:49.330 [MSG:252] sent to user='anonymous': connID=5 consID=3 msgID='ID:EMS-SERVER.15905063F50823:111' queue='test.bw1'
2012-09-27 08:47:49.377 [MSG:252] sent to user='anonymous': connID=5 consID=2 msgID='ID:EMS-SERVER.15905063F50823:111' queue='test.bw1'
2012-09-27 08:47:49.440 [MSG:252] sent to user='anonymous': connID=5 consID=7 msgID='ID:EMS-SERVER.15905063F50823:111' queue='test.bw1'
2012-09-27 08:47:49.455 [MSG:252] sent to user='anonymous': connID=5 consID=9 msgID='ID:EMS-SERVER.15905063F50823:111' queue='test.bw1'
2012-09-27 08:47:49.564 [MSG:252] sent to user='anonymous': connID=5 consID=8 msgID='ID:EMS-SERVER.15905063F50823:111' queue='test.bw1'
2012-09-27 08:47:49.814 [MSG:252] sent to user='anonymous': connID=5 consID=11 msgID='ID:EMS-SERVER.15905063F50823:111' queue='test.bw1'
2012-09-27 08:47:49.876 [MSG:252] sent to user='anonymous': connID=5 consID=2 msgID='ID:EMS-SERVER.15905063F50823:111' queue='test.bw1'
2012-09-27 08:47:50.126 [MSG:252] acknowledged by user='anonymous': connID=5 consID=2 msgID='ID:EMS-SERVER.15905063F50823:111' queue='test.bw1'



After FT switch the same message is delivered to different EMS sessions and transactions are commited. Redelivery is by design (Shared State Failover chapter):

'For queue receivers, any messages that have been sent to receivers, but have not
been acknowledged before the failover, may be sent to other receivers
immediately after the failover.
A receiver trying to acknowledge a message after a failover may receive the
javax.jms.IllegalStateException. This exception signifies that the attempted
acknowledgement is for a message that has already been sent to another queue
receiver. This exception only occurs in this scenario, or when the session or
connection have been closed.',


but not throwing an error during transaction commit is a bug in EMS server BW.
Atrribute exclusive stick to the queue can prevent this disaster (and performance), but pending transactions can still affect overal message number consistency (workaround: all prepared transactions should be commited, rollback is harmful).

Update 28.09.2012: Tibco Support is not very helpful so I'm digging deeper.

I assume xa_end should be just after JMS Send, so from xa_end I will get XID, and then see what's going on with transaction (I correlate xa_end using conn+sess):

2012-09-27 08:47:49.892 [735917889 JobCourier2] [TIBCO EMS]: [J] xa start conn=6 sess=35 xid=< 131075, 29, 27, 494545535110110010248561005899514857585348545110254535458525457455351101100102485610058995148575853485451102545354585210257 > ({formatID=131075 gtrid_length=29 bqual_length=27 data=1--53edf08d:c309:5063f656:469-53edf08d:c309:5063f656:4f9}) flags=0
2012-09-27 08:47:49.892 [735917889 JobCourier2] [TIBCO EMS]: [J] QueueSender Send conn=6 sess=35 prod=8 dest=test.fw1 msgid=ID:EMS-SERVER.15905063F50823:159 dlvmode=2 pri=4 ttl=0
2012-09-27 08:47:49.892 [735917889 JobCourier2] [TIBCO EMS]: [J] xa end conn=6 sess=35 xid=< 131075, 29, 27, 494545535110110010248561005899514857585348545110254535458525457455351101100102485610058995148575853485451102545354585210257 > ({formatID=131075 gtrid_length=29 bqual_length=27 data=1--53edf08d:c309:5063f656:469-53edf08d:c309:5063f656:4f9}) flags=4000000
735917889 JobCourier2] [TIBCO EMS]: [J] xa commit conn=6 xid=< 131075, 29, 27, 494545535110110010248561005899514857585348545110254535458525457455351101100102485610058995148575853485451102545354585210257 > ({formatID=131075 gtrid_length=29 bqual_length=27 data=1--53edf08d:c309:5063f656:469-53edf08d:c309:5063f656:4f9}) onephase=false
2012-09-27 08:47:50.126

Wow. XID doesn't enclose JMS Receive. So we've got transaction only for sending and it is commited successfully. That's the reason we've got duplicates.


Why JMS Receive is not enclosed in transaction? Because it was enlisted once in transaction which wasn't commited successfully and its not visible to other transactions (and therefore observed transactions only for sending). Message is acked after being redelivered to session which it originated from. So this is BW bug.

Update 01.10.2012: There is a bug in com.tibco.plugin.share.jms.impl.JMSReceiver.SessionController - flag requesting new transaction is not cleared in error handling.

0 komentarze: