Why clients need to retransmit Confirms

The following shows problems caused by a client not retransmitting Confirm feature-negotiation options while in state PARTOPEN.

1. Connection aborted due to Ack lost in the network

The early implementation of feature negotiation did not retransmit Confirm feature-negotiation options while in state PARTOPEN. As a result, the connection was aborted whenever the single Ack confirming the DCCP-Response got lost, triggering a "feature negotiation failed" system log message at the server, as seen in the bug report.

To illustrate the problem, consider the following message exchange that happened with the early implementation.

The Ack confirming the Response was dropped

The sequence of events is
The corresponding capture files to illustrate the problem can be found here and here.
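To make the failure concrete, here is a minimal Python simulation of the early behaviour. It is a sketch under simplifying assumptions, not the kernel code: the names `Packet`, `early_client`, `deliver` and `server_result` are hypothetical, and the server is modelled as requiring the Confirm options on the first Ack or DataAck it receives after its Response.

```python
from dataclasses import dataclass


@dataclass
class Packet:
    kind: str       # "Ack" or "DataAck", sent by the client after the Response
    confirms: bool  # True if the packet carries Confirm feature-negotiation options


def early_client(n_data_acks: int) -> list[Packet]:
    """Early behaviour: the Confirm options are sent once, on the first Ack only."""
    return [Packet("Ack", True)] + [Packet("DataAck", False) for _ in range(n_data_acks)]


def deliver(sent: list[Packet], lost: set[int]) -> list[Packet]:
    """Return the packets that survive the network; `lost` holds 0-based indices."""
    return [p for i, p in enumerate(sent) if i not in lost]


def server_result(received: list[Packet]) -> str:
    """Simplified server (assumption): the first Ack/DataAck after the Response
    must carry the Confirm options, otherwise negotiation fails."""
    if not received:
        return "no packet received"
    return "OPEN" if received[0].confirms else "feature negotiation failed"


sent = early_client(n_data_acks=3)
print(server_result(deliver(sent, lost=set())))  # OPEN
print(server_result(deliver(sent, lost={0})))    # feature negotiation failed
```

Losing only the first packet is enough to break the setup, because none of the following DataAcks repeat the Confirms.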

2. Requirements and a more robust solution

Aborting the connection setup when feature negotiation fails is essential to ensure that both endpoints are in a sane state before entering the data transmission phase.

Clients stay in PARTOPEN until they can be sure that the server has received their acknowledgment of its Response packet. This happens when the client "receives a valid packet other than DCCP-Response, DCCP-Reset, or DCCP-Sync from the server" (RFC 4340, 8.1).
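The exit condition quoted above can be written as a small predicate. This is a hypothetical sketch (the function name and the packet-type strings are illustrative) that models only the PARTOPEN-to-OPEN transition and deliberately ignores everything else, such as what a real client does on receiving a Reset.

```python
# Packet types that do NOT move a PARTOPEN client to OPEN (RFC 4340, 8.1).
NO_TRANSITION = {"Response", "Reset", "Sync"}


def next_client_state(state: str, pkt_type: str, valid: bool = True) -> str:
    """Model only the PARTOPEN -> OPEN transition; in every other case the
    state is returned unchanged (a real client would, e.g., close on Reset)."""
    if state == "PARTOPEN" and valid and pkt_type not in NO_TRANSITION:
        return "OPEN"
    return state


print(next_client_state("PARTOPEN", "Response"))  # PARTOPEN
print(next_client_state("PARTOPEN", "Ack"))       # OPEN
```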

Thus, although the RFC does not mandate retransmitting Confirm options, the client must retransmit them for as long as it stays in PARTOPEN, since the initial Ack, any retransmitted Acks, or newly sent DataAcks can all get lost in the network.

3. Proof of concept

After this issue was reported, the implementation was changed so that the client retransmits Confirm options while in state PARTOPEN. This is achieved by flushing the client's feature-negotiation queue only at the moment it transitions to OPEN.
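A minimal sketch of the revised client behaviour follows, assuming a hypothetical `Client` class with a list-based Confirm queue; the option labels are placeholders, not real encodings. Every packet built in PARTOPEN carries the full queue, and the queue is cleared only when a packet other than Response, Reset or Sync arrives.

```python
class Client:
    """Revised behaviour (sketch): Confirm options stay queued in PARTOPEN and
    are flushed only on the transition to OPEN."""

    NO_TRANSITION = {"Response", "Reset", "Sync"}

    def __init__(self, confirms: list[str]) -> None:
        self.state = "PARTOPEN"
        self.confirm_queue = list(confirms)

    def build(self, kind: str) -> dict:
        # Every outgoing packet carries a copy of whatever is still queued.
        return {"kind": kind, "options": list(self.confirm_queue)}

    def receive(self, kind: str) -> None:
        if self.state == "PARTOPEN" and kind not in self.NO_TRANSITION:
            self.state = "OPEN"
            self.confirm_queue.clear()  # flush the queue only now


client = Client(["Confirm L(CCID)", "Confirm R(CCID)"])  # placeholder labels
first = client.build("Ack")        # suppose this Ack is lost
second = client.build("DataAck")   # still in PARTOPEN: Confirms are repeated
client.receive("Ack")              # server packet moves the client to OPEN
third = client.build("DataAck")    # queue flushed: no Confirms any more
print(first["options"] == second["options"], third["options"])  # True []
```

Tying the flush to the state transition, rather than to the first transmission, is what makes the retransmission automatic.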

To test the implementation, a test client was modified to drop the acknowledgment following the Response, forcing it to retransmit the Confirm options on the subsequent DataAcks. This is shown in the screenshot below; the corresponding capture file is here.

Feature negotiation now robust against dropped Acks

The client starts with Request #069 and drops Ack #70. At DataAck #73 it is still in PARTOPEN, so it retransmits the Confirm options shown in the screenshot. Having itself entered OPEN state on receiving the requisite Confirm options, the server then acknowledges #073. The server's Ack #058 is actually a cumulative acknowledgment, since it contains a CCID-2 Ack Vector (check the capture file).

The important point is that the communication continues successfully in spite of losing the initial Ack, which was not possible with the earlier implementation.

4. And yet another problem to solve

Unfortunately there is another problem. Feature-negotiation options take up more space than most other options, partly because of the variable-length format of server-priority lists. Since DCCP allows clients to send DataAck packets in state PARTOPEN, a DataAck whose payload already fills the maximum packet size (MPS) leaves no room for the Confirm options.

The size of the feature-negotiation options usually varies between 20 and 70 bytes. Permanently reducing the MPS by 72 bytes (70 bytes rounded up to a multiple of 4 bytes) to make room for them would take too much away from the normal payload.
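The 72-byte figure follows from DCCP's header layout: the Data Offset field counts the header length in 32-bit words, so the options area is padded to a multiple of 4 bytes. The sketch below (the function name `padded` is hypothetical, and 45 is just an illustrative middle value) rounds option lengths up and shows how many bytes a fixed 72-byte reservation would waste when the options are shorter.

```python
def padded(n: int) -> int:
    """Round an options length up to the next multiple of 4 bytes (32-bit words)."""
    return (n + 3) // 4 * 4


RESERVED = padded(70)  # a fixed reservation for the largest typical option set

# columns: option bytes, padded length, bytes wasted by the fixed reservation
for opt_len in (20, 45, 70):
    print(opt_len, padded(opt_len), RESERVED - padded(opt_len))
# 20 20 52
# 45 48 24
# 70 72 0
```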

Hence a special case was devised: clients in PARTOPEN retransmit the Confirm options by sending an extra Ack before sending DataAcks with large payloads. This means that the client replies to the DCCP-Response with a total of two Acks. If both get lost in the network, the user needs to try connecting again.
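The special case amounts to a per-packet decision in PARTOPEN. The sketch below is hypothetical: the class name, the 24-byte header length and the 1460-byte MPS are illustrative assumptions rather than values from the implementation, and the payload is assumed to already fit within the MPS without options. If the padded Confirm options fit alongside the payload they ride on the DataAck; otherwise a single extra Ack carries them first.

```python
def padded(n: int) -> int:
    """Round an options length up to a multiple of 4 bytes."""
    return (n + 3) // 4 * 4


class PartOpenSender:
    """Hypothetical PARTOPEN send decision for the large-payload special case."""

    def __init__(self, mps: int, header_len: int, confirm_len: int) -> None:
        self.mps = mps                        # maximum packet size (illustrative)
        self.header_len = header_len          # DataAck header size without options
        self.confirm_len = padded(confirm_len)
        self.extra_ack_sent = False

    def send(self, payload_len: int) -> list[tuple[str, bool]]:
        """Return (packet kind, carries Confirms) pairs for one payload."""
        room = self.mps - self.header_len - payload_len
        if room >= self.confirm_len:
            return [("DataAck", True)]        # Confirms fit next to the payload
        if not self.extra_ack_sent:
            self.extra_ack_sent = True        # send the extra Ack only once
            return [("Ack", True), ("DataAck", False)]
        return [("DataAck", False)]


sender = PartOpenSender(mps=1460, header_len=24, confirm_len=20)
print(sender.send(100))   # [('DataAck', True)]
print(sender.send(1420))  # [('Ack', True), ('DataAck', False)]
print(sender.send(1420))  # [('DataAck', False)]
```

Limiting the extra Ack to one per connection keeps the overhead bounded, matching the "total of two Acks" described above.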

The special case is demonstrated in the following screenshot; the capture file shows more details.

Sending an extra (second) Ack to save a connection that uses large payloads

The first Ack carrying Confirm options is #412, answering Response #428. Until the server's Ack #429 arrives (packet 7), the client remains in PARTOPEN. DataAck #414 has too little room for the 20 bytes of feature-negotiation options because of its 1420-byte payload, so the client sends a second Ack, #413, to carry the Confirm options.

The Ack #429 by the server is a cumulative acknowledgment with an Ack Vector of run length 2, i.e. down to #413. Since Ack Vectors are relative only to OPEN state, this shows that the server entered OPEN state as expected when receiving Ack #412.

So this second Ack saved the day for the connection. Had it not been sent, packet 5 would have been dropped for being over-length. With CCID-3, such a drop shows up as packet loss and causes an unpleasant reduction in throughput.

On the other hand, if Ack #412 had been lost, the second Ack #413 would have allowed the server to enter OPEN and carry on as if nothing had happened. If the second Ack had been lost too, it might be time to get a cup of coffee and try to dial in again...
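The outcomes of the two-Ack scheme can be enumerated directly. This sketch is simplified and its function names are hypothetical: it assumes both Acks carry the full set of Confirm options and, for the probability estimate only, that each Ack is lost independently with the same probability p, which real networks need not satisfy. Under that assumption, going from one Confirm-carrying Ack to two reduces the chance of a failed setup from p to p^2.

```python
from itertools import product


def server_enters_open(acks_lost: tuple[bool, ...]) -> bool:
    """Every Ack carries the Confirm options, so one delivered Ack suffices."""
    return not all(acks_lost)


def setup_failure_probability(p: float, n_acks: int) -> float:
    """Assumes each Ack is lost independently with probability p."""
    return p ** n_acks


for lost in product([False, True], repeat=2):  # (first Ack lost, second Ack lost)
    print(lost, "OPEN" if server_enters_open(lost) else "reconnect")
# (False, False) OPEN
# (False, True) OPEN
# (True, False) OPEN
# (True, True) reconnect

print(setup_failure_probability(0.05, 1), setup_failure_probability(0.05, 2))
```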