Wednesday, March 26, 2014

F5 Big-IP Load Balanced WCF Services

*update* - This follow-up may provide more detail on finding a solution.

We have been trying to configure our new F5 Big-IP load balancer for some WCF services and encountered a strange issue. 
The service uses wsHttpBinding with TransportWithMessageCredential over SSL. The message client credential type is Windows.
With either node disabled on the F5, the service worked. However, when both nodes were active, the service failed with this exception:
Exception:
Secure channel cannot be opened because security negotiation with the remote endpoint has failed. This may be due to absent or incorrectly specified EndpointIdentity in the EndpointAddress used to create the channel. Please verify the EndpointIdentity specified or implied by the EndpointAddress correctly identifies the remote endpoint.
Inner Exception:
The request for security token has invalid or malformed elements.
Numerous documentation pages and blogs point out that *the* way to support load balancing in WCF is either to turn off security context establishment by setting EstablishSecurityContext=false in the binding configuration, or to turn on 'sticky sessions':
http://ozkary.blogspot.com.au/2010/10/wcf-secure-channel-cannot-be-opened.html
http://msdn.microsoft.com/en-us/library/ms730128.aspx
http://msdn.microsoft.com/en-us/library/vstudio/hh273122(v=vs.100).aspx
We did not want to use sticky sessions: although they did fix the issue, the F5 logs showed that load balancing was not working as we wanted.
Unfortunately, we already had EstablishSecurityContext set to false, so security negotiation should already have been occurring on each request, which meant it should have been working.
After hours of investigating other binding settings, creating test clients, updating the WCFTestTool configurations and generally fumbling around, we eventually went back to reconfiguring the F5. The service worked with exactly the same configuration when only one node was active, so unless the WCF binding documentation we found *everywhere* was a complete lie, it had to be the F5.
It was finally traced to the F5 OneConnect (http://support.f5.com/kb/en-us/solutions/public/7000/200/sol7208.html) configuration. This does some jiggery-pokery to magically pool backend connections to improve performance. It also seems to break WCF services; at least, it broke ours.
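For reference, a binding matching the setup described above (wsHttpBinding, TransportWithMessageCredential, Windows as the message credential type, security context establishment turned off) would look roughly like the fragment below. It is a sketch rather than our production configuration; the binding name `wsWindowsNoSct` is made up.

```xml
<system.serviceModel>
  <bindings>
    <wsHttpBinding>
      <!-- Hypothetical binding name. SSL secures the transport,
           Windows credentials travel in the SOAP message. -->
      <binding name="wsWindowsNoSct">
        <security mode="TransportWithMessageCredential">
          <!-- establishSecurityContext="false": WCF does not establish a
               security context token (SCT) that later requests depend on. -->
          <message clientCredentialType="Windows"
                   establishSecurityContext="false" />
        </security>
      </binding>
    </wsHttpBinding>
  </bindings>
</system.serviceModel>
```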
Disabling OneConnect on the F5 application profile resolved the issue immediately.
We now have our load-balanced, non-persistent WCF services working behind the shiny F5.
As I couldn't find this issue reported anywhere online, I can only assume it is related to the combination of TransportWithMessageCredential and Windows as the message credential type.
*edit* So the solution was not as straightforward as we thought, and we have currently reverted to “sticky sessions” in the F5 to get this to work. Even with EstablishSecurityContext=false and OneConnect disabled, the same failure occurs if a single client makes two concurrent requests on two separate threads (our clients are web applications) and the F5 routes each connection to a different service node.

While we investigate further, the short-term solution is to use Transport security instead of TransportWithMessageCredential. As this requires a client change, we had to deploy multiple bindings, and each client app will upgrade while we use sticky sessions on the F5. Once all clients are on the new binding, we can remove the old binding and disable sticky sessions again.
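Running both bindings side by side during the migration could look something like the service configuration below. It is a sketch that assumes both endpoints use wsHttpBinding and that the new binding uses Windows authentication at the HTTP transport level; the service, contract and binding names and the `transport` relative address are hypothetical placeholders.

```xml
<system.serviceModel>
  <bindings>
    <wsHttpBinding>
      <!-- Existing binding, kept until every client has migrated. -->
      <binding name="LegacyMessageCredential">
        <security mode="TransportWithMessageCredential">
          <message clientCredentialType="Windows" establishSecurityContext="false" />
        </security>
      </binding>
      <!-- New binding: SSL plus Windows authentication at the transport
           level, with no message-level security. -->
      <binding name="TransportWindows">
        <security mode="Transport">
          <transport clientCredentialType="Windows" />
        </security>
      </binding>
    </wsHttpBinding>
  </bindings>
  <services>
    <!-- Hypothetical service and contract names. -->
    <service name="Example.Services.OrderService">
      <endpoint address=""
                binding="wsHttpBinding"
                bindingConfiguration="LegacyMessageCredential"
                contract="Example.Contracts.IOrderService" />
      <endpoint address="transport"
                binding="wsHttpBinding"
                bindingConfiguration="TransportWindows"
                contract="Example.Contracts.IOrderService" />
    </service>
  </services>
</system.serviceModel>
```

Clients switch over by pointing at the new endpoint address with a matching binding; once no client uses the legacy endpoint, it and its binding can be removed.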

Transport security works for us, but it is not perfect. It reduces security (SSL only, no message encryption) and reduces flexibility (for instance, we can’t switch to client certificate or username authentication for individual, per-request authentication).
It does however keep the services stable and gives us time to perform a thorough analysis of the problem.

2 comments:

  1. Hi.
    I'm facing the exact same problem you describe. I have a Big-IP as load balancer and a couple of services failing with the same error. The WCF services use the recommended setting establishSecurityContext="false" without any effect. Did you ever find a solution for WsBinding with message security and Big-IP?

    1. No, unfortunately I don't have a verified solution. However, I have done some further investigation and encountered something similar: the problem was that one-shot Kerberos wasn't set up correctly.

      Ensure you have the correct SPNs
      Ensure you set both establishSecurityContext and negotiateServiceCredential to false
      Ensure you set up the element of the service configuration and the of the client - I have tested this with certificate identity, BUT it should work if you use the correct SPN account (this is the bit I haven't confirmed).
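A client-side sketch of that checklist might look like the fragment below. It assumes the service runs under a domain account with an SPN registered for it, and that the SPN is supplied through the endpoint's `<identity>` element; that identity part is an assumption and, as noted in the checklist, unconfirmed. The address, SPN value, contract and binding names are hypothetical.

```xml
<system.serviceModel>
  <bindings>
    <wsHttpBinding>
      <binding name="wsWindowsOneShot">
        <security mode="TransportWithMessageCredential">
          <!-- Both false: no SCT session and no multi-leg SPNEGO negotiation;
               the client should send a Kerberos ticket for the service SPN
               directly in each message. -->
          <message clientCredentialType="Windows"
                   establishSecurityContext="false"
                   negotiateServiceCredential="false" />
        </security>
      </binding>
    </wsHttpBinding>
  </bindings>
  <client>
    <!-- Hypothetical address and contract. -->
    <endpoint address="https://services.example.com/OrderService.svc"
              binding="wsHttpBinding"
              bindingConfiguration="wsWindowsOneShot"
              contract="Example.Contracts.IOrderService">
      <identity>
        <!-- Hypothetical SPN; it must match one registered for the service account. -->
        <servicePrincipalName value="HTTP/services.example.com" />
      </identity>
    </endpoint>
  </client>
</system.serviceModel>
```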

      I have a bit of a lengthy post that I should probably put up that might help guide you.
