Thursday, September 8, 2016

F5 Big-IP Load Balanced WCF Services - Update


The post below covers some findings from a project, related to authentication between the front-end and back-end services and to the F5 configuration.  It closely matches the issues identified in my previous post about the SSO server, although that was using NTLM/Kerberos (I have a feeling we didn't set up the SPNs for Kerberos correctly).  Ultimately I suspect that a configuration similar to the one below, but set up for one-shot Kerberos and using the SPN in the identity element, would have resolved that issue.



tldr;  
Server config:  
Configure the server to accept client certificates using PeerTrust - allows us to authorise the client 
Define the server certificate (the identity of the service - important later on) - allows message security to work 
<behavior name="certificate">
  <serviceMetadata httpGetEnabled="true" httpsGetEnabled="true" />
  <serviceDebug includeExceptionDetailInFaults="true" />
  <serviceCredentials>
    <clientCertificate>
      <authentication certificateValidationMode="PeerTrust" />
    </clientCertificate>
    <serviceCertificate findValue="<blah>" storeLocation="LocalMachine" x509FindType="FindBySubjectName" />
  </serviceCredentials>
</behavior>

Configure the wsHttpBinding to use Message security:
  • establishSecurityContext="false" and negotiateServiceCredential="false" drive the message security and allow load balancing without sticky sessions
  • clientCredentialType="Certificate" authenticates the client using a certificate
<wsHttpBinding>
  <binding name="certauth" maxReceivedMessageSize="2147483647">
    <readerQuotas maxStringContentLength="2147483647" maxArrayLength="2147483647" />
    <security mode="Message">
      <message algorithmSuite="TripleDesSha256" establishSecurityContext="false" negotiateServiceCredential="false" clientCredentialType="Certificate" />
    </security>
  </binding>
</wsHttpBinding>

Configure the endpoint.  It accepts requests on <endpoint>/certauth and applies the appropriate message security and authentication.
<endpoint name="certauth" address="certauth" binding="wsHttpBinding" bindingConfiguration="certauth" contract="<contract>" /> 


Client Config: 
Configure the client to validate the service's certificate using PeerTrust
Configure the certificate that this client will pass as its credentials 
<endpointBehaviors>
  <behavior name="certificate">
    <clientCredentials>
      <clientCertificate findValue="<name>" storeLocation="LocalMachine" x509FindType="FindBySubjectName" />
      <serviceCertificate>
        <authentication certificateValidationMode="PeerTrust" />
      </serviceCertificate>
    </clientCredentials>
  </behavior>
</endpointBehaviors>

Configure the security bindings; these must match the service bindings
<wsHttpBinding>
  <binding name="certauth" maxReceivedMessageSize="2147483647">
    <readerQuotas maxStringContentLength="2147483647" maxArrayLength="2147483647" />
    <security mode="Message">
      <message algorithmSuite="TripleDesSha256" establishSecurityContext="false" negotiateServiceCredential="false" clientCredentialType="Certificate" />
    </security>
  </binding>
</wsHttpBinding>

Configure the endpoint.  Note the /certauth suffix.
The identity element is essential; it needs to match the serviceCredentials certificate defined on the service.
<endpoint address="<endpoint>/certauth" binding="wsHttpBinding" bindingConfiguration="certauth" contract="<contract>" behaviorConfiguration="certificate">
  <identity>
    <certificateReference findValue="<name>" x509FindType="FindBySubjectName" storeLocation="LocalMachine" storeName="TrustedPeople" />
  </identity>
</endpoint>



long version; 

To do this we need to use message security, for the following reasons:
  • We are using SSL offload, so transport security is not an option (it only works when SSL terminates at the service, and ours terminates at the F5)
  • We are using certificate auth, which is not allowed on TransportWithCredentialOnly 
  • We are working across a domain boundary so we cannot use windows authentication 

The second complexity is that we are working in a load balanced environment.  WCF negotiates a security context, used to generate the message security, before sending the message.  If the load balancer sends the message to a different server than the one the security context was negotiated on, this causes intermittent security failures.

The two general solutions to this are to:
  • enable sticky sessions, so that the F5 "guarantees" the security context negotiation and the message hit the same server.
    • This did not seem to work - with sticky sessions enabled with the following config (), we still received these errors.
    • If the security context lasted longer than the sticky session timeout this could still cause problems, but I don't think that was the case.
    • I have a feeling this was caused by not setting the 'identity' correctly in the WCF config, which was a key part of the final working solution, but it should still have worked regardless, as both front-end servers had success connecting to both back-end servers.
  • set establishSecurityContext="false" in the WCF binding, which sets up security for every message; rather than establishing a long-term security context, each message has its own security context applied.
    • This did not work.  I have encountered this before, where it was the supposed fix but did not actually resolve the issue; at the time I thought that was related to NTLM issues.
    • Sticky sessions combined with this setting should pretty much have guaranteed a fix, as the sticky session could never time out before the security context expired (the context is only ever generated right before the message is sent).
      • This did not work.  I have no explanation for this.

A final solution that I had previously read about, which allows a configuration to work without sticky sessions, is to set negotiateServiceCredential="false"; however, all the examples use this to enable "Kerberos one-shot", which allows message security using Kerberos if the services are configured with domain users and SPNs.  As we work across domains this wasn't an option.

I was finally able to put a few things together and determined that establishSecurityContext="false" combined with negotiateServiceCredential="false" should be possible without Kerberos if you use certificate security (separate from certificate authentication).  This had partially been done with the service behavior element, which defined the serviceCredentials on the server using certificates, and PeerTrust for the serviceCertificate on the client.  However the next step was to define the service identity in the client config so that the client knew which certificate (public key) to encrypt the message with, instead of negotiating with the service to determine its certificate before encrypting.
Setting the endpoint identity, together with establishSecurityContext="false" and negotiateServiceCredential="false", meant that the client now had enough information to encrypt the message without asking the service anything, and the service could decrypt it.
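For reference, this is roughly what that client configuration looks like when built up in code rather than config.  It is only a sketch - the certificate subject names, the service URL and the IMyContract contract are placeholders, not our actual values.

using System;
using System.Security.Cryptography.X509Certificates;
using System.ServiceModel;
using System.ServiceModel.Security;

// Message security with no security context and no credential negotiation.
var binding = new WSHttpBinding(SecurityMode.Message);
binding.Security.Message.ClientCredentialType = MessageCredentialType.Certificate;
binding.Security.Message.EstablishSecurityContext = false;
binding.Security.Message.NegotiateServiceCredential = false;
binding.Security.Message.AlgorithmSuite = SecurityAlgorithmSuite.TripleDesSha256;

// The endpoint identity is the service certificate, so the client can encrypt
// for the service without asking it anything first.
var store = new X509Store(StoreName.TrustedPeople, StoreLocation.LocalMachine);
store.Open(OpenFlags.ReadOnly);
var serviceCert = store.Certificates.Find(X509FindType.FindBySubjectName, "service-cert-name", false)[0];
store.Close();

var address = new EndpointAddress(
    new Uri("https://load-balanced-host/Service.svc/certauth"),
    EndpointIdentity.CreateX509CertificateIdentity(serviceCert));

// The client certificate is the credential the service authorises via PeerTrust.
var factory = new ChannelFactory<IMyContract>(binding, address);
factory.Credentials.ClientCertificate.SetCertificate(
    StoreLocation.LocalMachine, StoreName.My, X509FindType.FindBySubjectName, "client-cert-name");
factory.Credentials.ServiceCertificate.Authentication.CertificateValidationMode =
    X509CertificateValidationMode.PeerTrust;

var client = factory.CreateChannel();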

Note that NONE of these issues related to the authentication certificate being rejected; it was the message security that was the problem.  It just happened that we required message security so that the authentication could be attached to the payload.

A side effect of this is that we should now be able to get rid of sticky sessions. 

The Dependency Zoo

This is shamelessly pulled from here: https://nblumhardt.com/2010/01/the-relationship-zoo/ - read it, it is good.  Much better than this.

Quite some time ago I was faced with a problem that I wasn't quite sure how to solve, and one of the devs I most respect sent me the link above.  While the link wasn't the answer, it helped me formulate the answer I needed by making me think more critically about my dependencies.

Prior to that point dependency injection was very useful to me; I thought I had a good grasp of why IoC was important and how to use DI, but I hadn't realised how simplistic my view was.

The problem in my case wasn't that I needed a B; I needed a specific B, and I would only know which B at runtime.  This didn't match the way I thought of Dependency Injection and I couldn't figure out a non-ugly solution.

What this article did was really make me think about how I consider dependencies. 

The article notes that there are a number of common types of relationships between classes:
"I need a B"
"I might need a B"
"I need all the Bs"
"I need to create a B"
"I need to know something about B before I will use it"

All of these have a direct representation in Autofac (and many other DI containers, though the last is not so common); this is a direct result of the author thinking about these relationships before writing Autofac.
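As a rough illustration (IB and Consumer are made-up names; Meta<> is the only Autofac type here), the list maps onto constructor parameters like this:

using System;
using System.Collections.Generic;
using Autofac.Features.Metadata;

public interface IB { }

public class Consumer
{
    public Consumer(
        IB b,                     // "I need a B"
        Lazy<IB> maybeB,          // "I might need a B" - only built if and when it is used
        IEnumerable<IB> allBs,    // "I need all the Bs"
        Func<IB> createB,         // "I need to create a B"
        Meta<IB> describedB)      // "I need to know something about B before I will use it"
    {
        // Autofac resolves each of these shapes automatically once IB is registered.
    }
}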

Usually the suggested solution is "use the service locator 'anti-pattern'".  I won't argue the point, but it is widely considered an anti-pattern unless it fits a very narrow category.  This does fall within that category, but I didn't think it was an elegant solution to my problem.  I also didn't think of it at the time.  Essentially, with the service locator pattern you take the DI container itself as the dependency and ask it for what you need (usually in a switch statement).




The second most common solution to my problem is to take the "I need to create a B" dependency, or Func<X,Y,B>, where B is the type we want and X and Y are the parameters we use to create a B.  This pretty much matches the Factory pattern, where you have a class which creates the objects you want.  In this case Autofac is smart enough that if you take a dependency on Func<X,B> and you have a B whose constructor takes an X as a parameter, it will give you the constructor function of B without you having to write a specific "Factory" class.
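A small sketch of that (X, B and ISomeService are hypothetical types): because B's constructor takes an X, a dependency on Func<X, B> acts as a ready-made factory.

using System;

public class X { }
public interface ISomeService { }

public class B
{
    // x comes from the caller at runtime; service is resolved from the container.
    public B(X x, ISomeService service) { }
}

public class Dependee
{
    private readonly Func<X, B> _createB;

    // Autofac supplies this delegate without a hand-written factory class.
    public Dependee(Func<X, B> createB)
    {
        _createB = createB;
    }

    public B BuildFor(X x) => _createB(x);
}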


The problem in this instance is that it doesn't really solve my problem, for two reasons: the type of X was common across all my Bs, so Autofac wouldn't know which B to give me; and I still needed a Factory class.  It does make my dependee cleaner, but the factory was still riddled with switch statements trying to get the right B.  It is a very useful type of dependency, but not the one I wanted in this instance.

The next most common solution is to take all of the Bs and loop through them, asking each if it is the right one.  I've seen this described as the "right way to not use a service locator for that one instance where it is usually used", but I'm not particularly convinced.  http://blog.ploeh.dk/2011/09/19/MessageDispatchingwithoutServiceLocation/ explains this in a bit more detail.  It works, but it is really inefficient - especially if B is expensive to create.  It is also ugly.

"Expensive to create" might not always be an issue, but if you are creating every B and discarding all but one, every time you ask for an B, that's messy.  This is where Lazy<B> comes in.  Lazy is built into C# (and autofac knows about it) so that you don't actually get a B, you get something that can give you an B when you need it.


This means that if you *might* need a B, then go with Lazy<B> and you will only create/get one when you need it.  It's not always necessary, but if you have a complex dependency hierarchy, or a very intensive constructor, it can make a big difference.  Unfortunately I still needed all of the Bs to determine which one I needed, and because I had Lazy<B>s I couldn't ask each one if it was the right one without type inspection/reflection.  Note: be careful of the distinction between Lazy<B> and Owned<B> - I won't cover it here, but it is important.
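For example (IPdfRenderer is a made-up, expensive-to-build dependency):

using System;

public interface IPdfRenderer
{
    byte[] Render(string html);
}

public class ReportExporter
{
    private readonly Lazy<IPdfRenderer> _pdf;

    // Nothing expensive is constructed here.
    public ReportExporter(Lazy<IPdfRenderer> pdf)
    {
        _pdf = pdf;
    }

    public byte[] Export(string html, bool asPdf) =>
        asPdf
            ? _pdf.Value.Render(html)                    // the renderer is built on first use only
            : System.Text.Encoding.UTF8.GetBytes(html);  // and never built if we don't need it
}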

So far we have covered the top four in the list.  They are all important and very useful to know, but they didn't solve my problem.  The last on the list is the most obscure, and possibly the least widely used, but it is something I love (and have manually implemented in other DI containers that don't have it).
Autofac calls this Meta<B, X>, where X is the type of an object that is attached to B at the time you register B.  You can then ask the dependency about its X without creating a B.
Using this we can have lots of Bs, and when registering my Bs I tell Autofac to attach an X to each one.  When I want the right B, I ask for an IEnumerable<Meta<Lazy<B>, X>> and loop through the list until I find the B that has the X I am looking for.  Autofac does this automatically; all I need to do is tell Autofac the X when I register each B.
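A minimal sketch of that registration and lookup, using Autofac's dictionary-based metadata (the handler and command types are made up; the strongly typed Meta<B, X> form works the same way):

using System;
using System.Collections.Generic;
using System.Linq;
using Autofac;
using Autofac.Features.Metadata;

public interface IHandler { void Handle(object command); }
public class CreateOrder { }
public class CancelOrder { }
public class CreateOrderHandler : IHandler { public void Handle(object command) { } }
public class CancelOrderHandler : IHandler { public void Handle(object command) { } }

public static class HandlerLookup
{
    public static IContainer Build()
    {
        var builder = new ContainerBuilder();

        // Attach the X (the command type handled) to each B at registration time.
        builder.RegisterType<CreateOrderHandler>().As<IHandler>()
               .WithMetadata("Handles", typeof(CreateOrder));
        builder.RegisterType<CancelOrderHandler>().As<IHandler>()
               .WithMetadata("Handles", typeof(CancelOrder));

        return builder.Build();
    }

    public static IHandler HandlerFor(IContainer container, Type commandType)
    {
        // Only the metadata is inspected; the Lazy means no handler is constructed
        // until we have found the one we want.
        var handlers = container.Resolve<IEnumerable<Meta<Lazy<IHandler>>>>();
        return handlers.Single(h => (Type)h.Metadata["Handles"] == commandType).Value.Value;
    }
}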



I can also create a Func<B> factory that does this, cleaning up my dependee a little more.  The most common use case for this pattern is the command dispatcher pattern, where abstracting it further doesn't really add any value (since finding the right B for a given ICommand is the whole point of the command dispatcher).
Some examples of the metadata you might want to attach to a registration are:
  • The type of ICommand the IHandler handles - this is the one I use the most
  • A 'weight' or 'order'  - if we have 10 B's that might be the one I need, I get the highest priority one.
  • A 'delegation' pattern, I need to get the approval workflow for invoices > $500.  If this changes to >$1000 in 6 months all I need to do is update the registration of the IApprovalWorkflows to set the new limit.
  • A 'feature toggle' or A/B testing - you may define a number of Bs (each tagged with an X) and want to pick a different one based on some criteria.


I don’t expect anyone to come out of this thinking that these examples are the best way to do things, but what I would like you to come away with is a greater appreciation of what it means to have a dependency, and to use appropriate patterns and abstractions instead of just saying "I need a B" ( or new-ing something up). 



Tuesday, July 22, 2014

The promise of functional utopia (dipping my toes in F#)

Functional programming has been around for ages, but aside from a very brief introduction in my uni days I hadn't even considered using a functional language at work. Times are changing however, and functional programming is gaining a lot more mainstream attention. The key cited benefits are reduced bugs due to fewer side effects in code, better concurrency/scalability support due to the use of immutable types, and greater productivity due to the use of higher order functions and reduced "lines of code".

With that in mind I set out to gain some experience in functional programming and decide whether I could see the cited benefits and make use of it.

As a .NET developer I have focused on F# - advice from some experts has indicated that learning a stricter functional language like Scala (with scalaz) or Haskell is recommended, as F# allows for some 'cheating' to occur. I won't go into the details because it is beyond my level of understanding, but sticking to "pure" functional languages means you are less likely to fall into the traditional coding traps you might encounter. While I decided to stick with F#, I have tried to keep everything as strict and side-effect free as possible, using record types over classes and avoiding mutable types.


To start with, I recommend taking a look at a couple of Pluralsight classes if you have the opportunity; they are pretty useful. The basic introduction is useful for getting an idea of the language features, and the intermediate level class is extremely useful for understanding how functional programming can be used in the real world.


The key outcome I took from my initial look into F# is that it reminded me a bit of my early days of getting to know JavaScript; everything was a hodgepodge of dependencies and big unstructured files that were difficult to read. I am pretty sure there are better ways to structure things, so I am not overly concerned by that, but I do think it certainly leads you down an unstructured path, and I would want to understand how to structure things better before trying to use it on anything "real".

On the positive side, I think the power of the language itself for data manipulation became apparent quickly. The sequence and list features are extremely powerful and really form the basis of anything you would want to do with the data at hand.

For standard "CRUD" systems I wouldn't expect to see a significant gain in using functional programming, but I can certainly see a benefit for data analysis and transformation activities. It is also fairly easy to combine F# with C# code, so you can use imperative code and libraries to do the imperative CRUD code (such as database querying) and then F# to work with the resulting data.

All in all, I think there is work involved in making F# code "enterprise ready" from a maintainability and supportability perspective. From examples I have seen, I believe this is likely an issue with functional languages in general, and not F# specific, but I don't have the experience to make that call. I would find it hard to justify the learning curve to a client, but I certainly think there are benefits to be had.

Friday, April 11, 2014

How not to write unit tests

In doing some major refactoring of code recently I was extremely glad we had unit test coverage in place.  Unfortunately there were some fairly problematic issues in the implementation of the unit tests that ended up costing a lot more time than they should have.

The problem was two-fold.  Firstly, there was a mix of custom hand-crafted stub classes and automatically mocked interfaces.  Secondly, there was no consistency over the level of dependencies that were stubbed.  In some cases "true" (i.e. non-stubbed) implementations of dependencies were used three or four dependencies deep, and in others only the direct dependency was mocked.

The combination of these meant it was really difficult to update the unit tests as part of the refactoring.  Each time a test failed I had to check whether it was due to a manually crafted stub that needed to be revised, a top level dependency that needed to be updated, or a dependency deep in the dependency tree that needed to be updated.

If the unit tests were configured appropriately, the only time I would need to update the unit tests would have been when a direct dependency changed.

What should have been done?

One of the more important features of Unit Tests is isolation and repeatable consistency of testing.  This ensures your unit test is appropriately testing the code you are targeting, and not the multitude of dependencies that the code may have.

If a direct dependent class has a nested dependency that is modified, but the direct dependency retains the same expected behaviour, then you don't need to change your unit test.  Without this separation, major refactoring of your code becomes far more laborious, as you need to refactor both your code and all of the unit tests that have a dependency on the altered code at any level.

For this reason mocking or stubbing the direct dependencies of a method has become standard practice when working with unit tests.  Back in the old days this used to mean creating custom stub classes for all your dependencies.  This is one reason why Unit Testing was so often implemented as full end-to-end tests.
These days there is no excuse; tools like Moq and RhinoMocks provide the ability to automatically create mocked versions of your dependencies, return appropriate objects based on the calling parameters, and confirm that the services were called with the expected parameters.  Using these tools allows you to focus on testing the "unit of code" without worrying about what dependencies of dependencies might be doing.

Given an arbitrary method that calculates the cart total cost, with a discount based on the total cost and number of items purchased:

        public decimal CalculateCart(Item[] items, IDiscountCalculator discountCalculator)
        {
            decimal total = 0;

            foreach (var item in items)
            {
                total += item.Cost;
            }

            var discountedAmount = discountCalculator.calculate(total, items.Length);

            return total - discountedAmount;
        }

IDiscountCalculator has its own unit tests to verify its behaviour, so we don't want to be testing this as part of the cart calculation.  We also don't want our CalculateCart to fail if our discountCalculator algorithm changes - as long as we trust the discountCalculator is going to work we don't need it to do the actual calculation and can return an arbitrary value.

public void TestCalculateCart()
{
    var cart = new Cart();
    Mock<IDiscountCalculator> discountCalculator = new Mock<IDiscountCalculator>();
           
    //arrange
    var expectedDiscount = (decimal)2;
    var expectedOriginalTotal = 10 + 10;
    var expectedDiscountedTotal = 10 + 10 - (decimal)2;

    var items = new Item[] {new Item() {Cost = 10}, new Item() {Cost = 10}};
    discountCalculator
        .Setup(x =>
            x.calculate(It.IsAny<decimal>(), It.IsAny<int>())
        )
        .Returns(expectedDiscount);

    //act
    var total = cart.CalculateCart(items, discountCalculator.Object);

    //assert
    //confirm the expected total is returned
    Assert.AreEqual(expectedDiscountedTotal, total );
           
    //confirm the discount calculator was called with the appropriate input
    discountCalculator.Verify(mock =>
        mock.calculate(It.Is<decimal>(x => x.Equals(expectedOriginalTotal)), It.Is<int>(x => x.Equals(2)))
        , Times.Once
        );

}

Now no matter what the implementation of DiscountCalculator, as long as we trust our DiscountCalculator is behaving (i.e through valid unit tests) then we don't need to change our unit test for the Cart calculation.

If our cart calculator was extended to throw an InvalidDiscountException in the event the discount is greater than the total, we can arrange the discountCalculator to return a discount larger than the total (say 22), and assert that the exception was thrown, without changing our 'happy path' test and with the same level of confidence.
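A sketch of that additional test might look like the following (keeping the same Moq style; how the cart surfaces InvalidDiscountException is an assumption for illustration):

public void TestCalculateCartThrowsWhenDiscountExceedsTotal()
{
    //arrange
    var cart = new Cart();
    Mock<IDiscountCalculator> discountCalculator = new Mock<IDiscountCalculator>();
    var items = new Item[] { new Item() { Cost = 10 }, new Item() { Cost = 10 } };

    //return a discount larger than the 20 total - we don't care whether a real
    //calculator could ever produce this, only how the cart reacts to it
    discountCalculator
        .Setup(x => x.calculate(It.IsAny<decimal>(), It.IsAny<int>()))
        .Returns((decimal)22);

    //act / assert - the cart should reject a discount greater than the total
    try
    {
        cart.CalculateCart(items, discountCalculator.Object);
        Assert.Fail("Expected InvalidDiscountException was not thrown");
    }
    catch (InvalidDiscountException)
    {
        //expected - the happy path test above is untouched
    }
}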

Proper isolation and repeatability means refactoring is far simpler and your tests become less brittle.

Wednesday, March 26, 2014

F5 Big-IP Load Balanced WCF Services

*update* - This follow-up may provide more detail on finding a solution 

We have been trying to configure our new F5 Big-IP load balancer for some WCF services and encountered a strange issue. 
The service uses wsHttpBinding with TransportWithMessageCredential over SSL.  The Auth type is Windows.
When disabling either node on the F5, the service worked.  However when both nodes were active, the service failed with an exception:
Exception:
Secure channel cannot be opened because security negotiation with the remote endpoint has failed. This may be due to absent or incorrectly specified EndpointIdentity in the EndpointAddress used to create the channel. Please verify the EndpointIdentity specified or implied by the EndpointAddress correctly identifies the remote endpoint.
Inner Exception:
The request for security token has invalid or malformed elements.
Numerous documentation and blogs highlight that *the* way to support load balancing in WCF is to turn off Security Context Establishment by setting EstablishSecurityContext=false in the binding configuration, or by turning on 'sticky sessions'.
http://ozkary.blogspot.com.au/2010/10/wcf-secure-channel-cannot-be-opened.html
http://msdn.microsoft.com/en-us/library/ms730128.aspx
http://msdn.microsoft.com/en-us/library/vstudio/hh273122(v=vs.100).aspx
We did not want to use sticky sessions; although this did fix the issue the F5 logs showed that load balancing was not working as we wanted.
Unfortunately, we already had EstablishSecurityContext set to false, so the security negotiation should have been occurring on each request, which meant it should have been working. 
After hours of investigating other binding settings, creating test clients, updating the WCFTestTool configurations and generally fumbling around, we eventually went back to reconfiguring the F5.  Given that it worked when only one node was active with exactly the same configuration, unless the WCF binding documentation we found *everywhere* was a complete lie, it had to be the F5.
It was finally traced to the F5 OneConnect (http://support.f5.com/kb/en-us/solutions/public/7000/200/sol7208.html) configuration.  This does some jiggery-pokery to magically pool backend connections to improve performance.  It also seems to break WCF services, at least it broke ours. 
Disabling OneConnect on the F5 application profile resolved the issue immediately.
We now have our load balanced, non-persistent, WCF services behind the shiny F5 working.
As this hadn't come up anywhere online that I could find, I can only assume it was related to the combination of TransportWithMessageCredential and Windows as the message credential type.
*edit* So the solution was not as straightforward as we thought, and currently we have reverted back to “sticky sessions” in the F5 to get this to work.  Even with EstablishSecurityContext=false, and OneConnect disabled, the same failure will occur if a single client has two concurrent requests on two separate threads (our clients are web applications) and the F5 routes each connection to a separate service node.

While we investigate further, the short-term solution is to use Transport security instead of TransportWithMessageCredential.  As this required a client change, we had to deploy multiple bindings; each client app will upgrade in turn while we keep sticky sessions on the F5.  Once all clients are on the new binding we can remove the old binding and disable sticky sessions again.

Transport security works for us, but it is not perfect.  It reduces security (SSL only, no message encryption) and reduces flexibility (we can’t, for instance, switch to client certificate or username authentication for individual per-request auth).
It does however keep the services stable and gives us time to perform a thorough analysis of the problem.

Tuesday, March 25, 2014

“Don’t Query From the View” - Unit of Work and AOP

Maintains a list of objects affected by a business transaction and coordinates the writing out of changes and the resolution of concurrency problems.


A colleague asked me about a warning from NHProfiler to get my opinion on the matter.  The warning was:  http://hibernatingrhinos.com/products/NHProf/learn/alert/QueriesFromViews “Don’t Query From the View”.  

The team had a session lifetime per controller action using AOP (Attribute Filters), but needed to extend this out to be active during the view rendering phase after the controller completed.  They did this by instantiating the session in the request filter chain instead of the controller attribute filters.  When they did this, they received the NHProfiler warning above.

The colleague was dismissive of the warning, partly because the implementation of the code was innocuous and partly because the warning reasons were not particularly compelling.  

There are some serious implications of the pattern that was being followed however, some of which I have covered on this blog in the past (here) and (here).

 
tldr;
·        Sessions can be open as long as you need them to be.  You do not need to arbitrarily close them if you know they will be needed again as soon as the next step in the pipeline is reached.
·        There are (arguably) valid reasons why the View rendering would be able to access an active session.
·        The longer a session is open, the more chance of unexpected behaviour from working with Entities in a connected state.  Care should be taken.


Pro1: Simplicity
One benefit from having a session opened and closed automatically using cross-cutting AOP is that your application doesn’t need to care about it.  It knows that it will have a session when used, and will commit everything when the scope ends.  This is often done as a controller/action filter attribute, or higher in the request pipeline.  You don’t need to pass a session around, check if it exists, etc.
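To make that concrete, here is a minimal sketch of the idea as an ASP.NET MVC action filter with an NHibernate-style session.  The SessionFactoryHolder wrapper and the attribute name are assumptions for illustration, not the team's actual implementation.

using System.Web.Mvc;
using NHibernate;

public class UnitOfWorkAttribute : ActionFilterAttribute
{
    public override void OnActionExecuting(ActionExecutingContext filterContext)
    {
        // Open the session and transaction before the action runs.
        var session = SessionFactoryHolder.Factory.OpenSession();  // assumed wrapper around ISessionFactory
        session.BeginTransaction();
        filterContext.HttpContext.Items["UnitOfWork"] = session;
    }

    // OnResultExecuted fires after the view has rendered, so the session stays
    // open during view rendering - the scenario discussed in this post.
    public override void OnResultExecuted(ResultExecutedContext filterContext)
    {
        var session = (ISession)filterContext.HttpContext.Items["UnitOfWork"];
        try
        {
            session.Transaction.Commit();   // a failure here happens outside the business logic context
        }
        catch
        {
            session.Transaction.Rollback();
            throw;
        }
        finally
        {
            session.Dispose();
        }
    }
}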

Con1: Deferring code to View Rendering adds complexity.
I have argued that having the session active outside the controller makes it more difficult to maintain and debug your code as the session is active during the View render component.  The response was that forcing a controller action to pre-load everything the view *might* need and just pushing a dumb model prevents the view from performing optimisation of what it actually loads.  I don’t believe that the View rendering should have such an impact on code execution but the point is a valid point of contention.

Con2: Loss of control.
When using AOP to manage the session lifetime you have much less control over the failed state.  As the failure occurred outside of the business logic context, you can’t easily use business logic to handle the failure.  If the standard policy is to simply direct you to an error page or similar, then this behaviour is completely valid.  However if you needed to perform actions based on the failure (such as attempting to resolve optimistic concurrency issues automatically) then you can’t. 

Con3: Loss of traceability.
When using a long running session there is no traceability between the persistence and the application logic that made the entity changes.  If you experience an unexpected query when the unit of work commits, you can’t immediately identify which code caused this behaviour.

Con4: Unexpected change tracking.
Having a long running (and potentially ‘invisible’) session exposes you to potentially unexpected behaviour due to the session tracking all changes on the underlying entities.  If you load an entity from the data access layer, then make some changes for display purposes (perhaps change a Name property from “Jason Leach” to “LEACH, Jason”) before passing that entity to the view, when the unit of work cleanup occurs it will persist that change to the database because all of your changes are being tracked. 
A less obvious example is if you had a Parent entity with a list of Children.  In a particular controller action you want to get the parent and only a subset of the children.  So you might select the parent and all children, then do something like Parent.Children = Parent.Children.Where(childFilter).ToList().  Depending on your configuration this is likely to delete all of the children not matching the filter when the unit of work completes.  Oops.
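A minimal sketch of the simpler trap (hypothetical Person entity; the NHibernate-style session is kept open by the AOP filter further up the pipeline):

// Load the entity inside the long-running session.
var person = session.Get<Person>(personId);

// "Just for display" - but the open session is still tracking the entity,
// so this counts as a real modification.
person.Name = person.Name.ToUpper();

// ...later, when the AOP-managed unit of work commits, the dirty entity is
// flushed and the display-only change is written to the database.
return View(person);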

Con3 and 4 are direct side effects of leaving a session open longer.  In the Parent->Child example, you would likely only need the session active when you did the initial load of the Parent and Child entities, then close the session and start manipulating the entities.   Obviously you should be mapping entities to ViewModels and never manipulating those entities, but it is a very easy mistake to make.  Coupled with con3, it can be near impossible to identify in a complex unit of work.

Conclusion
As long as you are careful about what you are doing, and not arbitrarily altering domain model entities while a session is open, then long-running sessions are not a problem.  However there is significant potential for issues if your devs don't understand the risks.
As long as you are happy with standard error/failure, or UI driven resolution workflows, then AOP for unit of work management is acceptable, again as long as the risks / limitations are acknowledged.

 

Thursday, March 20, 2014

Little Things

As I take on more architectural responsibility I write far less code.  I do however provide code samples and give advice on how to solve certain issues.

If there is a flaw in the samples provided that is obvious in the integrated solution, if not in the code sample itself, it is a major failing on my part not to have identified it.  I hate that.

When the flaw is obvious and the fix is equally obvious, is that still my failing, or a failing of the developers for blindly following a code example and not reviewing or taking the effort to understand the code?

Probably both.