I have seen several different approaches to addressing this issue that fall into two basic categories: create a superset of information or create the minimal subset of information. The sections below describe those approaches and their drawbacks.
Use all of the information that is available in the source system since sooner or later some application will want it.
This approach is probably the simplest way to create a canonical format: essentially every piece of information that is known about an object is passed around, regardless of its usefulness to other applications. This approach has several easily identifiable drawbacks:
- The size of your canonical form will be unnecessarily large for what is needed. This means that the consuming applications will have to sift through all of the unneeded data to get to what they need. This adds to the potential for development bugs as developers try to figure out which fields they need to use.
- The application that produces the event must needlessly create and provide information that no other application will ever use.
- The XML representation of the form will be large and consume a lot of resources creating, transporting, and consuming the message.
- The benefit of this approach is that the canonical form already has all of the information that can be provided, so you will never have to modify it (until the source application is enhanced with new information :)).
- With this approach it will obviously take a long time to determine all of the information that will be required by receiving applications.
- I have seen several integration projects try to take this approach and fail before they even define the canonical form and start to integrate things.
- Each application will have its own unique set of information that it requires, and all of the other applications will need to sift through it needlessly.
- The benefit of this approach is that if you can actually create a true superset, then you will never have to modify your canonical form.
- If you can predict what information future applications will need, you are in the wrong business; you should be predicting the stock market.
Create a base form and then as new applications are added the new data fields are added to the canonical format.
This approach starts out as a minimal subset approach and then quickly turns into a superset.
- The canonical form will grow to be extremely large as more and more applications are added.
- Each time the canonical form is modified/updated all of the consuming applications need to account for the changes.
- Eventually you will end up with the unmanageable supersets described above.
- The benefit of this approach is that you can create the initial canonical form fairly quickly, since it will contain only the information required by the currently known consuming applications.
Provide only the minimal amount of information required to identify an object as unique.
- This approach is obviously the simplest to create. All that is required are the fields that make an object unique.
- This approach normally leads to unnecessary work/overhead because now every application needs to make an extra service call back to the source to retrieve the information that is not in the message. As more and more applications are added the burden on the source application becomes greater until it can no longer handle the load.
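To make the callback overhead concrete, here is a minimal sketch of an identifier-only message. The `CustomerChanged` event, the `SourceSystem` class, and the `deliver` function are hypothetical names invented for illustration; they do not come from any particular integration product.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CustomerChanged:
    """Hypothetical identifier-only event: just enough to identify the object."""
    customer_id: str


class SourceSystem:
    """Stand-in for the source application; counts the lookups it has to serve."""

    def __init__(self, records):
        self._records = records
        self.lookups = 0

    def fetch(self, customer_id):
        # Every consumer that needs more than the id ends up calling this.
        self.lookups += 1
        return self._records[customer_id]


def deliver(events, consumer_count, source):
    """Simulate each consumer calling back to the source for every event."""
    for event in events:
        for _ in range(consumer_count):
            source.fetch(event.customer_id)
    return source.lookups


# Assumed sample data: three events delivered to four consumers.
source = SourceSystem({"c1": {"name": "Ann"}, "c2": {"name": "Bob"}})
events = [CustomerChanged("c1"), CustomerChanged("c2"), CustomerChanged("c1")]
total_lookups = deliver(events, consumer_count=4, source=source)
```

In this sketch the number of callbacks is the number of events multiplied by the number of consumers, so each newly added consumer increases the load on the source for every message it publishes.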
There are a couple of good approaches and rules of thumb that will provide a balance between too much and too little information in your canonical form.
- Start small - Begin with the fields that make an object unique.
- Add common fields - Add the fields that are common among most of the consuming applications that you are currently working with. This one can be tricky so I normally go with the rule of thumb that if 80% of the applications need it I should provide it.
- Add information that is expensive to retrieve later - If there is information that is not in the message that is required by a consuming application, the application will need to retrieve it from the source application. If retrieving it from the source application is expensive (either the information is expensive to recreate or there would be a high volume of retrievals), then add that information to the canonical form.
- The source application is enhanced/upgraded to contain additional information that is needed by 80% of the consuming applications.
- The cost of consuming applications retrieving additional information from the source application becomes too high.
In the end the canonical form should contain 80-90% of the information that is required by all of the consuming applications (the ultimate superset). This will minimize the time spent creating the initial canonical form and reduce the number of times that the canonical form needs to change.
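The rules of thumb above can be sketched as a small selection function. This is only an illustration under stated assumptions: `select_canonical_fields`, its parameters, and the sample consumers and field names are all hypothetical, and the 0.8 threshold is simply the 80% rule of thumb expressed as a number.

```python
from collections import Counter


def select_canonical_fields(identity_fields, consumer_needs,
                            expensive_fields=frozenset(), threshold=0.8):
    """Pick canonical fields: identity first, then common or expensive ones.

    consumer_needs maps each consuming application to the fields it requires.
    A field is added when at least `threshold` of the consumers need it, or
    when some consumer needs it and it is expensive to retrieve later.
    """
    counts = Counter(
        field for needs in consumer_needs.values() for field in set(needs)
    )
    total = len(consumer_needs)
    selected = list(identity_fields)  # start small: the identifying fields
    for field in sorted(counts):
        if field in selected:
            continue
        is_common = counts[field] / total >= threshold
        if is_common or field in expensive_fields:
            selected.append(field)
    return selected


# Assumed sample data: five consumers and the fields each one requires.
needs = {
    "billing":   ["customer_id", "name", "email", "credit_score"],
    "shipping":  ["customer_id", "name", "email"],
    "support":   ["customer_id", "name", "email", "fax"],
    "marketing": ["customer_id", "email"],
    "reporting": ["customer_id", "name", "email"],
}
fields = select_canonical_fields(
    identity_fields=["customer_id"],
    consumer_needs=needs,
    expensive_fields={"credit_score"},
)
```

With this sample data, `name` and `email` pass the 80% test, `credit_score` is included only because it is marked as expensive to retrieve, and `fax` is left out, so the one consumer that needs it would still fetch it from the source.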
There are also several standards organizations (like the Object Management Group) that have already created object representations that can be used as a starting point. These forms have been thought out over the years by many industry experts who are very knowledgeable about their respective spaces. Use these forms as a starting point, but expect to trim them, as they normally contain a lot of additional information that you may not need.
Now that you have a canonical form in place, the dirty word "governance" comes into play as other developers in other groups need information that is not already in the canonical form. A governance model must be put in place to prevent fields from being added to the form just to make integrating with a single consuming application easier. Governance must look at the use of the canonical form as a whole and not change the form to please the needs of one application, as this will lead to a complete superset and too many changes to the form.