Friday, November 23, 2007

How to create a canonical data model?

When used properly, the canonical form can provide great benefits in an SOA world, such as loose coupling of applications, ease of integration maintenance, and a common understanding of information. When used improperly, it can create a maintenance nightmare. So, how do you create the canonical form for an object?

I have seen several different approaches to addressing this issue that fall into two basic categories: create a superset of information or create the minimal subset of information. The sections below describe those approaches and their drawbacks.

Use all of the information that is available in the source system since sooner or later some application will want it.
This approach is probably the simplest way to create a canonical format. Essentially, every piece of information that is known about an object is passed around regardless of its usefulness to other applications. This approach has several easily identifiable drawbacks:
  • The canonical form will be unnecessarily large for what is needed. This means that the consuming applications will have to sift through all of the unneeded data to get to what they need. This adds to the potential for development bugs as developers try to figure out which fields they need to use.
  • The application that produces the event must needlessly create and provide information that no other application will ever use.
  • The XML representation of the form will be large and consume a lot of resources creating, transporting, and consuming the message.
  • The benefit of this approach is that the canonical form already has all of the information that can be provided, so you will never have to modify it (until the source application is enhanced with new information :)).

Try to think of every possible piece of information that any application now or in the future could ever need.
This approach is another way to create a superset of fields in the canonical form.

  • This approach obviously will take a long time to determine all of the information that will be required by receiving applications.
  • I have seen several integration projects try to take this approach and fail before they even define the canonical form and start to integrate things.
  • Each application will have its own unique set of information that it requires, and all of the other applications will need to sift through it needlessly.
  • The benefit of this approach is if you can actually create a true superset then you will never have to modify your canonical form.
  • If you can predict what information future applications will need, you are in the wrong business; you should be predicting the stock market.

Create a base form and then as new applications are added the new data fields are added to the canonical format.
This approach starts out as a minimal subset approach and then quickly turns into a superset.
  • The canonical form will grow to be extremely large as more and more applications are added.
  • Each time the canonical form is modified/updated all of the consuming applications need to account for the changes.
  • Eventually you will end up with the unmanageable supersets described above.
  • The benefit of this approach is that you can create the initial canonical form fairly quickly, since it will contain only the information required by the currently known consuming applications.

Provide only the minimal amount of information required to identify an object as unique.
  • This approach is obviously the simplest to create. All that is required are the fields that make an object unique.
  • This approach normally leads to unnecessary work/overhead because now every application needs to make an extra service call back to the source to retrieve the information that is not in the message. As more and more applications are added the burden on the source application becomes greater until it can no longer handle the load.
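The growth in load on the source application can be sketched with simple arithmetic. This is only an illustration: the function name and the event and consumer counts below are made-up assumptions, not measurements from any real system.

```python
def callbacks_per_day(events_per_day: int, consumers_needing_details: int) -> int:
    """Estimate the extra service calls hitting the source application.

    Assumes every consumer that needs more than the identifying fields
    makes exactly one call back to the source per event it receives.
    """
    return events_per_day * consumers_needing_details


# Hypothetical volume: 10,000 events per day published by the source.
EVENTS_PER_DAY = 10_000

# The published message count stays constant, but the callback load
# grows linearly with each consuming application that is added.
load_by_consumer_count = {
    consumers: callbacks_per_day(EVENTS_PER_DAY, consumers)
    for consumers in (1, 5, 20)
}

for consumers, calls in load_by_consumer_count.items():
    print(f"{consumers} consumers -> {calls} callbacks per day")
```

The source publishes the same number of events in every case; only the number of retrievals it has to serve changes.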
So, what is the answer? How can you create the canonical format that provides all of the perceived benefits without the maintenance nightmare?

There are a few rules of thumb that will provide a balance between too much information and too little information in your canonical form.

  1. Start small - Begin with the fields that make an object unique.
  2. Add common fields - Add the fields that are common among most of the consuming applications that you are currently working with. This one can be tricky, so I normally go with the rule of thumb that if 80% of the applications need a field, I should provide it.
  3. Add information that is expensive to retrieve later - If a consuming application requires information that is not in the message, it will need to retrieve it from the source application. If that retrieval is expensive (either the information is expensive to recreate or there would be a high volume of retrievals), then add that information to the canonical form.
Following these simple rules should prevent you from changing the form a lot. If the canonical form is good you should only need to change it when:
  • The source application is enhanced/upgraded to contain additional information that is needed by 80% of the consuming applications.
  • The cost of consuming applications retrieving additional information from the source application becomes too high.
If you find that your canonical form is changing a lot after it is initially created, do not just add new fields every time. Step back and try to determine why all of the additional fields are required and how they were missed in the initial creation process.

In the end the canonical form should contain 80-90% of the information that is required by all of the consuming applications (the ultimate superset). This will minimize the time spent creating the initial canonical form and reduce the number of times that the canonical form needs to change.
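The three rules of thumb can be sketched as a small selection function. This is a minimal sketch, not a definitive implementation: `FieldInfo`, `select_canonical_fields`, the application names, and the field names are all hypothetical, and deciding whether a field is "expensive to retrieve" is assumed to be a manual judgment recorded as a flag.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FieldInfo:
    name: str
    is_identifying: bool = False         # rule 1: makes the object unique
    expensive_to_retrieve: bool = False  # rule 3: costly to fetch later


def select_canonical_fields(fields, needs_by_app, threshold=0.8):
    """Return the names of the fields to put in the canonical form."""
    app_count = len(needs_by_app)
    selected = []
    for field in fields:
        wanted_by = sum(1 for needed in needs_by_app.values() if field.name in needed)
        share = wanted_by / app_count if app_count else 0.0
        if field.is_identifying:
            selected.append(field.name)  # rule 1: start small
        elif share >= threshold:
            selected.append(field.name)  # rule 2: common fields (80% rule)
        elif field.expensive_to_retrieve and wanted_by > 0:
            selected.append(field.name)  # rule 3: expensive to retrieve later
    return selected


# Hypothetical candidate fields and consuming applications.
FIELDS = [
    FieldInfo("name", is_identifying=True),
    FieldInfo("ssn", is_identifying=True),
    FieldInfo("home_address"),
    FieldInfo("fax_number"),
    FieldInfo("credit_history", expensive_to_retrieve=True),
]
NEEDS_BY_APP = {
    "billing": {"home_address", "credit_history"},
    "crm": {"home_address", "fax_number"},
    "hr": {"home_address"},
    "shipping": {"home_address"},
    "badging": set(),
}

print(select_canonical_fields(FIELDS, NEEDS_BY_APP))
```

In this made-up data set, `home_address` is needed by four of five applications and is included, `fax_number` is needed by only one and is left out, and `credit_history` is included despite low demand because it is flagged as expensive to retrieve.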

There are also several standards organizations (like the Object Management Group) that have already created object representations that can be used as a starting point. These forms have been thought out over the years by many industry experts who are very knowledgeable about their respective spaces. Treat these forms as a starting point only: they normally contain a lot of additional information that you may not need and can trim out.

Now that you have a canonical form in place, the dirty word "governance" comes into play as other developers in other groups need information that is not already in the canonical form. A governance model must be put in place to prevent fields from being added to the form just to make integrating with a single consuming application easier. Governance must look at the use of the canonical form as a whole and not change the form to satisfy the needs of one application, as this will lead to a complete superset and too many changes to the form.

What is a canonical data model?

Recently, I have been asked a lot, "What is a canonical data model?" In the SOA world, the term "canonical data model" is thrown around as a way to impress the layperson. In reality, the canonical data model is a simple concept:


The canonical data model (CDM) is a representation of common information produced and consumed by applications. The CDM is normally used to publish events in the form of messages out of one application into several other applications. The CDM is used as the format of the message so that all of the receiving applications know what information to expect in the message.


For example, let's create the canonical data model for a person. The CDM for a person must include information that uniquely identifies that person:

  • Name
  • Birth Date
  • Social Security Number/Passport number

The information that is unique must be included in the CDM so that it is easy to distinguish if two person messages relate to the same person or different people. That is the lowest common denominator for a canonical message. A canonical message normally contains other useful information about an object that is used by most of the receiving applications. Continuing to build out the CDM for our person object, these other useful fields could be included:

  • Work Address
  • Home Address
  • Work phone number
  • Home phone number
  • Cell phone number
  • Fax number
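The person CDM described so far could be sketched as a simple message type. This is only an illustration: the class name `PersonMessage`, the field names, and the `same_person` helper are assumptions, and the identity check simply compares the three identifying fields listed above.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class PersonMessage:
    # Identifying fields: always present in the canonical message.
    name: str
    birth_date: date
    government_id: str  # Social Security Number or passport number

    # Other useful fields: optional, since not every source has them.
    work_address: Optional[str] = None
    home_address: Optional[str] = None
    work_phone: Optional[str] = None
    home_phone: Optional[str] = None
    cell_phone: Optional[str] = None
    fax_number: Optional[str] = None

    def identity(self):
        """Return only the fields that make this person unique."""
        return (self.name.strip().lower(), self.birth_date, self.government_id)


def same_person(a: PersonMessage, b: PersonMessage) -> bool:
    """Two messages relate to the same person if their identities match."""
    return a.identity() == b.identity()


# Hypothetical messages: the second one only adds a cell phone number.
first = PersonMessage("Jane Doe", date(1970, 1, 31), "123-45-6789")
update = PersonMessage("Jane Doe", date(1970, 1, 31), "123-45-6789",
                       cell_phone="555-0100")
other = PersonMessage("Jane Doe", date(1970, 1, 31), "987-65-4321")

print(same_person(first, update))  # True
print(same_person(first, other))   # False
```

Keeping the identifying fields required and everything else optional lets a receiving application match messages to a person without caring which of the extra fields a particular message carries.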

As you can see the list of information describing a person can go on and on. The information each application needs to know about a person can vary greatly, so how do you know what to put in the canonical data model? In my next post, I will talk about how to create a canonical data model that provides the benefits without running into a maintenance nightmare.

Friday, August 10, 2007

Integrating with PeopleSoft

I just recently completed a project that heavily integrated with PeopleSoft. I have been on several projects and demos in the past that have integrated with PeopleSoft and figured it was time to share a little knowledge about how to do it.

PeopleSoft Component Interfaces (CIs) are a great way to integrate because they allow you to reuse all of the business logic and data validation that is used by the PeopleSoft Web interface. The version of PeopleSoft that you are integrating with will determine how you can call the CIs.

If you are lucky enough to be integrating with PeopleSoft 8.4 or greater, the CIs can automatically generate Web Services (as well as Java and COM objects) out of the box. This makes integrating with the CIs quick and easy, since most integration tools can easily call Web Services. The only thing you need to worry about is getting the right data into the right fields.

If you are on an older version of PeopleSoft (pre 8.4) the CIs can be exposed as Java classes or COM objects. Since I mostly work with Java based tools for integration, we obviously used the Java classes. There are two ways to utilize the Java classes:

1.) Purchase the PeopleSoft Adapter from IWay. This exposes the CIs as Web Services that are easily called by any integration tool. The adapter is not the easiest thing to configure but once configured it makes integrating into a large number of CIs quick and easy. The adapter is costly so the ROI should be looked at before making the decision.

2.) Create your own Web Service that utilizes the Java classes exposed by the PeopleSoft CIs. This sounds like a lot of work but is actually pretty simple with a little knowledge of Java and PeopleSoft, plus a handy tool that will create a Web Service for you. We made use of the JAXB framework and Oracle JDeveloper to quickly create these Web Service wrappers. Once the Web Services are created, they can be integrated using most integration tools.

If you are starting off slowly and only integrating with a handful of interfaces, the second option might be the more cost-effective one, but if you plan on integrating with many CIs (10+), it would probably be more cost-effective in the long run to purchase the PeopleSoft adapter.

Are multiple BPEL OC4J instances possible?

Recently, while working on an integration project using the Oracle SOA Suite, a client wanted to have multiple OC4J BPEL instances inside of one Oracle Application Server. The desire was to use the multiple OC4J instances to ease deployments and maintenance by running on one OC4J instance while performing maintenance on the other.

After doing some research and checking with some people at Oracle, it seems that even though you can have multiple OC4J instances in one Oracle Application Server, you can currently only have one BPEL-specific OC4J container. There are a couple of reasons for this:

1.) The Oracle home directory contains a BPEL directory that maintains the configuration files for the BPEL OC4J instance. BPEL allows you to have multiple domains (each with a different set of configuration files) inside of an OC4J container, but the directory structure does not allow for multiple BPEL instances.

2.) The SSO for the BPEL console is pinned to /BPELConsole. This would limit access to the console if the OC4J instances were to be managed separately.

In order to accomplish what the client wanted, multiple Oracle Application Servers needed to be created, each with its own OC4J BPEL instance.