September 23, 2010

Rigor and Formality

Of all the engineering disciplines, Software Engineering seems to have the fewest guiding scientific principles and universally accepted methods. It remains as much a psychological and sociological practice as a technological one. This is because Software Engineering is still primarily a creative activity rather than a mathematical (i.e. formal) process.


Programming versus Engineering

It should be noted at this point that this is not a comparison of software programming to software engineering; these are two fundamentally different scopes of software practice. Software programming normally involves one or two developers fulfilling all the roles of architect, designer, coder, tester and maintainer to deploy a relatively specialized system with a narrow scope of deployment. Many popular methodologies exist to support this style of development, including Agile and Extreme Programming. Software Engineering involves a team (or multiple teams) of specialists working toward the creation of a complex system supporting the needs of a diverse set of users, deployed in multiple configurations, all executed in a predictable manner. Software Engineering environments produce multiple such complex systems predictably.

Software programming is a more ad hoc practice suitable for specific types of projects such as prototypes, utilities or clearly (i.e. narrowly) scoped projects. Even the largest organizations harbor software programming environments particularly in the maintenance and evolution of established or legacy systems.

Software Engineering environments are found wherever there are multiple stakeholders and technical resources must coordinate their efforts to achieve their goals. It is a more formal and scientific endeavor which can only be performed successfully with the formality and rigor of other engineering disciplines such as mechanical and chemical engineering.

Guiding Creativity

Because software development is a creative activity, there is an inherent tendency toward informal, ad hoc techniques in software specification, design and coding. While such development practices can, statistically speaking, succeed in some large, complex projects, the need for rigor and formality becomes apparent over time. Eventually team members begin to burn out, projects become too large to manage properly in an informal style, deliverables begin to slip, requirements are skipped or misinterpreted, code debt is incurred and overall product quality suffers. Executing predictably on large-scale projects requires significant effort.

Small software development environments which start out with one or two programmers can often utilize software programming practices for the organization's initial projects. Once that organization grows, however, those same practices begin to generate more problems than they solve. While it may seem that those practices are speeding development, product quality begins to suffer. Coordination between the increased numbers of participants and stakeholders tends to overwhelm the team, and less time is spent productively. Communicating all the needs and expectations becomes difficult, and the variables of each project increase greatly, making success more difficult to reproduce and failure more difficult to avoid. Code debt and software defects continue to increase until the code base becomes unmanageable. It begins to cost more to maintain and evolve the code, and ultimately margins begin to shrink as software productivity declines. This is when software programming must evolve into Software Engineering.

Software development is a creative activity, but it must be practiced systematically and with discipline. Rigor is a necessary complement to creativity that increases confidence in the results of development activities. Creativity often leads to imprecision and inaccuracy, and software development can tolerate neither. An informal approach to software development is contrary to good Software Engineering practice.

The Need For Rules

Software Engineering involves many people of differing skill-sets, goals and interests. Without set rules, each participant imposes his/her own interest on the project. When problems occur, they become difficult to resolve and often result in unproductive conflicts. Software development needs a set of rules which allow participants to divide the workload without losing track of the work to be performed.

The rigor of a software development process is the level of discipline the process exhibits, often through governance; it is the set of rules which direct how the process is executed. With rigor, a process can carry on smoothly without hindrance, but without it projects invariably stray into problems resulting in unreliable products, high costs, missed schedules and even outright failure. Rigor helps produce products with higher reliability and greater quality while controlling costs and meeting expectations. Equally important, rigor enables repeatability and allows teams to avoid problems experienced in past projects.

The cost of activities early in a project's life seems high at the time but is insignificant when compared to the costs incurred in later phases. Time spent clarifying requirements is minuscule compared to the cost of re-designing a product to accommodate a forgotten requirement. Consider building a system which is required to encrypt all data in transit, then trying to retrofit such encryption into an existing system a year after it was released. All software development projects can benefit from a set of rules which ensures important requirements are not missed or undervalued. Some level of rigor must be applied in requirements gathering and analysis to ensure success and to avoid the future costs of issue remediation.

Setting and following process rules allows all participants to operate in a coordinated manner.

Formality

There are varying degrees of rigor, from completely ad hoc to the highest level of formality. A formal practice is one in which software systems can be verified against mathematical laws. Automated testing and error removal are not only possible but among the benefits of adopting formal rigor in software development. There is a branch of software engineering known as Formal Methods which researches the potential of applying formality to software development. While it is well debated in many classrooms, conferences and coffeehouses, practical application of formal methods has been limited.

An excellent example of formality can be found at NASA in the development of software for space flight. The Goddard Space Flight Center (GSFC) software process has produced products with near-zero defects. It is a testament to the level of rigor exercised by GSFC software engineers.

Summary

Rigor is one of the elements which separates software programming from Software Engineering. While software development is a creative process, it benefits from some level of rigor which governs how that creativity is applied. Some programming projects complete just fine with a low level of rigor, but to consistently execute software development with predictable, high-quality results, software engineers use rigor to guide their activities.


January 9, 2010

High Availability

The availability of a system is a measure of its readiness for usage and is a component of a system's dependability.

It is possible to approach 100% availability with messaging-based systems because multiple copies of a message consumer can run and listen to the message bus at the same time, acting as a pool of consumers all working to process messages. This is called parallelization, and it allows the deployment of as many fail-over consumers as may be required to ensure that the logic provided by the consumers is available at all times.

Through parallelization, a message consumer can be replicated throughout the data processing environment, observing messages on the same message group. This allows all the consumers to receive a copy of the message and process the data therein at the same time. The message producer never needs to know where these message consumers are running or how many of them there are. One or more of these replicated consumers may fail and the message will still be processed. This is one of the big benefits of broadcasting data through a message bus.

There are times, though, when some level of coordination will be required between the parallelized message consumers. At the very least, redundant processing is wasteful; more likely, it will result in logic errors such as multiple entries being posted to a ledger. Some level of coordination is therefore required to reduce or eliminate the likelihood of duplicate processing. This coordination can take many forms, and there are a variety of design patterns which can address this issue. Fortunately, most message infrastructures have a feature which allows coordination between all parallelized consumers: message queuing.

Message Queuing

Nearly all message infrastructures have a quality of service (QoS) for the delivery of messages known as queuing. When message brokers utilize queues in message delivery, the messages are placed in storage (memory, disk or both) from which message consumers retrieve their messages when they are ready to process the message data. The message broker removes the message from storage once the message is successfully received by a message consumer and the message becomes unavailable for retrieval. This ensures that only one message consumer will receive the message for processing.

Message queues are a First In, First Out (FIFO) construct which ensures messages are made available to consumers in the order they were received by the broker. Some brokers allow message prioritization, message read-ahead (i.e. peeking) and some even allow for message push-back. These are qualities of service supported to varying degrees by some and not all message broker technologies so one should not design their systems to utilize them unless it is acceptable to become dependent on a particular messaging vendor. The common QoS for message queuing is FIFO delivery to only one message consumer.
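The queued delivery semantics described above can be sketched with a toy broker in Python. This is an illustration of the behavior, not any particular vendor's API; the class and message names are invented:

```python
from collections import deque

class QueueBroker:
    """Toy sketch of queued (FIFO, single-consumer) message delivery."""
    def __init__(self):
        self.queue = deque()

    def send(self, message):
        self.queue.append(message)  # producer enqueues a message

    def receive(self):
        # The broker removes the message from storage on delivery, so
        # no other consumer can retrieve the same message.
        return self.queue.popleft() if self.queue else None

broker = QueueBroker()
broker.send("order-1")
broker.send("order-2")

print(broker.receive())  # → order-1 (FIFO: first in, first out)
print(broker.receive())  # → order-2
print(broker.receive())  # → None (drained; each message delivered exactly once)
```

Any number of consumers could call receive() on the same broker; each message would still reach only one of them.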

When using message queuing, it is a simple matter to have multiple instances of the same message consumer running on different hosts in different sites, all retrieving messages from a message group with queued delivery. In this configuration, a deployment can suffer the loss of one or more message consumers without a loss of service. Therefore, parallelizing message consumers on a message queue is a relatively simple and efficient way to reduce failure points and ensure high availability.

Consider an example where messages containing sales orders are sent to a message group for processing. Once processed, a message is returned to the sender of the sales order message as confirmation. The system would have a single point of failure if there were only one sales order processor listening for sales order messages. If that processor failed, then the sales order system would be unavailable. If the architect placed multiple sales order processors on the message group, all the currently operating sales order message processors would receive and process the message, resulting in multiple orders in the system. If all these message processors were instead listening to the sales order message queue, then the message broker would let only one of the message consumers process the message, resulting in only one sales order being entered in the system.
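The sales order scenario reduces to a few lines of Python showing the difference in outcome between the two delivery modes. The processor names and message fields here are made up for illustration:

```python
from collections import deque

consumers = ["proc-A", "proc-B", "proc-C"]
message = {"sku": "X100", "qty": 2}

# Broadcast delivery: every listening processor receives a copy,
# so one sales order message yields three posted orders.
orders_broadcast = [(c, message) for c in consumers]

# Queued delivery: the broker hands the message to exactly one
# ready consumer, so only one order is entered in the system.
queue = deque([message])
orders_queued = []
while queue:
    orders_queued.append((consumers[0], queue.popleft()))

print(len(orders_broadcast))  # → 3 (duplicate orders posted)
print(len(orders_queued))     # → 1 (single order posted)
```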

Load Balancing

When components are parallelized and using message queuing, each component handles processing according to its capabilities. When a component is busy processing a particularly complex operation, it is unavailable for processing the next message sent to the message group. It will be the other instances which will receive the subsequent messages for processing. This results in automatic load balancing where messages are consumed by bus participants only when they are available; the message broker will not attempt to deliver a message to a message consumer while it is busy processing a previous message or the component has experienced a logic error and has stopped processing. Only message consumers which are operational and ready to process the next message receive messages.

While not as configurable as some load balancing systems, this load balancing is achieved for free as a by-product of parallelizing with message queues. Each message consumer can take as long as it needs to process a message and retrieves the next message from the queue only when it is ready. If messages begin to pile up in the message queue, then additional instances of the message consumer can be deployed to handle the load, but that will be covered in another article entitled “Horizontal Scalability”.
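This pull-when-ready load balancing can be simulated with a simple tick-based model. The "fast" and "slow" consumers and their per-message costs are hypothetical, but the behavior mirrors the description above: a busy consumer never receives the next message; an idle one does:

```python
from collections import deque

queue = deque(range(10))              # ten pending messages
busy_until = {"fast": 0, "slow": 0}   # simulated clock tick each is busy until
processed = {"fast": 0, "slow": 0}
cost = {"fast": 1, "slow": 4}         # slow takes 4 ticks per message

clock = 0
while queue:
    clock += 1
    for name in ("fast", "slow"):
        # A consumer pulls the next message only when it is idle; the
        # broker never delivers to a consumer still busy processing.
        if queue and busy_until[name] <= clock:
            queue.popleft()
            busy_until[name] = clock + cost[name]
            processed[name] += 1

print(processed)  # the fast consumer naturally handles more of the load
```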

Fail-over / Fall-back

Another by-product of using message queues in parallelized deployments is Fail-Over processing when a component fails and the subsequent Fall-back when the component is restored. Although these concepts are commonly used to describe the operation of connection-oriented, client-server relationships, the results and qualities of these concepts are achieved through the use of “connection-less” message bus deployments.

In a client-server design, if the server fails the client must find, or “fail-over” to, another server in order to have its requests processed. The client needs special logic to detect the failure in the service to which it is currently connected, break the connection, locate another server, establish a connection and resume processing. Subsequently, the client needs logic to detect when the original server is restored so it can “fall-back” to the primary server and continue its processing. The logic in the client can be rather complex and tightly coupled to the particular server deployment in the data center. All this combines to make fail-over and fall-back operations difficult to implement, operate and maintain.

Should a parallelized component fail, the remaining components will continue to operate by pulling messages off the message bus and processing the data in those messages. In effect, fail-over is accomplished automatically because the failed component is not available to pull the message from the bus while its peers are. When the failed component comes back on-line, it immediately begins processing as work becomes available by grabbing the next message from the message queue effectively providing fall-back.
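The automatic fail-over and fall-back described above can be sketched in Python. The consumer names and messages are invented; the point is that no special client logic is needed, because a failed peer simply stops pulling:

```python
from collections import deque

queue = deque(["msg-1", "msg-2", "msg-3"])
pool = ["consumer-A", "consumer-B"]   # parallelized peers on one queue
handled = []

pool.remove("consumer-A")  # consumer-A fails mid-operation

# The surviving peer keeps pulling messages; fail-over happens
# automatically because the failed component is simply absent.
while queue:
    for consumer in pool:
        if queue:
            handled.append((consumer, queue.popleft()))

pool.append("consumer-A")  # restoration: fall-back is just pulling again

print(handled)  # all three messages handled by consumer-B
```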

Maintenance Scenario

Even when a system needs to be upgraded, it can remain 100% available and suffer no downtime throughout the upgrade. Consider the following real-world business case.

An authentication system has been deployed which accepts encrypted credentials from clients through a message group with a queued delivery scheme. Several message consumers have been deployed throughout the data center on separate hosts, utilizing a clustered database as their data store. The result is that there are no single points of failure in the system, with both the services and the data store operating in high availability mode.

The operations staff are tasked with upgrading the authentication service to one which uses the corporate directory, via the Lightweight Directory Access Protocol (LDAP), instead of the database. The database has been locked from updates, and all the security principals along with their credentials have been imported into the directory.

The new message consumers providing the authentication service through LDAP are deployed while the existing message consumers which use the database are left on-line. The result is that both versions of the message consumers are running and authenticating credentials. Some of the credentials are being authenticated via the database and the rest via LDAP against the corporate directory.

Operations carefully watches the logs of the new components for any indication of problems, and the operation of the new components is verified before the migration is completed. After no errors are observed in the logs and the help desk reports no client issues, the last step of the service update is started.

The older components which utilize the database for credential authentication are terminated individually. As each one is terminated, the system logs are scanned for errors. After all the original authentication services are terminated, the database is backed up and scheduled for purging.

The result is that the authentication system on which all the other systems depend has been completely swapped out without loss of service. Authentication services were available at all times throughout the maintenance window, and none of the end users or dependent systems were aware of the service upgrade.

Versions of the above scenario have been performed numerous times in high-volume production environments within critical business systems with no loss of availability. In all cases, the systems suffered no downtime and maintained 100% availability.

Summary

Utilizing a message bus allows one to deploy highly available systems through the use of parallelization of message consumers. By using a queued message delivery, the deployment can eliminate possible duplicate processing of messages and avoid having to code coordination logic in the message consumers which use a broadcast delivery method.


December 30, 2009

Basic Message Bus Concepts

The use of message passing has been around since ARPA created the first packet-switching computer networks. Since then, streaming protocols have become far more popular than the simpler messaging schemes and have all but eclipsed, and to some extent hidden, the benefits of message passing. With the gaining popularity of virtualization, cloud computing and parallel processing, the resulting distributed systems pose challenges to systems architecture which message passing, and particularly the message bus, have answered with great success in even the most demanding environments.

At its core, the message bus is a facility to exchange data between components of a system in a decoupled manner. In this case, decoupled means the sender of the data need not know the location or even numbers of receivers. All the sender need know is what data to place in the message and to which group the message is to be sent. The sender then passes the message to the message broker and the message is routed to those participants interested in messages in that group.

Contrast the message bus with traditional communications schemes, which require every sender of a message or data stream to know the address of every component interested in receiving its data. Even in those situations where there is only one receiver of a message, which in itself can be a limiting factor, managing endpoint addresses makes “point-to-point” communications difficult to manage, particularly in dynamic environments where the composition and number of endpoints change over a component's operational life, as is often the case in distributed systems.

The best analogy is the difference between the telephone network and two-way radios. With the public telephone network, one uses a telephone number to establish a connection to a particular recipient of a message. With a two-way radio, the sender broadcasts the message on a particular radio frequency.


Concepts

It is important from the outset to define several basic terms and concepts which are core to any discussion of messaging. Even someone who has worked with a particular implementation of a message bus will need to be aware of terms used in other message bus environments if for no other reason than to translate the concepts into the vernacular of their own messaging environment.

There are a variety of message bus technologies in operation today and while many terms are shared between these technologies their use varies slightly. In many environments, a single concept is given different names and the term used depends on the implementation or the vendor providing the technology. Working across different messaging implementations then causes confusion as multiple terms will be used to refer to a single concept.

This article seeks to homogenize common concepts of the message bus into a single set of terms to facilitate discussions regardless of the implementation or vendor technology being used.

Bus Participants

Any component that sends or receives messages on the bus is called a bus participant. A bus participant may both send and receive messages, or do only one or the other. In any case, bus participants process messages and often represent the true business logic components of a system.

Bus participants are often sub-categorized by the directionality of the messages being processed. Participants which create messages are often called message producers while participants which receive messages are called consumers.

Bus participants are quite often both producers and consumers of messages particularly in service-oriented systems where request messages are sent and the results of processing are returned to the sender of the request. In this case a bus participant acting as a client will be both a producer of request messages and a consumer of response messages.

Message

The message is the primary data unit of the message bus. Bus participants marshal data primitives through an agreed-upon message construct which can take a variety of forms. Most commonly, this message construct is composed of a set of data elements grouped into two basic sections: the header and the body.

The message header contains data elements called message fields which are used in the processing of the message as a whole. Message headers normally contain addressing data which assists in message routing and delivery. Header data also contains timestamps, addresses and identifiers which allow the message to be tracked and correlated in its delivery and processing at the application layer.

While application logic may access header data, it is the data in the body of the message, again in separate and addressable message fields, which is of particular interest to the bus participant. From the perspective of the application layer, there is normally no need for a message unless there is a body to be sent. Messages without a body can be sent to satisfy some exchange protocol between participants, such as an acknowledgment of delivery or processing.
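As a hypothetical illustration, such a message could be modeled as nested data with separate, addressable fields in each section. All field names below are invented; real header fields and wire formats vary by vendor and protocol:

```python
message = {
    "header": {
        "group": "sales.order.create",        # addressing / routing
        "timestamp": "2010-09-23T10:15:00Z",  # tracking in delivery
        "correlation_id": "req-42",           # correlating request/response
    },
    "body": {
        "sku": "X100",   # the application-layer data of interest
        "qty": 2,
    },
}
```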

Messages are often marshaled between different formats; normally that of working memory and that of the communications protocol. The format of the message as it is passed across the communications network is called the wire format and is specific to the message passing protocol or vendor technology being used.

There are various message communications protocols and wire formats in wide use today. Because each protocol has its own unique combination of strengths and weaknesses, it is common to find a variety of different message passing protocols and wire formats used in any data processing environment.

Groups

The primary addressing mechanism for message delivery is the message group. All messages are assigned to one group by the message sender. All messages which share this classification are handled in a similar fashion and are delivered to a similar set of participants. For the purposes of discussion, consider the message group simply a string of characters which acts as a message classifier.

A message group is an endpoint for a message. All that any message producer knows about the message bus is that there are groups of messages which share some logical correlation and are to be considered and processed similarly. The producer of the message need not know anything of the possible consumers of the message, how they are implemented, where they are operating or even how many consumers there are. All that is needed to be known before sending a message is the message group and the producer of the message can be sure the message will be routed to the appropriate bus participants.
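The relationship between producers, groups and consumers can be sketched as a toy broker in Python. The group names mirror the examples in this article; the class and its callback scheme are invented for illustration:

```python
class Broker:
    """Toy sketch: route each message to the consumers joined to its group."""
    def __init__(self):
        self.groups = {}   # group name -> list of consumer callbacks

    def join(self, group, consumer):
        self.groups.setdefault(group, []).append(consumer)

    def send(self, group, body):
        # The producer names only the group; the broker locates consumers.
        for consumer in self.groups.get(group, []):
            consumer(body)

received = []
broker = Broker()
broker.join("sales.order.create", lambda body: received.append(body))
broker.send("sales.order.create", {"sku": "X100"})
broker.send("inventory.update", {"sku": "X100"})  # no consumers joined

print(received)  # → [{'sku': 'X100'}]
```

Note that the producer never learns how many consumers joined the group, or where they run.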

Although this is a topic large enough for an article unto itself, it is useful now to mention the importance of naming governance. Because the names of message groups are (usually) determined by the message producers, it is very easy for message group names to vary widely in format. This results in administration issues as use of the message bus grows and group names must be tracked along with the messages they contain and the classification of the message consumers found on those groups.

This is complicated when group wildcards (also the subject of another article) are used to observe messages on multiple message groups. In this case the consumer joins a message group with a name such as “sales.order.>” and receives messages whose groups match the wildcard. This works fine if all sales-order-related message groups begin with “sales.order”, such as “sales.order.create” or “sales.order.update”. Development teams run into problems when new naming schemes are used and the consumer has to join different groups with different calls. This transfers work from the broker to the API and the consumer, because the wildcard will not match a group named “order.sales.delete”.
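The wildcard behavior can be sketched as a small matching function. This mirrors the “>” (match any remaining tokens) convention used in the examples above; actual wildcard rules vary between broker implementations:

```python
def matches(pattern, group):
    """Hypothetical '>' wildcard: matches all remaining group tokens."""
    p, g = pattern.split("."), group.split(".")
    for i, token in enumerate(p):
        if token == ">":
            return True          # '>' swallows the rest of the group name
        if i >= len(g) or token != g[i]:
            return False         # token mismatch or group too short
    return len(p) == len(g)      # no wildcard: require an exact match

print(matches("sales.order.>", "sales.order.create"))  # → True
print(matches("sales.order.>", "sales.order.update"))  # → True
print(matches("sales.order.>", "order.sales.delete"))  # → False
```

The last case is exactly the governance problem described above: a group named outside the agreed scheme silently escapes the wildcard subscription.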

The naming of message groups is of strategic importance to any message bus implementation as group names are the primary address mechanism. If a message bus implementation is to be manageable as it grows, group names must be governed wisely.

Broker

Message Brokers are software components responsible for accepting messages from their clients (i.e. producers) and performing the processing necessary to ensure messages are routed and delivered to the appropriate receivers. Brokers are essentially services which accept messages for delivery to the appropriate bus participants.

Message brokers are a form of Message Transfer Agent, very much like the electronic mail (email) servers on which so much of our society depends for communications these days. A message client creates a message, opens a client connection to the mail server and sends the message to the server. Once received, the mail server processes the mail message and delivers it to the appropriate destination. The Message Broker operates in very much the same manner, except the broker can handle complex data types and near real-time message delivery, all while handling participant management and complex message routing, delivery and, in some cases, persistence operations to guarantee message delivery.

Most brokers operate similarly to one another: they accept a connection from a bus participant and exchange messages with that participant over some protocol specific to that broker. Every broker is different, and while there are standard APIs to assist the developer in connecting to and interacting with the message broker, each broker will normally require its own libraries and API. This is true even when a standard messaging API is used to interact with the broker, as the standard API is often simply a set of interfaces which the broker API must implement. The underlying data exchange varies between the brokers of different vendors.

There are several efforts to standardize the communications protocols between message brokers and their clients. This will greatly simplify connecting to and interacting with message brokers, as a single API could be used across different message broker implementations. Until then, developers will have to deal with supporting multiple APIs and communication protocols in their solutions, or use one of a number of message gateway products.

Many message brokers are capable of operating in coordination with one another, creating a distributed framework of brokers which is highly resistant to failures and spreads processing across the entire broker network. In this configuration, the messaging infrastructure provides highly reliable message delivery to all bus participants.

Broadcast

Message brokers are responsible for the delivery of messages to bus participants which are interested in messages belonging to a particular message group. When the broker delivers messages to the interested participants, it makes copies of the message and sends a copy to each of them. This mode of message delivery is known as “broadcast”, as all interested participants receive a copy of the single message.

Message broadcasting operates like a two-way radio. The sender of a message transmits on a frequency and everyone monitoring that frequency gets a copy of the message. Broadcasting is a simple and efficient way to transmit data to multiple parties at once.

Queued

Queued delivery of a message is very similar to broadcasting in that a message producer sends a message and the message broker makes the message available to all the participants interested in messages in that group. The difference is that the broker only lets one of the participants interested in the message retrieve a copy. If no one retrieves a message, the broker holds the message until it is retrieved by a message consumer. In effect, the broker places the message in a queue where the first messages to be placed in the queue are the first ones to be retrieved by participants.

It is a queued delivery mechanism which makes systems based on a message bus so resilient to faults and horizontally scalable. By having multiple message consumers operating on a message queue, the message sender need only send the message to a message group with queued delivery, and even if one of the message consumers is busy or even failed, the other consumers will have the chance to retrieve and process the message.

In service oriented architectures, services are implemented in message consumers listening to message queues which contain service requests. Each queue represents a service and invoking a service is a matter of placing a request message in the appropriate queue.
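That pattern can be sketched in a few lines of Python. The “auth” service and its request fields are hypothetical; the point is that a queue per service turns service invocation into a simple enqueue:

```python
from collections import deque

# One queue per service; invoking a service is just enqueueing a request.
service_queues = {"auth": deque()}

def invoke(service, request):
    service_queues[service].append(request)

def auth_service():
    # A consumer implementing the "auth" service pulls one request
    # from its queue and produces a response message.
    request = service_queues["auth"].popleft()
    return {"user": request["user"], "authenticated": True}

invoke("auth", {"user": "alice"})
print(auth_service())  # → {'user': 'alice', 'authenticated': True}
```

Deploying more instances of auth_service against the same queue would add capacity and availability without changing the callers at all.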

Summary

There are a variety of concepts which are unique to message bus infrastructures, largely because the industry is used to dealing with point-to-point streaming protocols like TCP and higher-level protocols like HTTP. Message/packet-based protocols like UDP, and particularly those which implement message bus infrastructures, have compelling advantages over streaming protocols and are finding their way into more data processing environments.


About This Blog

This is a low frequency blog on implementing SOA designs using message brokering. It is a list of practical tips developed over several years deploying service oriented systems for targeted and mass deployments in the telecommunications carrier and electric utility markets.


   Copyright © Steve Coté - All Rights Reserved.
