Monday, July 22, 2013

The Jell-O Effect


This is the third installment in the series of posts on Reactive Agents. The previous posts can be found here: http://blog.savigent.com/2012/05/merovingian-or-simple-explanation-of.html and http://blog.savigent.com/2012/08/to-tackle-complexity-we-need-to.html
My daughter loves to play with Jell-O on her plate: it jiggles and slides, and after a while it starts falling apart. The slab of Jell-O on the plate always reminds me of a conversation I had just before joining the team at Savigent.

In the mid-90s I was talking to a customer who described his brand-new MES implementation as having the “Jell-O Effect”. I asked him to elaborate; I had never heard the term before, and back then I didn’t have kids to know better. I didn’t have to ask twice. “We can’t make any changes to the system,” he said. “It jiggles in many unpredictable places every time we make a single change.”

I spent a couple of weeks at the plant as part of the software vendor’s application group, helping to stabilize the system and learning an invaluable lesson in the large-scale architecture and implementation of manufacturing systems.

The MES vendor’s product was a database with an API layer to perform transactions against materials, machines, routes, scrap, etc. During sales demos, everything was done using the manual data entry screens shipped with the product. Reality was very different from the demo – the customer’s plant had a highly automated manufacturing floor with multiple workstations, each running a dedicated SCADA application. The “obvious” decision was to modify each SCADA application on every workstation to perform MES transactions and minimize manual data entry for operators. The resulting implementation was a tightly coupled, fragile system with business logic spread across the MES configuration, the database and each and every SCADA application on the plant floor.

As soon as I opened one of the SCADA applications in the design environment, it became obvious that the technology was misapplied. Tag-based, scan-based SCADA was being pushed very hard in an attempt to reliably detect plant floor events and execute transactions against the MES database. There were special precautions not to miss events, code to handle database call exceptions, timeouts and retries, even attempts at concurrency. The amount of code and extra creativity needed to implement all the functionality earned my appreciation, but also raised a red flag. SCADA was not designed to do event-driven, concurrent, transactional execution.
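
To make the mismatch concrete, here is a minimal sketch in Python of why scan-based logic needs "special precautions not to miss events". It is an illustration only: the signal, the scan interval and the function names are invented for this example and are not taken from the system described above. A tag is sampled once per scan, so any change that starts and ends between two scans is never seen, while an event-driven observer sees every transition.

```python
def count_edges_event_driven(signal):
    """Count every 0 -> 1 transition, as an event-driven observer would."""
    edges = 0
    previous = 0
    for current in signal:
        if current and not previous:
            edges += 1
        previous = current
    return edges


def count_edges_scanned(signal, scan_interval):
    """Count 0 -> 1 transitions visible when the tag is read once per scan."""
    edges = 0
    previous = 0
    for tick in range(0, len(signal), scan_interval):
        current = signal[tick]  # only the value at scan time is observed
        if current and not previous:
            edges += 1
        previous = current
    return edges


# Hypothetical "part present" tag, one value per tick: three parts pass the
# sensor, two of them faster than the scan interval.
part_present = [0, 1, 0, 0, 1, 1, 1, 1, 0, 0, 0, 1, 0, 0, 0, 0]

print(count_edges_event_driven(part_present))   # 3
print(count_edges_scanned(part_present, 4))     # 1
```

With these made-up numbers the scan sees one part out of three. Every workaround for this (latching bits, handshake tags, faster scans) is more script code of the kind described above.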

Using SCADA for MES integration resulted in a complex, fragile system that was neither supportable nor maintainable. It was impossible to follow data flows through the system. Multiple scripts were changing different tags, creating complex dependencies. If an extra line or two of code was inserted in a script, it would break the timeout logic. For a person who didn’t develop the original SCADA code, even a minor change was equal to the “red vs. blue wire” question – something might blow up. I also suspected that even for the people who built the system, changing it after a while was not a trivial task.

Side Note – back then there were very few options for the tool or technology to build the system. The choice was between C/C++, which not a single person at the plant would be able to support, and SCADA. Unfortunately, I see the same SCADA packages pushed to do MOM implementations even today, 15 years later.

The Jell-O Effect was the result of taking a technology designed for one very specific domain and using it to solve a problem in a very different and much more complicated domain. CIM, MES, MOM, Level 3, or whatever the next label is going to be, is by its nature event-driven, concurrent and distributed; in other words, it is very complex.
The lessons learned studying the “Jell-O Effect” triggered a search for a better way to build MOM systems. Reactive Agents quickly rose to the top of that search. As we discussed before, reactive agents were invented to address complexities in the event-driven, concurrent and distributed world of robotics and artificial intelligence (AI). There are a lot of commonalities between these two worlds:
  • Interact with environment
    • IO
    • People
  • Work to achieve goals
    • Schedules
    • Recipes
    • Targets
    • Rules
  • Communicate with command and control and peers
    • Services
    • Protocols
    • Networking
  • Maintain state
    • Variables
    • Data stores
Using Agents to build MOM systems becomes a very easy decision once we analyze the above list. Reactive Agents give us a blueprint to address the shortcomings we find with conventional tools:
  • Encapsulation of the state – there are no shared variables, each agent is a black box to the rest of the system, internal state is protected from concurrent modification
  • Event-driven – each agent receives and generates events
  • Concurrent execution – each agent runs concurrently with other agents, in response to events
  • Formal interfaces – each agent can only receive and send predefined events (event-driven form of the service-oriented communication)
  • Changes to an individual agent may change that agent’s reaction, but will not produce other side effects in the system
  • Adding new communication links between agents, or adding new agents to the system, can be done at any time while preserving existing functionality
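
The properties in the list above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not Savigent's implementation: the event types (PartCompleted, ScrapLimitReached), the ScrapMonitorAgent class and the scrap limit are all hypothetical names made up for this example. The agent keeps its state private, runs on its own thread, reacts only to events placed in its inbox, and talks to the rest of the system only through predefined event types.

```python
import queue
import threading
from dataclasses import dataclass


@dataclass(frozen=True)
class PartCompleted:
    """Hypothetical input event: a machine reports a finished part."""
    machine: str
    good: bool


@dataclass(frozen=True)
class ScrapLimitReached:
    """Hypothetical output event: scrap count for a machine hit the limit."""
    machine: str
    scrap_count: int


class ScrapMonitorAgent:
    """Counts scrapped parts per machine and emits an event at the limit."""

    def __init__(self, outbox, scrap_limit=2):
        self._inbox = queue.Queue()   # the only way into the agent
        self._outbox = outbox         # the only way out of the agent
        self._scrap_limit = scrap_limit
        self._scrap = {}              # state touched only by the agent's thread
        self._thread = threading.Thread(target=self._run, daemon=True)

    def start(self):
        self._thread.start()

    def send(self, event):
        # Formal interface: only the predefined input event is accepted.
        if not isinstance(event, PartCompleted):
            raise TypeError(f"unsupported event: {event!r}")
        self._inbox.put(event)

    def stop(self):
        self._inbox.put(None)         # sentinel: drain queued events, then exit
        self._thread.join()

    def _run(self):
        # Event-driven loop: the agent does nothing until an event arrives.
        while True:
            event = self._inbox.get()
            if event is None:
                break
            if not event.good:
                count = self._scrap.get(event.machine, 0) + 1
                self._scrap[event.machine] = count
                if count == self._scrap_limit:
                    self._outbox.put(ScrapLimitReached(event.machine, count))


def demo():
    outbox = queue.Queue()
    agent = ScrapMonitorAgent(outbox)
    agent.start()
    for good in (True, False, True, False):
        agent.send(PartCompleted("press-1", good))
    agent.stop()
    return [outbox.get_nowait() for _ in range(outbox.qsize())]
```

Calling demo() returns a single ScrapLimitReached event for "press-1". Changing how the agent counts scrap touches nothing outside the class, and a new agent can subscribe to the same output events without modifying this one, which is the opposite of scripts sharing and overwriting each other's tags.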
The first implementation of the Reactive Agents-based Platform was deployed in production in the fall of 1999. Savigent’s second-generation, .NET-based Catalyst Platform is implemented at Fortune 500 manufacturers in demanding 24x7x365 manufacturing environments, all without the “Jell-O Effect”.
