ERAM is Back

Although it wasn’t intended to be, this blog evolved into one primarily about the FAA’s attempts to replace its aged HOST computer system with En Route Automation Modernization (ERAM), mostly because there has been (and continues to be) a void of information about the project.

It’s been a while since my last entry because my facility – Minneapolis ARTCC (ZMP) – hasn’t used ERAM on live traffic since spring of 2010.  Given that little information about ERAM was forthcoming, there wasn’t much I could write about until now.

The program continues to be delayed by numerous problems and bugs, and costs continue to escalate.

ERAM was first used on live traffic at the key sites, Salt Lake Center (ZLC) and then Seattle Center (ZSE). ZMP became an “alternate” key site and started running ERAM on live traffic as well.

ZLC first started using ERAM on live traffic in October of 2009. At that time the system was fraught with problems. Nevertheless, ZMP went Initial Operational Capability (IOC) in February of 2010, our first use of ERAM on live traffic.

But in the spring of 2010, given all the problems with ERAM, the FAA was forced to rethink its deployment “waterfall” (schedule) and reassess the program. ERAM use on live traffic was halted.

In May of 2010, ZMP lost its key site status, preventing us from any more live ERAM use.

Then in August of 2010, ZLC started testing ERAM on live traffic again. Later that year, ZSE also started running ERAM on live traffic.

Somehow in the process ZMP’s IOC status was nullified. We were scheduled to start running ERAM again last fall, but more problems resulted in more delays.

Eventually we went IOC (again) last month (two years after our first IOC), which opened the door for us to start running ERAM on live traffic again for longer periods of time.

It’s notable that even though the FAA believed that ERAM was ready for use some time ago, it was two years before we started running it on live traffic again.  But trust them – I’m sure it’s ready this time…

We’ve since done some four-hour ERAM runs during the wee hours of the morning, when there is little traffic. We’re now scheduled for an extended (24-hour) run this weekend.

In preparation for that run, controllers were required to get refresher training on ERAM.

During that training, we were advised that they have a mitigation book with solutions to over 400 known problems ERAM has. We were also told that there are over 1000 significant known bugs with ERAM.

Some of the mitigation schemes are downright laughable.  For instance, ERAM has a problem with VFR on top altitude assignments.  The workaround: don’t assign VFR on top.

So the FAA is back to testing a system with the flying public that they know is ridden with bugs.

Their fallback plan? It’s the same that it’s always been: the air traffic controllers.

They figure if anything goes wrong, the controllers will be able to work around the problem while trying to keep airplanes separated. After all, both ZLC and ZSE have been doing that for a year or so now and they haven’t killed anyone, so it’s fine, right?…

Part of the reason they’re deploying ERAM at other facilities is to address known shortcomings with the system. For instance, ZMP is tasked with debugging the non-hosted terminal issue, functionality that ERAM didn’t initially have.

But despite the fact that ERAM has plenty of known problems and bugs, apparently neither ZLC nor ZSE is generating many problem reports (PRs) because they’re “burned out” testing ERAM.

Air traffic controllers are a pragmatic lot, and a large part of their job is finding ways to work efficiently.  Air traffic control is one big exercise in efficiency: getting things done quickly and in a particular order.

Thus, controllers realize it’s almost always easier to work around systemic problems than fight to have them fixed. The longer controllers work with equipment bugs, the more they’ll accept them and do their best to work around them.

Remember what I wrote about habituation?

That’s not to say the problems don’t affect how well controllers work – they cause extra workload and distractions, but controllers do their best to deal with them and/or ignore them.

But the FAA likes to play semantic games and say things like:

… the issues identified with ERAM were related to workload rather than safety, and were caused by new “workarounds” controllers had to perform.

Workload is implicitly linked with safety – controllers can only perform so many tasks at a time, and when they become distracted, or task saturated, they start making mistakes.  Those mistakes can cause lapses in safety.

Of course the longer the problems get worked around, the less likely they are to ever be fixed as well. Given that the FAA is paying for bug fixes already, many of the ERAM bugs are likely to be around for some time.

Without problem reports being generated often, both the FAA and the ERAM contractor (Lockheed Martin) will believe that the problems have magically disappeared.

The FAA is good at denying problems exist. They were denying there were problems with ERAM for quite a while.

Thus, because so few PRs are being filed, I’m sure the FAA believes ERAM is progressing nicely.  They’re also under a tremendous amount of pressure to resolve the issues with ERAM quickly, or answer for the failures.

A large part of why ERAM deployment stopped in 2010 was that the controllers’ union (NATCA) publicized problems with the system.

However, eventually the union became “partners” with the FAA on the project.  But all the union really got was set up to be the fall guy, because they had no real authority when it came to making decisions about ERAM deployment.

As proof, in 2011, an Independent Operational Assessment (IOA) determined that ERAM was not ready for further deployment and the union agreed with that assessment.

But the FAA ignored both the IOA and the union, and decided to declare an In Service Decision (ISD) for ERAM anyway, which meant it could start running ERAM at other facilities.

The union and the FAA have allegedly since established a better “collaborative” relationship, but the union is still ultimately powerless when it comes to ERAM decisions.

And now that the union is collaborating with the FAA on ERAM, if the program fails the union will be partially responsible for that failure. That would be politically damaging to the union, which apparently naively thought it could fix all that ailed ERAM.

The union is telling us that deploying ERAM at more facilities will generate more PRs, thus highlighting the continuing problems with ERAM.  Ultimately, the union won’t admit that ERAM is (still) a turd, leaving controllers who have to work with it between a rock and a hard place.

ERAM is the “elephant in the room” that neither the union nor the FAA wants to talk about.

There were more than a few controllers who seemed to believe ERAM would go the way of the Initial Sector Suite System (ISSS), the program of the late 1980s and 1990s that cost the taxpayers billions and was eventually scrapped, with only part of it salvaged for use.

To solve long-standing Initial Sector Suite System cost, schedule, and technical problems, the FAA Administrator announced a restructuring of the project in June 1994. The system was scaled back and renamed the Display System Replacement.

ISSS was the cornerstone of the Advanced Automation System (AAS), the grandiose scheme of that era that was supposed to modernize the air traffic control system.  But all that center controllers (and the taxpayers) really got out of that system was different computer displays.

Of course the critical difference between ISSS and ERAM is that the FAA never accepted ISSS for use.

ERAM was supposed to have replaced the existing HOST computer system years ago, and when it wasn’t ready for use they had to get a new contract for maintenance on that system.

But in the days of government notions of “too big to fail”, they’ve put too much money into ERAM to turn back now.  That means regardless of its many problems, they will continue to deploy ERAM.

As for ZMP’s 24-hour ERAM run this coming Saturday:

  • Do I think it’s fair that controllers have to work with and around a system that was poorly designed, poorly tested, poorly specified and is ridden with bugs? No.
  • Do I think that the taxpayers should have to pay for this debacle?  No.
  • Do I think it’s safe to test that system on the flying public?  No.
  • Do I think ERAM is going to have a major failure? Probably not.
  • Do I think there will be a major incident related to ERAM? It’s possible.

But then none of that matters, because we will be running ERAM regardless.

So far ERAM has only been running at facilities that have limited amounts of traffic.  But facilities that run much more traffic (like Chicago ARTCC – ZAU) are slated to start using ERAM soon, and under higher traffic (and data) loads ERAM will be subjected to conditions it has yet to be tested under.

That is bound to uncover yet more ERAM bugs.

Welcome to the future; welcome to the FAA’s Next Generation Air Transportation System (NextGen)…
