Thoughts on LOTAR - Long Term Archiving and Retrieval

Walter W. Wilson
February 22, 2016

LOTAR -- LOng Term Archiving and Retrieval -- is an aerospace and defense industry project to develop standards to preserve digital design data for the lifetime of an engineering program. An aircraft program can last for decades -- the C-130 may last a hundred years. Can our digital design data be accessible that long? Axiomatic language may be a good solution to this problem. A recent webinar on LOTAR prompted me to write these notes.

A. The Case for Ultra-Long-Term Engineering Design Preservation

I favor taking a really long-term view. Instead of preserving digital data for decades, let's shoot for centuries -- even forever! Buildings can last for centuries -- even thousands of years (e.g., the Parthenon, the Pantheon, the pyramids). Washington National Cathedral is supposed to last two thousand years. We need the architectural data to survive, too. For historic airplanes, such as the X-15, I think the design and analysis data should be preserved forever -- partly for the engineering knowledge and partly as a museum artifact. Paper drawings can last for centuries -- Da Vinci's have lasted 500 years -- and when periodically copied can last indefinitely. More importantly, engineers of the future will be able to read and understand them. Is there a digital form that has an equally good prospect for ultra-long-term preservation?

B. My Principles for Ultra-Long-Term Preservation

A typical argument for LOTAR is the legal requirement for things like accident investigations. But my motivation is that old design data -- even ancient data -- is worth saving simply because it is a contribution to human knowledge. It's the reason we have libraries and museums; the reason we have archaeology. My position on ultra-long-term archiving and retrieval is driven by the following principles:

1. We should save CAD operations and inputs, not just results. -- The current emphasis seems to be on saving just the final geometry. I think it is more important to record the CAD operations and their inputs from which the final geometry is computed. This would allow the final result to be recomputed at some future time, perhaps with modifications. This enables design reuse and better records design intent.

2. We should save programs that contribute to a design. -- To completely save a design, any software that "touches" the design should be saved. This could include in-house applications that compute geometry (such as tapered-offset surfaces), analysis programs used to make design decisions, scripts used to automate design steps, etc. We want the ability to retrace the design process and regenerate the result.

3. Trade studies and alternate designs should be saved. -- I find it most interesting to see images of early, rejected versions of designs of famous airplanes and buildings, like the F-16 and the U.S. Capitol. The evolution of a design is important history that is worth preserving.

4. We should save exact definitions instead of (or in addition to) computed approximate geometry. -- An "exact" definition for a curve might be that it is a geodesic between two points on a surface. But the actual computed curve could be a NURBS approximation. We should save the exact definitions since they represent the design intent. They also provide the option of substituting better approximation algorithms when the geometry is recomputed in the future. A symbolic constant like cos(30deg) should be saved since the floating point constant 0.8660254 is just an approximation and communicates less information. Even an input like 0.1 should be saved in decimal form, since it has no exact binary floating point equivalent. Designs may be recomputed on future systems with longer word length -- 128 or 256 bits instead of 64 bits -- and one will want one's constants to have the maximum possible accuracy. CSG (Constructive Solid Geometry) expressions represent exact definitions for solids while the computed B-Rep (Boundary Representation) solids are usually just approximations. Furthermore, B-Rep solids are often not "watertight", which can be a problem for additive manufacturing.

5. We need the ability to reproduce approximate results down to the last bit. -- If we want the ability to re-create a design using stored CAD operations and inputs and the stored programs that touch it, I would like to see a result that is identical to the original result -- identical to the last bit! Geometric algorithms are very sensitive to tolerances. A geometric test for whether a point is inside or outside a solid might give a different result with higher-precision arithmetic, and this could ultimately yield different final geometry. The only way to guarantee an identical final result is to require all steps to yield identical results down to the last bit.

6. We need to preserve the geometric engine of the CAD system. -- The only way to achieve the binary result exactness of principle 5 is to preserve the geometric engine of the CAD system used to create the design. Thus, when rerunning the CAD operations and inputs of principle 1, the generated result would be identical (assuming identical floating point operations).

7. Floating point operations need explicit definitions. -- Floating point operations and number formats may be different on future computers. To achieve the binary exactness of principles 5 and 6, these operations need to be explicitly defined as functions on bit vectors. They could then be run on future computers and identical bit-vector results would be guaranteed.

8. We need to save the geometric engine in source form. -- If you want to count on rerunning a piece of software arbitrarily far in the future as indicated by principle 6, then it should be in source code form as opposed to just keeping the vendor's executable binary program. This applies not just to the CAD system but to all software that "touches" the design.

9. The programming language for the software source code would be the long-term standard. -- If one saves the rerunnable source code of all programs used for a design, then the design is guaranteed to be accessible and reusable. In fact, a "data standard" is no longer needed -- just archive the design software source code along with its inputs and -- voila! -- the ultra-long-term preservation problem is solved! And the standard for the programming language definition would be orders of magnitude smaller than the current STEP standard. It would not require continual enhancement or vendor support.

10. You had better pick a good long-term programming language! -- The ultra-long-term preservation of design data would depend on the programming language for the design software. Thus it had better be well-chosen! It seems likely that some programming languages are destined to become "extinct" -- unsupported and thus difficult to run and eventually forgotten. I would bet that current mainstream languages like C++ and Java will end up in this category. (See Paul Graham's The Hundred Year Language.) A good long-term language would be simple, elegant, minimal -- founded on basic mathematical and computer science concepts that will always be around. One candidate would be a minimal Lisp language, which can be defined in a few pages. (How's that for a simple standard!) The minimal language should be extensible so that new features can be defined within the language instead of being built-in. The language should be a good host for embedded domain specific languages. (Using a language to represent engineering design data would be one example of a domain specific language.)
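Principles 4 and 5 can be made concrete with a small sketch. It is written in Python purely for illustration (not in any language proposed here), and the helper names `to_binary32` and `inside_unit_cube` are hypothetical. It shows that the decimal input 0.1 is not what a 64-bit float actually stores, and that a toy inside/outside test can give different answers at different precisions.

```python
import math
import struct
from decimal import Decimal
from fractions import Fraction

# Principle 4: the decimal input 0.1 has no exact binary floating point form.
intended = Fraction(1, 10)            # what the designer typed
stored = Fraction(0.1)                # exact value of the 64-bit float nearest to 0.1
archived = Fraction(Decimal("0.1"))   # value recovered from archived decimal text

# A truncated constant carries less information than the symbolic cos(30deg).
truncated = 0.8660254
recomputed = math.cos(math.radians(30))


def to_binary32(x: float) -> float:
    """Round a 64-bit float to the nearest 32-bit float (hypothetical helper)."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def inside_unit_cube(p) -> bool:
    """Toy inside/outside test for the closed cube [0, 1]^3."""
    return all(0.0 <= c <= 1.0 for c in p)


# Principle 5: a point lying just outside the face x = 1.
point64 = (1.0 + 2.0 ** -30, 0.5, 0.5)            # exactly representable in 64 bits
point32 = tuple(to_binary32(c) for c in point64)  # x rounds to exactly 1.0 in 32 bits

print(stored == intended)                  # False
print(archived == intended)                # True
print(abs(recomputed - truncated) > 1e-9)  # True: the short constant lost digits
print(inside_unit_cube(point64), inside_unit_cube(point32))  # False True
```

The same point is classified as outside at 64 bits and as inside (on the boundary) at 32 bits, which is the kind of divergence that bit-exact reproduction is meant to rule out.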

C. My Solution for Ultra-Long-Term Preservation

My solution for ultra-long-term design data preservation is my logic programming language, axiomatic language. Axiomatic language is minimal and elegant -- ideal for a long-term standard. It is so tiny that even arithmetic is not built-in. Floating point operations must be symbolically defined. But this means those definitions would give identical results on future computers, regardless of floating point hardware. The metalanguage capability of axiomatic language allows one to define new language features within axiomatic language. This makes axiomatic language a good host for embedded domain specific languages, such as a language for representing engineering design data.
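To give a feel for what "floating point operations defined as functions on bit vectors" (principle 7) might mean, here is a minimal sketch. It is in Python rather than axiomatic language, and it is only an assumption about how such a definition could look. The function `mul_bits` multiplies two 64-bit IEEE 754 bit patterns using integer arithmetic only, with round-to-nearest-even. It handles only normal operands with a normal result; zeros, subnormals, infinities, and NaNs are deliberately left out. The names `float_to_bits`, `bits_to_float`, and `mul_bits` are hypothetical.

```python
import struct

FRAC_BITS = 52
FRAC_MASK = (1 << FRAC_BITS) - 1
EXP_MASK = 0x7FF
BIAS = 1023


def float_to_bits(x: float) -> int:
    """Bit pattern of a 64-bit float, used here only to build test inputs."""
    return struct.unpack("<Q", struct.pack("<d", x))[0]


def bits_to_float(b: int) -> float:
    """Inverse of float_to_bits, used here only to display results."""
    return struct.unpack("<d", struct.pack("<Q", b))[0]


def mul_bits(a: int, b: int) -> int:
    """Multiply two binary64 bit patterns using integer operations only."""
    sign = (a >> 63) ^ (b >> 63)
    ea = (a >> FRAC_BITS) & EXP_MASK
    eb = (b >> FRAC_BITS) & EXP_MASK
    if not (0 < ea < EXP_MASK and 0 < eb < EXP_MASK):
        raise ValueError("sketch handles normal operands only")

    # Restore the implicit leading 1 to get 53-bit significands.
    ma = (a & FRAC_MASK) | (1 << FRAC_BITS)
    mb = (b & FRAC_MASK) | (1 << FRAC_BITS)
    prod = ma * mb                      # lies in [2^104, 2^106)
    exp = ea + eb - BIAS

    # Normalize so that the kept significand has exactly 53 bits.
    shift = FRAC_BITS
    if prod >> 105:
        shift += 1
        exp += 1
    sig = prod >> shift
    rem = prod & ((1 << shift) - 1)
    half = 1 << (shift - 1)

    # Round to nearest, ties to even.
    if rem > half or (rem == half and (sig & 1)):
        sig += 1
        if sig >> 53:                   # rounding carried into a 54th bit
            sig >>= 1
            exp += 1

    if not (0 < exp < EXP_MASK):
        raise ValueError("sketch does not handle overflow or underflow")
    return (sign << 63) | (exp << FRAC_BITS) | (sig & FRAC_MASK)


# On hardware with IEEE 754 binary64 arithmetic the two should agree bit for bit.
result = mul_bits(float_to_bits(0.1), float_to_bits(0.3))
assert result == float_to_bits(0.1 * 0.3)
print(hex(result), bits_to_float(result))
```

Because the definition uses nothing but integers and shifts, it does not depend on the host machine's floating point unit; the comparison against the hardware result is just a sanity check for today's machines.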

An engineering design language would define geometry with high-level operations, not megabytes of surface control points. Complete history and associativity would be inherent in the definitions. Engineering knowledge would be captured. With its programmability one could save design scripts, analysis functions, and user customizations. The definition of the engineering design language within axiomatic language would include an open source geometric engine.
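As a toy illustration of defining geometry by high-level operations rather than control points, the sketch below represents a solid as a CSG expression and answers point-membership queries with exact rational arithmetic. It is written in Python only for illustration and is not the engineering design language itself. The class names `Box`, `Sphere`, `Union`, `Difference` and the function `contains` are hypothetical, and the boundary handling is simplified (no regularized set operations).

```python
from dataclasses import dataclass
from fractions import Fraction as F


@dataclass(frozen=True)
class Box:
    lo: tuple   # corner with the smallest coordinates
    hi: tuple   # corner with the largest coordinates


@dataclass(frozen=True)
class Sphere:
    center: tuple
    radius: F


@dataclass(frozen=True)
class Union:
    a: object
    b: object


@dataclass(frozen=True)
class Difference:
    a: object
    b: object


def contains(solid, p) -> bool:
    """Exact point-membership test on a CSG expression (closed primitives)."""
    if isinstance(solid, Box):
        return all(l <= c <= h for l, c, h in zip(solid.lo, p, solid.hi))
    if isinstance(solid, Sphere):
        d2 = sum((c - q) ** 2 for c, q in zip(p, solid.center))
        return d2 <= solid.radius ** 2
    if isinstance(solid, Union):
        return contains(solid.a, p) or contains(solid.b, p)
    if isinstance(solid, Difference):
        return contains(solid.a, p) and not contains(solid.b, p)
    raise TypeError(f"unknown solid: {solid!r}")


# The archived "design" is the expression and its inputs, not a mesh or B-Rep.
plate = Box((F(0), F(0), F(0)), (F(10), F(10), F(1)))
hole = Sphere((F(5), F(5), F(0)), F(2))
part = Difference(plate, hole)

print(contains(part, (F(1), F(1), F("0.5"))))   # True: in the plate, clear of the hole
print(contains(part, (F(5), F(5), F("0.5"))))   # False: removed by the hole

# Reuse: change one input and re-evaluate the same operations.
wider = Difference(plate, Sphere(hole.center, F(8)))
print(contains(wider, (F(1), F(1), F("0.5"))))  # False: the larger hole now removes it
```

Because the inputs are kept as exact rationals (including the decimal text "0.5"), the answers do not depend on word length or floating point hardware.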

Note that the engineering design language would not be the long-term standard -- axiomatic language would be the standard. The definition of the engineering design language would be archived along with the design data. It would thus be free to evolve with new engineering programs. As long as axiomatic language is executable, a design would be completely rerunnable and reusable.