Engineering the Tree of Knowledge

by Bill Zimmerly

"Knowledge is power."
- Francis Bacon, 1561-1626

(This supplemental documentation describes zHTTP*, an Internet server program that follows the HTTP 1.1 specification for serving World Wide Web pages to browsers such as Internet Explorer and Netscape Navigator. Such programs are commonly known as "webservers". Examples include Microsoft's IIS, O'Reilly's WebSite, and Netscape's Enterprise Server. Unlike most webservers, however, zHTTP* has been designed to serve a very special type of Internet application known as a "Knowledge Base".)

Part 1: Introduction

In the June 21st, 1999 issue of Information Week magazine, there appeared a letter that I thought would make for an excellent introduction to this subject:

Knowledge Needed

  I must disagree with "Right Idea, Wrong Approach" (June 14, p.10). There is a need to develop methods for identifying the knowledge to be captured, then capturing it, classifying it, and distributing it. Experiments need to be performed, results analyzed, many ideas discarded, and some built on. Knowledge management is not yet an operations job, which managing IT has become, but a research task.

  Given the conflict in goals of managing an emerging discipline with an operations job, IT directors shouldn't be wearing both hats.

Terry Smith
Data Architect
Structured Data Solutions
Houston

My primary role for the last decade or so has been, and continues to be, the same as my job title: "Systems Research and Development Manager". I agree with Terry 100%: this is a job for a researcher such as myself.

As Terry put it, Knowledge Engineering* is an "emerging discipline", and I am well equipped to be a pioneer in it because of my varied background in Computer Science.

Many great ideas in the history of technology came about when people of vision realized that the right combination of tools and techniques had coalesced at just the right time to be useful to a completely different discipline.

From the early 1980s on, it was my opinion that Moore's Law would bring us to the stage where small computers would have enough random access memory to store huge data structures, such as resident unabridged dictionaries, thesauruses, databases, and knowledge bases.

I also realized at the time that for such large data structures to be useful to the users of the computer system, other things needed to be in place as well: networks and network protocols that permit the sharing of such structures, ways in which the structures can be enhanced with more complete knowledge and information, and inference engines that use the structures to cross-link facts and data for better application of the knowledge.

Part 2: What is Knowledge?

The excellent online tool known as Princeton University's "WordNet" defines the word knowledge as follows:

    cognition, knowledge - (the psychological result of perception and
                            learning and reasoning)
      => psychological feature - (a feature of the mental life of a living
                                  organism)
    

While the scholars at Princeton University chose to limit "knowledge" to the realm of living organisms, something that I don't necessarily disagree with, I believe that it is not out-of-bounds to speculate and theorize about the possibility of creating machines that can properly represent and apply knowledge.

But going beyond speculation and theory, I propose a practical way of actually implementing data structures and algorithms that will emulate some of the most important ways in which living things use knowledge. In so doing, I predict that computers will take on a more organic way of behaving where such principles are applied.

To this end, I will begin my work with my own definition of knowledge that I find to be useful and practical:

Definition 1: Knowledge is an ordered collection of related facts and ideas.

Central to using such facts and ideas is the way in which they are ordered. In its simplest form, the proper ordering of related facts and ideas begins with the symbolic representation of each one. The name, or handle if you prefer, is the critical first step in the proper ordering of facts and ideas into knowledge bases.

Axiom 1: Naming something is the critical first step towards being able to understand it. (Abstract reasoning requires the indirection of symbols in order to apply the rules of logic.)

My name is Bill. Simply naming me provides a lexical framework to which we can apply the tools of reasoning and from which we can draw logical conclusions. The name is a convenient handle with which to represent a whole universe of facts about my person. Regardless of how many of these facts are actually known and how many are not, the handle is of extreme importance because it is the very essence of how the human mind grasps and processes reality.

Consider for example the kinds of facts that might be deduced from "Bill" by following the ordered list in a tree of knowledge:

    Bill is an instance of the class man.

    Man is an instance of the class human.
      (Deduced by the attribute gender.)

    Human is an instance of the class things.
      (Deduced by the attribute living.)
        etc.
    

As stated earlier, there are whole universes of facts and attributes about any one of these classes, but the important thing to realize is that they can be ordered so long as they can be named! This, more than anything else, is the principal concept in symbolic reasoning and knowledge engineering.

Axiom 2: Classification establishes the initial order about which the rules of logic can apply.

Axiom 3: The listing of class attributes provides a logical framework for deriving facts and determining unknowns.

After the naming of something comes its classification, or the ordered listing of its attributes. Classification is knowledge engineering that proceeds towards the root of the tree of knowledge. The ordered listing of attributes is knowledge engineering that proceeds in the opposite direction, outward from the root.
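
The two directions just described can be sketched in a few lines of Python. This is only an illustration, not the Forth code used by ZKB: the `Node` class and its method names are hypothetical, and the placement of the "living" and "gender" attributes is one possible reading of the example above.

```python
class Node:
    """A named node in a tree of knowledge (hypothetical illustration)."""

    def __init__(self, name, parent=None):
        self.name = name          # Axiom 1: the name is the handle
        self.parent = parent      # link towards the root
        self.attributes = []      # attribute names listed at this node
        self.children = []        # links outward from the root
        if parent is not None:
            parent.children.append(self)

    def classify(self):
        """Classification: follow parent links from this node to the root."""
        chain = []
        node = self.parent
        while node is not None:
            chain.append(node.name)
            node = node.parent
        return chain

    def describe(self, depth=0):
        """Attribute listing: walk outward from this node, one line per entry."""
        lines = ["  " * depth + self.name]
        for attr in self.attributes:
            lines.append("  " * (depth + 1) + "attribute " + attr)
        for child in self.children:
            lines.extend(child.describe(depth + 1))
        return lines


things = Node("things")
things.attributes.append("living")    # assumed placement
human = Node("human", things)
human.attributes.append("gender")     # assumed placement
man = Node("man", human)
bill = Node("Bill", man)

print(bill.classify())                # ['man', 'human', 'things']
print("\n".join(things.describe()))
```

Nothing can be placed in the tree until it has a name, which is the point of Axiom 1; once named, the same links serve both directions of travel.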

Part 3: What is a Tree of Knowledge and how can it be used?

In the previous section, I used the term "tree of knowledge". In order to better understand this important concept, one must understand that there can be no knowledge without a natural, structural representation of the facts and ideas being ordered.

Knowledge is hierarchical, like a tree. Towards the roots of the tree are the "parent" entities, and moving outward towards the leaves of the tree are the "child" entities.

Axiom 4: Inheritance of attributes is the principal way in which facts and ideas are applied.

Without a way of applying knowledge, knowledge is of no use! The hierarchical nature of knowledge provides a mechanical framework for determining facts and for deducing unknowns. For example, knowing that "Bill" is an instance of the class "man", and knowing that the class "man" has an attribute of "weight", a knowledge-based inference engine can deduce that (1) Bill has weight since he inherits the attributes of the class "man", (2) Bill's weight is unknown, and (3) the engine can make its knowledge more complete by asking Bill how much he weighs!

Axiom 5: Knowledge can ONLY increase by following a procedure when you know what you DON'T know.

By applying Axiom #5, a computerized knowledge-based system can grow in an organic manner by always seeking to "complete" its knowledge by generating queries of the human users of the system. This is the key concept behind the design of Zimmerly Knowledge Base systems ("ZKB* files").
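
The three deductions about Bill's weight can be made concrete with a short Python sketch. It is not the ZKB inference engine; `KBObject` and its methods are hypothetical names, and the wording of the generated question is an assumption.

```python
class KBObject:
    """Hypothetical node: declares attributes for its descendants, holds values."""

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent
        self.declared = {}   # attribute name -> prompting string
        self.values = {}     # attribute name -> known value

    def inherited_attributes(self):
        """Collect the attributes declared by every ancestor (Axiom 4)."""
        chain = []
        node = self.parent
        while node is not None:
            chain.append(node)
            node = node.parent
        attrs = {}
        for node in reversed(chain):      # root first, nearest parent last
            attrs.update(node.declared)
        return attrs

    def unknowns(self):
        """Attributes this object inherits but has no value for."""
        return [a for a in self.inherited_attributes() if a not in self.values]

    def queries(self):
        """Turn each unknown into a question for a human user (Axiom 5)."""
        prompts = self.inherited_attributes()
        return [f"{self.name}, what is your {prompts[a]}?" for a in self.unknowns()]


man = KBObject("man")
man.declared["weight"] = "weight"
man.declared["firstname"] = "first name"

bill = KBObject("Bill", parent=man)
bill.values["firstname"] = "Bill"

print(bill.inherited_attributes())   # Bill has a weight because "man" does
print(bill.unknowns())               # ['weight']
print(bill.queries())                # ['Bill, what is your weight?']
```

The system can only ask the question because the class told it what an instance ought to have; without the declared attribute there would be no way to notice the gap.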

Part 4: What is Knowledge Engineering?

Definition 2: Knowledge engineering is the art and science that goes into creating machines that are able to emulate the behavior of human experts within particular domains of knowledge.

Specifically, knowledge engineers are computer programmers who are skilled at communicating with both human beings, the experts in their fields, and computers.

Skilled, that is, in being able to program the computers to closely emulate the processes by which the human expert applies knowledge to a problem domain in order to solve those problems. To emulate such processes with a computer requires the utmost in system flexibility - the ability to adapt to changing requirements - and to be able to capture and use knowledge in a way that minimizes the effort on the part of human beings.

Part 5: How can a "Tree of Knowledge" be Represented in a Computer?

The memory of most modern computers is vast. A tree of very comprehensive knowledge can be represented as a linked list of definitions, attributes, and other nodes that allow the natural pruning of the tree for facts.

At minimum, the memory must be flexible enough to permit the modification of attributes and links and must always be seeking, as a background task, to complete its knowledge base.

Definition 3: A Complete Knowledge Base is a knowledge base that has no unknowns. Specifically, this means that all attributes of all objects have known values and these values make sense based on deducible rules of logic and other facts.

Ways in which knowledge bases "find completion" include, but are not limited to:

  1. Prompting human users for incomplete knowledge.

    "Mary, what is your Social Security Number?"

    "Bill, how much do you weigh?"

      etc.

  2. Gathering data from instruments and accessible databases.

    Embedded processors

    Factory automation & robotics

    Laboratory instruments

    Hospital patient monitoring equipment

      etc.

  3. Searching and collating information from the Internet using sophisticated search engines and customized lexical parsing agents.

    Standard reference websites

    Competitor websites with price lists

    Government databases that are online

    Map sites, "PeopleFind" sites, Standard Search Engines

      etc.

So long as the knowledge base is incomplete and therefore able to generate unknowns, there is a supply of ample work for the system and the people who use it to do!
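
As a rough sketch of how such a system might seek completion, the Python below offers every unknown to a list of sources and keeps the first answer. Everything here is an assumption made for illustration: the dictionary layout, the function names, and the canned answers stand in for real users, instruments, and Internet searches. Only the "no unknowns" half of Definition 3 is checked; whether values make sense is not.

```python
def find_unknowns(kb):
    """Return (object, attribute) pairs whose value is still unknown (None)."""
    return [(obj, attr)
            for obj, attrs in kb.items()
            for attr, value in attrs.items()
            if value is None]


def is_complete(kb):
    """True when no attribute of any object is unknown."""
    return not find_unknowns(kb)


def seek_completion(kb, sources):
    """Offer every unknown to each source in turn and keep the first answer."""
    for obj, attr in find_unknowns(kb):
        for source in sources:
            answer = source(obj, attr)
            if answer is not None:
                kb[obj][attr] = answer
                break


# Made-up stand-ins for the three kinds of sources listed above.
def ask_user(obj, attr):
    replies = {("jblow", "department"): "Shipping"}   # invented reply
    return replies.get((obj, attr))


def read_instrument(obj, attr):
    readings = {("royal", "speed"): "300"}            # invented reading
    return readings.get((obj, attr))


def search_internet(obj, attr):
    return None                                       # nothing found this time


kb = {
    "jblow": {"firstname": "Joe", "department": None},
    "jdoe": {"firstname": "Jane", "payrate": None},
    "royal": {"ram": "128", "speed": None},
}

seek_completion(kb, [ask_user, read_instrument, search_internet])
print(find_unknowns(kb))   # [('jdoe', 'payrate')]
print(is_complete(kb))     # False
```

One unknown survives the pass, so the knowledge base is still incomplete and there is still work to do, which is exactly the situation described above.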

Part 6: What Programming Language is Ideal for Implementing Such a Structure and Why?

The Forth dictionary is the ideal data structure for representing a tree of knowledge and for applying that knowledge.

Forth is a simple and very powerful programming language that is word oriented. At its core, Forth provides a set of tools that enable the Knowledge Engineer to easily implement Knowledge Bases that follow all of the axioms I've listed, along with the interactivity that enables the system to converse with human beings.

All objects in the Tree of Knowledge can be built up from a very simple recursive defining word AND the tools that traverse the Tree of Knowledge can use the link lists that are built into each Object, Attribute, and Method to data mine the tree for facts and ideas.

Axiom 6: Simplicity is the most important attribute of the Tree of Knowledge.

Forth enables the knowledge engineer to represent knowledge in the simplest form possible. For example, the following code snippet is how a Tree of Knowledge for the problem domain of a Company Intranet is defined:

    kb: root company                                \ The "Root" object
      company assets
        assets hardware
          hardware computer
            attribute manufacturer Manufacturer     \ Attributes require a
            attribute processor Processor           \ prompting string along
            attribute speed Processor Speed (MHz)   \ with a name.
            attribute ram Main Memory (MB)
            attribute storage Disk Capacity (MB)
            attribute os Operating System
            computer della                          \ The first instance
              manufacturer Dell                     \ object!
              processor Intel Pentium
              speed 100
              ram 32
              storage 1024
              os RedHat Linux 5.1
            computer royal                          \ Notice that the KB
              manufacturer Custom made              \ doesn't "know" this
              processor Intel Pentium II            \ computer's speed!
              ram 128
              storage 4096
              os Windows-NT 4.0 / sp4
          hardware airplane
            attribute manufacturer Manufacturer
            attribute weight Empty Weight
            attribute payload Payload Capacity (Cubic Feet)
            attribute fuel Fuel Tank Capacity (Cubic Feet)
            airplane jet
              jet b747                              \ Actual instance
              jet b727                              \ objects! (Notice that
              jet b17                               \ none of their
            airplane prop                           \ attributes are
              prop p1019                            \ known at this time.)
              prop p819
        assets middleware
          middleware routers
          middleware hubs
        assets software
          software knowledgebases
          software pc-applications
      company events
        events news
        events dinner
        events picnic
        events maintenance
      company finances
        finances investment
        finances statement
      company operations
        operations report
        operations procedure
        operations alert
      company people
        attribute firstname First Name              \ Attributes that all
        attribute middlename Middle Name            \ people have.
        attribute lastname Last Name
        attribute ssn Social Security Number
        attribute payrate Rate of Pay (Hourly)
        attribute department Department
        people associate
          associate jblow                           \ Notice how the actual
            firstname Joe                           \ people instances have
            lastname Blow                           \ been grouped by a job
            payrate 6.25                            \ class. Also notice what
          associate jdoe                            \ the system DOES NOT
            firstname Jane                          \ KNOW about each one
            lastname Doe                            \ based on the attribute
            department Human Resources              \ list given above!
        people client
        people executive
          executive bzimmerly
            firstname Bill
            lastname Zimmerly
        people manager
        people prospect
        people vendor
    ;kb
    

The above "Tree of Knowledge" demonstrates that each unique word is a recursive defining word that can create an endless branch of ever-increasing detail. More about this tree will be considered in subsequent chapters, but the most important thing about showing this sample tree here is to illustrate the utter simplicity of how Forth allows such trees to be defined.

And YES, that is executable code that is illustrated here!

The Forth knowledge base compiler code that I have already created can quite easily compile the code listed above and that code can be queried for information.
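
For readers who do not know Forth, the way such a listing can be read mechanically is sketched below in Python. This is not the Forth compiler described above, only an approximation built on three assumed rules: every object name becomes a word that creates a child, `attribute` declares an attribute on the most recently created object, and an attribute name followed by text stores a value on that object. Treating childless objects as instances is a further simplification. The listing is a shortened excerpt of the tree above.

```python
class Obj:
    """Hypothetical tree node built from one line of a kb: listing."""

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent
        self.children = []
        self.declared = {}   # attribute name -> prompting string
        self.values = {}     # attribute name -> value held by this object
        if parent is not None:
            parent.children.append(self)


def compile_kb(source):
    """Read a kb: ... ;kb listing and return the root Obj."""
    objects = {}         # object name -> Obj (each name is a defining word)
    attributes = set()   # attribute names declared so far
    root = current = None
    for line in source.splitlines():
        tokens = line.split()
        if "\\" in tokens:                       # Forth comment to end of line
            tokens = tokens[:tokens.index("\\")]
        if not tokens or tokens[0] == ";kb":
            continue
        word, rest = tokens[0], tokens[1:]
        if word == "kb:":                        # kb: root <name>
            root = current = objects[rest[1]] = Obj(rest[1])
        elif word == "attribute":                # attribute <name> <prompt ...>
            current.declared[rest[0]] = " ".join(rest[1:])
            attributes.add(rest[0])
        elif word in attributes:                 # <attribute> <value ...>
            current.values[word] = " ".join(rest)
        elif word in objects:                    # <parent> <child>
            current = objects[rest[0]] = Obj(rest[0], objects[word])
        else:
            raise ValueError(f"unknown word: {word}")
    return root


def unknowns(obj, inherited=None):
    """Yield (object name, attribute) for childless objects missing a value."""
    inherited = dict(inherited or {})
    if not obj.children:
        for attr in inherited:
            if attr not in obj.values:
                yield obj.name, attr
    inherited.update(obj.declared)
    for child in obj.children:
        yield from unknowns(child, inherited)


LISTING = r"""
kb: root company                                \ The "Root" object
  company assets
    assets hardware
      hardware computer
        attribute manufacturer Manufacturer
        attribute speed Processor Speed (MHz)
        attribute ram Main Memory (MB)
        computer della
          manufacturer Dell
          speed 100
          ram 32
        computer royal
          manufacturer Custom made
          ram 128
;kb
"""

root = compile_kb(LISTING)
print(list(unknowns(root)))   # [('royal', 'speed')]
```

Querying the compiled tree turns up the same gap that the comments in the full listing point out: the knowledge base does not know the speed of the computer named "royal".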

Part 7: What Components are there in a Knowledge Based Tool?

The three main components of an expert system are (1) the knowledge base, (2) the inference engine, and (3) the user interface.

The knowledge base is the "container" where the knowledge is stored.

The inference engine is the mechanism used to emulate human reasoning based on the knowledge held in the knowledge base.

The user interface allows the user to interact with the system.

One of the ways knowledge may be represented is by means of rules, which are especially suitable for semantic knowledge. But how can we translate the expert's knowledge into rules?

There is no unique answer. The problem of knowledge elicitation has been referred to as the knowledge engineering bottleneck, because it is considered the heavy load that prevents knowledge engineering and expert systems from having a definitive and successful take-off.


Appendix 1: Tips & Random Thoughts To Be Merged, Kept in Mind, or Coded into the KB.SRC or OBJECT.SRC files.

  1. Movable objects and attributes allow the KE* to "play with" the KB* structure. This is a very good and necessary kind of functionality.

  2. A new type of OBJECT is necessary to avoid confusion and add better structure to the KB Tree: the "OF" type (Object Folder).

  3. Attributes can be of many kinds although in the beginning I only have one kind: the text input field. To solve the problem of handling unknowns, the only other types are objects and object lists! (Necessary for ACLs*, or matching a vehicle to a scheduled flight.)

    Since Forth can create an ATTRIBUTE to have any kind of behavior, I will need to introduce two new pointers within the attribute structure. These pointers will be to executable code and will represent an INPUT TYPE and OUTPUT TYPE.

    Executing the INPUT TYPE will be done whenever a special form of INPUT is needed, such as that from a potentiometer, a random number generator, or some other such thing.

    Executing the OUTPUT TYPE will be done whenever a special form of output must be produced by the attribute.

  4. Specifically, my KB Interpreter, the "Tree of Knowledge", has the following simple design:

    An object has one behavior when it is invoked: it creates a child object. This is how knowledge is automatically labeled and ordered.

    An attribute does much more than this! It attaches to the current object a new word which provides the ability to tag object instances with the ability to get, hold, and return data associated with that object instance.
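
The design in items 3 and 4 can be approximated in Python as follows. This is a hedged sketch, not the KB.SRC or OBJECT.SRC code: the class and method names are hypothetical, the two "pointers to executable code" are modeled as plain callables, and `read_potentiometer` is a made-up stand-in for real hardware.

```python
class Attribute:
    """Hypothetical attribute record carrying the two behavior pointers."""

    def __init__(self, prompt, input_type=None, output_type=str):
        self.prompt = prompt
        self.input_type = input_type     # INPUT TYPE: called to obtain a value
        self.output_type = output_type   # OUTPUT TYPE: called to render a value


class KBObject:
    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent
        self.children = {}
        self.attributes = {}   # declared here, inherited by descendants
        self.data = {}         # values held by this object instance

    def __call__(self, child_name):
        """Invoking an object has one behavior: it creates a child object."""
        child = KBObject(child_name, parent=self)
        self.children[child_name] = child
        return child

    def attribute(self, name, prompt, input_type=None, output_type=str):
        """Attach a new attribute word to this object."""
        self.attributes[name] = Attribute(prompt, input_type, output_type)

    def find_attribute(self, name):
        """Search the ancestors of this object for the attribute declaration."""
        node = self.parent
        while node is not None:
            if name in node.attributes:
                return node.attributes[name]
            node = node.parent
        raise KeyError(name)

    def get(self, name):
        """Run the INPUT TYPE, if one is set, and hold the value it returns."""
        attr = self.find_attribute(name)
        if attr.input_type is not None:
            self.data[name] = attr.input_type()
        return self.data.get(name)

    def show(self, name):
        """Run the OUTPUT TYPE on the held value, or report an unknown."""
        attr = self.find_attribute(name)
        if name not in self.data:
            return f"{attr.prompt}: unknown"
        return f"{attr.prompt}: {attr.output_type(self.data[name])}"


def read_potentiometer():
    return 0.5   # made-up reading standing in for a real device


root = KBObject("lab")
instrument = root("instrument")
instrument.attribute("dial", "Dial Position",
                     input_type=read_potentiometer,
                     output_type=lambda v: f"{v:.0%}")
instrument.attribute("label", "Label")   # plain text field, no special types
knob = instrument("knob")

print(knob.show("dial"))    # Dial Position: unknown
knob.get("dial")
print(knob.show("dial"))    # Dial Position: 50%
knob.data["label"] = "Volume"
print(knob.show("label"))   # Label: Volume
```

Keeping the input and output behaviors as separate callables means a plain text field and an instrument reading can share the same attribute structure, which appears to be the intent of the two pointers.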

Appendix 2: TODO(s) for KB and zHTTP

(Note: x - means that the task has been done and the functionality has moved from the theoretical to the real world! The date represents the date when the TODO was added to this list.)

06/30/1999

x - Add the ability to add new objects with the KB Object Editor.

x - Add the ability to add new attributes with the KB Object Editor.

x - Add the ability to update the attributes of a current object instance with actual data with the KB Object Editor.

Replace the "Object.NET" and "Company.NET" ways of starting ".ZKB" files and instead invoke them as HTML, NET, GIF, and JPG files are! For example, our Intranet should come up whenever someone browses the "http://iidbs.com/iidbs.zkb" URL! In order to accomplish this, the code should examine the root object for the "START" attribute. If it exists, it specifies the program to handle it, but if not, it just starts the Knowledge Base Object Editor!

Furthermore, remove the "Create New Knowledgebase" from the KBOE and instead have it automatically authorize and create a new KB when asked to access one that doesn't yet exist! (Doug may want to play with "http://iidbs.com/parts.zkb", and when the system realizes that no such file exists, it replies by authorizing Doug and creates the KB!)

Add the ability to delete object branches with the KB Object Editor.

Add the ability to delete attributes with the KB Object Editor.

Add the ability to move object branches with the KB Object Editor.

Add the "OF" (object folder) characteristic to object creation. This makes things like the CLIENT object look better because, although it is a child of PEOPLE and the parent of actual instances of human beings, it is not itself a person, and showing person attributes for it can be confusing.

Add the ability to specify ATTRIBUTE TYPES. (OBJECT and OBJECT LISTS might be all that is necessary to accomplish this, and would be the elegant way to do it.)

Appendix 3: Glossary

Access Control Lists ("ACLs") Attributes of an object that list all people who have the rights to access, create, delete, edit, view, etc., the object.

Ambiguity The condition of words having more than one meaning. Proper evaluation of ambiguous words requires lexical analysis of the context in which the words are used.

Attribute An attribute of a class is something that all class members possess. For example, one of the attributes of "MAN" is "FIRST NAME". Thus, whenever a new instance of the class man becomes known to the system, it knows that this instance must have a first name and will therefore prompt for it.

Class Any parent object can also be called a class.

Deduction Reasoning from the general to the specific. For example, since all men have a first name, then Bill must have a first name. Deduction is the natural process followed by the inference engine employed in my knowledge base systems.

Facts The association of a specific attribute to an object. For example, it is a fact that BILL has a FIRST NAME because he is of the class "MAN", and all men have a FIRST NAME.

Ideas The association of a specific action sequence to an object. Ideas, therefore, give life to facts. (Re: Method)

Induction Reasoning from the specific to the general. (Re: Mathematical Induction)

Inheritance The process of having child objects instantiate with all of the attributes of all of their parents.

Instance A child object.

Knowledge An ordered collection of related facts and ideas.

Knowledge Base ("KB") Knowledge that has already been collected, collated, and stored in a computerized data structure.

Knowledge Engineer ("KE") A software engineer skilled in capturing, representing, and manipulating knowledge into computerized knowledge bases.

Knowledge Engineering ("KE") The art and science of capturing, representing, and manipulating knowledge into computerized knowledge bases.

Method Executable code that is associated with an object. It can either be specifically associated or inherited like an attribute. An idea is a synonym of method.

Object Any class or instance of a class that forms a named node on the tree of knowledge.

Polymorphism Another way of describing the ambiguous nature of objects and how to deal with it.

Tree The natural way of representing hierarchies, which are, in turn, the natural way of representing knowledge!

zHTTP A computer program that functions as a server for World Wide Web hyperlink pages on the Internet. It was designed and written by Bill Zimmerly of IIDBS.

Zimmerly Knowledge Base ("ZKB") A specific implementation design of a computerized knowledge base, designed and written by Bill Zimmerly of IIDBS.