
1. Foreword and Introduction

1.1 Copyright

NoSQL RDBMS, Copyright (C) 1998 Carlo Strozzi, with permission from the original RDB author, W.Hobbs.

This program comes with ABSOLUTELY NO WARRANTY; for details refer to the GNU General Public License.

A copy of the GNU General Public License is included in the appendix, at the end of this document.

1.2 Preface

This working draft describes NoSQL (I personally like to pronounce it noseequel), a close derivative of the RDB DataBase system, and provides instructions for its use. The original RDB system was (and still is) developed at RAND Organization by Walter V. Hobbs. Most of the NoSQL code, as well as the text of this document, has been taken directly from RDB, so most of the credit for it goes to the original author.

NoSQL uses exactly the same table format as RDB, and therefore tables are also called 'rdbtables' in the NoSQL context.

NoSQL's major differences over the original code are:

Other major contributors to the original RDB system, besides the author, were:

Chuck Bush

Don Emerson

Judy Lender

Roy Gates

Rae Starr

People who helped with turning RDB into NoSQL:

Vincenzo (Vicky) Belloli

David Frey

1.3 Introduction

A good question one could ask is "With all the relational database management systems available today, why do we need another one?" There are several reasons:

  1. NoSQL is easy to use by non-computer people. The concept is straightforward and logical. To select rows of data, the 'nsq-row' operator is used; to select columns of data, the 'nsq-col' operator is used.
  2. The data is highly portable to and from other types of machines, like Macintoshes or MS-DOS computers.
  3. The system will run on any UNIX machine that has the Perl programming language.
  4. NoSQL essentially has no arbitrary limits, and can work where INGRES can't. For example, there is no limit on data field size, the number of columns, or file size (although the number of columns in a table may actually be limited to 32,768 by some implementations of the AWK interpreter, including mawk).

The data is contained in regular UNIX ASCII files, and so can be manipulated by regular UNIX utilities, e.g. ls, wc, mv, cp, cat, more, less, editors like 'vi', head, RCS, etc.
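Because a table is just a text file, everyday tools work on it directly. What follows is only a minimal sketch: the file name people.tbl and its tab-separated layout with column names on the first line are assumptions made for illustration, and the real rdbtable format is the one described in the DATA section.

```shell
# Hypothetical sample table: tab-separated, column names on the first line.
# This layout is an assumption for illustration, not the full rdbtable format.
tmpdir=$(mktemp -d)
table="$tmpdir/people.tbl"
printf 'name\tcity\tage\nAnna\tRome\t34\nBruno\tMilan\t51\nCarla\tRome\t27\n' > "$table"

# Ordinary utilities need no knowledge of the database system.
line_count=$(wc -l < "$table" | tr -d ' ')   # total lines, header included
header=$(head -n 1 "$table")                 # the column-name line
cp "$table" "$table.bak"                     # a backup is just a file copy

echo "lines: $line_count"
echo "header: $header"
```

The same holds for version control with RCS or for editing with 'vi': nothing in the file needs unpacking or exporting first.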

The form of each file of data is that of a relation, or table, with rows and columns of information.

To extract information, a file of data is fed to one or more "operators" via the UNIX Input/Output redirection mechanism.
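The idea of feeding a table through chained operators can be sketched with two small shell functions built on AWK. These are stand-ins written for this example only: select_rows and select_cols are hypothetical names, they are not the nsq-row and nsq-col operators and do not follow their command-line syntax, and the tab-separated sample file is likewise an assumed layout.

```shell
# Hypothetical sample table: tab-separated, column names on the first line.
tmpdir=$(mktemp -d)
table="$tmpdir/people.tbl"
printf 'name\tcity\tage\nAnna\tRome\t34\nBruno\tMilan\t51\nCarla\tRome\t27\n' > "$table"

# Stand-in for a row-selecting operator: keep the header line and every
# data row whose "city" column (field 2) equals the given value.
select_rows() {
    awk -F '\t' -v city="$1" 'NR == 1 || $2 == city'
}

# Stand-in for a column-selecting operator: keep only name and age
# (fields 1 and 3).
select_cols() {
    awk -F '\t' -v OFS='\t' '{ print $1, $3 }'
}

# The table enters on STDIN; each operator reads a table and writes a table,
# so operators can be chained with pipes in any order that makes sense.
result=$(select_rows Rome < "$table" | select_cols)
printf '%s\n' "$result"
```

Each stage keeps the header line, so its output is again a table that the next stage can read. That property is what makes the chaining work.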

There are also programs to generate reports, and to generate, modify, and validate the data. A more thorough discussion of why this type of relational database structure makes sense is found in the book "UNIX Relational Database Management", Reference #2.

It is assumed that the reader has at least a minimum knowledge of the UNIX Operating System, including knowledge of Input/Output redirection (e.g., STDIN, STDOUT, pipes).

This document presents information in the following order: The DATA section describes the structure of the data, with examples. There is a general discussion about operators in the section on OPERATORS, followed by several sub-sections, one for each operator in alphabetic order. Each has detailed instructions for use, and examples. There are sections describing selection of information using multiple operators, producing reports, and generating new rdbtables (data files in NoSQL format).

1.4 Perl and the Operator/Stream Paradigm

As stated in the Abstract, NoSQL uses the Operator/Stream DBMS Paradigm. The main reason I decided to turn the original RDB system into NoSQL is that RDB is written entirely in Perl. Perl is a good programming language for writing self-contained programs, but its pre-compilation phase and long start-up time are worth paying for only if the program, once loaded, can do everything in one go. This contrasts sharply with the Operator/Stream model, where operators are chained together in pipelines of two, three or more programs. The overhead of initializing Perl at every stage of the pipeline makes pipelining somewhat inefficient.

A better way of manipulating structured ASCII files is the AWK programming language, which is much smaller than Perl, more specialized for this task and very fast at start-up. On my Pentium II Linux machine, /usr/bin/mawk (POSIX AWK) is just 99K, while Perl 5 is almost 500K. You get the point. The drawback is that AWK is weak at handling command-line arguments and options. I have therefore settled on what I think is a good compromise: a compiled 'wrapper' program, written in C, that parses the command line and then calls AWK. This has proven very effective.

At the moment I have developed a small set of extra operators designed this way, covering the most common functions already performed by the original Perl code (which remains available, and which is quite good for interactive use at the UNIX shell prompt anyway).
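The division of labour between wrapper and AWK can be sketched as follows. The real wrappers are compiled C programs; a shell function stands in for one here purely to show the shape of the design, where option parsing happens outside AWK and AWK receives ready-made variables. The name frow_sketch, its -c and -v options, and the sample data are all hypothetical and do not reflect the command line of any actual NoSQL operator.

```shell
# frow_sketch: hypothetical stand-in for a compiled wrapper.
# Usage: frow_sketch -c COLUMN_NUMBER -v VALUE < table
frow_sketch() {
    col='' val=''
    OPTIND=1                      # reset so the function can be called twice
    while getopts 'c:v:' opt; do
        case "$opt" in
            c) col=$OPTARG ;;
            v) val=$OPTARG ;;
            *) echo 'usage: frow_sketch -c COL -v VALUE' >&2; return 2 ;;
        esac
    done
    if [ -z "$col" ]; then
        echo 'frow_sketch: -c is required' >&2
        return 2
    fi
    # AWK only sees variables; it never inspects the command line itself.
    # Line 1 (assumed to hold the column names) is always passed through.
    awk -F '\t' -v col="$col" -v val="$val" 'NR == 1 || $col == val'
}

matched=$(printf 'name\tcity\nAnna\tRome\nBruno\tMilan\n' | frow_sketch -c 2 -v Milan)
printf '%s\n' "$matched"
```

A shell wrapper like this one still pays for starting a shell, which is presumably part of why the author chose a compiled C program instead; the sketch shows only how the work is split, not how fast it runs.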

These faster operators offer a less user-friendly command-line syntax, and possibly different command-line options, than the Perl ones, but they are meant to be used from inside other programs, like WWW CGI scripts, where speed matters and user-friendliness matters much less. The new operators usually have the same name as the original ones, but with an "f" (for "fast") added. So, for instance, the faster version of nsq-row is nsq-frow, and so on.

