The Hackerlab at regexps.com

arch Project Inventories

In a project tree, some of the files and directories are "part of the source" -- they are of interest to arch. Other files and directories may be scratch files, editor backup files, and temporary or intermediate files generated by programs. Those other files should be ignored by most arch commands.

This chapter discusses how arch recognizes which files to pay attention to, and which to ignore.

arch has flexible facilities for keeping track of all of the files and directories in your project: for taking "inventories" of your project tree. It has these facilities for three reasons:

Distinguishing Source: arch uses a project inventory to distinguish files and directories which are part of your project from other files and directories, such as temporary files, scratch files, editor backup files, and so forth.

Additionally, arch permits you to overlay projects: store more than one project at a single root. When you do that, arch uses inventories to sort out which files and directories belong to each project. (The topic of overlays, however, is deferred until a later chapter.)

Recognizing Renames: Every file or directory in an arch inventory has two names. One name is simply the location (path) of the file relative to the root of the project tree. The other name is a "logical name" for the file: a name that remains the same regardless of where in the project tree the file is located. When arch compares two versions of a project tree, it uses logical names to discover when files or directories have been moved, renamed, deleted, or added.

Canonical Inventories: Finally, arch permits you to make a record of the "canonical inventory" of your project -- all of the files that you believe are supposed to be there. arch can then tell you whether any files are missing or have been added compared to the canonical inventory.


Choices Regarding Inventories

For each project tree, you have a choice to make regarding how project inventories work. The options are described briefly here, then in more detail in the sections that follow.

Naming Conventions: The simplest (and default) option is to use naming conventions. arch will search your tree for files matching certain naming patterns, and consider all of those files to be source files.

When you use only naming conventions to take an inventory, the logical name of a file and its location name are exactly the same. For that reason, if you rename a file, arch will think you deleted a file with the old name, and added a file with the new name. If you delete a file, then add a file with the same name, arch will think that the new file is a modified form of the old file. None of those limitations is fatal (arch will still work), but they do limit the effectiveness of arch at branching and merging. ("Branching" and "merging" are topics of a later chapter.)

Explicit Inventories: Another option is to use an explicit inventory. Once again, arch will search for files that satisfy certain naming conventions -- but not every such file or directory is automatically source. Instead, whenever you add, delete, or rename a file, you must inform arch of that fact explicitly. For example, after adding the file foo.c, you have to tell arch:

        % larch add foo.c

and if you rename foo.c to bar.c, then you must also tell arch:

        % larch move foo.c bar.c

Implicit Inventories: A third option combines some of the advantages of using naming conventions with some of the advantages of explicit inventories: implicit inventories. When you use an implicit inventory, every file that passes the naming conventions is considered source. You may explicitly add, delete, and rename files -- allowing arch to precisely track renames for those files and directories. You also may store a file tag (the "logical name" of a file) in any file. If you use an implicit inventory and don't explicitly tag a file, arch will search the file for an embedded tag and use it to precisely detect new files, deleted files, and renamed files.

Each of the three options is called a tagging method.

There is some advice at the end of this chapter about how to choose among the three tagging methods.


Specifying a Tagging Method

If you never explicitly specify a tagging method, arch will use simple naming conventions, by default. You can also make explicit your choice to use only naming conventions with this command issued in a project tree:

        % larch tagging-method names

Similarly, to use either an implicit or explicit inventory, use one of the commands:

        % larch tagging-method explicit
        % larch tagging-method implicit

To find out what method a given project tree uses, use the same command with no argument:

        % larch tagging-method
        names


The inventory Command

The command larch inventory is used to print a list of source files. It has many options, including options to print other kinds of file lists (such as a list of all editor backup files, or a list of all files which are not source):

        % cd source-tree

        % larch inventory --source
        hello.c
        hello.h
        library
        library/buffer.c
        library/buffer.h
        ...

contrasted with:

        % cd source-tree

        % ls
        hello.c   hello.c.~1~   hello.h  library

(Notice that hello.c.~1~ is not included in the inventory of source files.)

The naming conventions used by arch are as follows:

Control Files: A control file is part of the source, but control files are not included in the output of larch inventory unless the --all flag is used. Control file and directory names match any of these patterns:

        .arch-project-tree
        .arch-ids
        .owned.*
        .common
        {arch}

Junk Files: A junk file is not part of the source. A file or directory is junk if its name matches the pattern:

        ,*

or if its name contains any of the characters:

        <space>
        <tab>
        <newline>
        [
        ]
        *
        ?
        \

Note that if a directory name matches that pattern, then none of the contents of the directory are part of the source, regardless of their names.

Junk files are listed by the command:

        % larch inventory --junk

arch sometimes creates junk files and directories of its own. When it does, those files and directories have names that match the pattern:

        ,,*

You should avoid creating files and directories with names that match that pattern. arch will freely delete files and directories with names that match ,,* whenever it needs to re-use such a name.

Usually, arch will delete any junk file it creates before the command that created the junk file terminates. Sometimes, though, when a command fails, arch will leave behind junk files or directories matching ,,* (this is a debugging feature, likely to be removed in a future release). For now, whenever you find such a file (and are confident it isn't being used by a currently running command), you are free to delete it.

Backup Files: If a file is not a junk file, it may be a backup file. Backup files are not part of the source. They match any of the patterns:

      *~
      *.bak
      *.modified
      *.orig
      *.original
      *.rej
      *.rejects

Backup files are listed by the command:

        % larch inventory --backups

Precious Files: If a file is not a control file, junk file, or backup file, it might be a precious file. Precious files are not part of the source, but arch does sometimes treat them specially. For example, when arch copies a directory of source for you, it copies not only the source files, but the precious files as well.

Precious files and directories match one of these patterns:

        +*
        .gdbinit
        =build*
        =install*
        CVS
        RCS
        TAGS

Of course, precious files can be listed by the command:

        % larch inventory --precious

Sometimes arch will create its own precious files -- usually to save some information that you might not want to lose. When it does, it creates a file or directory matching the pattern:

        ++*

You should avoid creating such filenames yourself. arch won't ever delete such a file -- but if one happens to get in the way of an arch command, that command will fail with an error.

Source Files: If a file is not a control, junk, backup, or precious file, it might be an ordinary source file. Source files are, of course, the files that arch stores in an archive (along with control files).

Source files must match the pattern:

        [=a-zA-Z0-9]*

but must not match any of the patterns:

      *.o
      *.core
      core

Ordinary source files are listed by:

        % larch inventory --source

Some files which are arch control files are counted as source even though they don't match the patterns above. However, these files are not listed by default. All source files (ordinary source plus control files) are listed by:

        % larch inventory --source --all

Unrecognized Files: Any file that doesn't fall into the above categories is an unrecognized file. Unrecognized files can be listed by the command:

        % larch inventory --unrecognized

WARNING: The basic pattern for source files is:

        [=a-zA-Z0-9]*

However, you should restrict yourself to file names that do not contain spaces. Filenames containing spaces are likely to trigger bugs in the current release of arch.
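
The classification rules in this section can be summarized in a short Python sketch. This is not how arch itself is implemented; it only applies the glob patterns listed above to a single file name. The order in which the categories are checked, and the treatment of `*` as a junk character, are assumptions of the sketch, and the rule that everything inside a junk directory is excluded is not modeled.

```python
from fnmatch import fnmatchcase

# Patterns copied from the lists above. The order of the checks and the
# handling of "*" as a junk character are assumptions of this sketch.
CONTROL = [".arch-project-tree", ".arch-ids", ".owned.*", ".common", "{arch}"]
JUNK_CHARS = set(" \t\n[]*?\\")
BACKUP = ["*~", "*.bak", "*.modified", "*.orig", "*.original", "*.rej", "*.rejects"]
PRECIOUS = ["+*", ".gdbinit", "=build*", "=install*", "CVS", "RCS", "TAGS"]
SOURCE = "[=a-zA-Z0-9]*"
NOT_SOURCE = ["*.o", "*.core", "core"]


def matches_any(name, patterns):
    """True if the name matches at least one shell-style pattern."""
    return any(fnmatchcase(name, pattern) for pattern in patterns)


def classify(name):
    """Return the category of a single file or directory name (no slashes)."""
    if matches_any(name, CONTROL):
        return "control"
    if name.startswith(",") or JUNK_CHARS & set(name):
        return "junk"
    if matches_any(name, BACKUP):
        return "backup"
    if matches_any(name, PRECIOUS):
        return "precious"
    if fnmatchcase(name, SOURCE) and not matches_any(name, NOT_SOURCE):
        return "source"
    return "unrecognized"


if __name__ == "__main__":
    for name in ["hello.c", "hello.c.~1~", ",scratch", "+notes", "hello.o"]:
        print(name, classify(name))
```

Checking the categories in a fixed order mirrors the "if a file is not a junk file, it may be a backup file" wording used above.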


Using an Explicit Inventory

If you want to use explicit designation of source files, rather than naming conventions alone, then use this command:

        % larch tagging-method explicit

Note that you must use that command from within a working directory tree that has already been initialized using init.

When using explicit designation, it is (ordinarily) necessary to add every file and directory in the source to the explicit list using the command:

        % larch add FILE

If FILE is a directory, that will create FILE/.arch-ids/=id. If it is a regular file or symbolic link, it will create (in the same directory) .arch-ids/FILE.id. In either case, the file created will contain an obscure string known as an "inventory tag" (inventory tags are explained in more detail below).
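
As an illustration of where the tag file for a given path ends up, here is a small Python sketch. The function name `id_file_path` is hypothetical (it is not part of arch), and the sketch assumes the control directory is spelled `.arch-ids`, as in the list of control files earlier in this chapter.

```python
import posixpath


def id_file_path(path, is_directory):
    """Return the location of the file holding the inventory tag for `path`.

    `path` is relative to the root of the project tree and uses "/" as
    the separator. Nothing is created on disk; this only computes a name.
    """
    if is_directory:
        # A directory keeps its own tag inside itself.
        return posixpath.join(path, ".arch-ids", "=id")
    # A regular file or symbolic link keeps its tag next to itself.
    parent, base = posixpath.split(path)
    return posixpath.join(parent, ".arch-ids", base + ".id")


if __name__ == "__main__":
    print(id_file_path("library", True))
    print(id_file_path("library/buffer.c", False))
```

Because a directory's tag lives inside the directory, renaming the directory carries its tag (and the tags of its contents) along with it.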

If you remove a regular file or symbolic link, you must use the command:

        % larch delete FILE

That won't remove FILE itself, but it will remove the inventory tag for FILE.

To remove a directory, you must remove its .arch-ids subdirectory yourself. That will also implicitly remove the inventory tags of any files that arch thinks are stored in that directory.

If you rename a regular file or symbolic link, you can use the command:

        % larch move OLD-NAME NEW-NAME

to move the inventory tag for that file.

If you rename a directory, its inventory tag (and the tags for all files and subdirectories it contains) moves with it automatically (because the .arch-ids subdirectory has moved).

When you run larch inventory in a working directory using explicit designation, only explicitly designated source files are listed. If you would rather see a list of all files passing the naming conventions for source files, use:

        % larch inventory --source --names

You should also read about tree-lint later in this chapter.


Using an Implicit Inventory

To use implicit tagging, use the following command in your working directory:

        % larch tagging-method implicit

When implicit tagging is used, every file that passes the naming conventions is treated as source. If a file or directory has an explicit tag (created with add), arch will use that explicit tag to recognize when a file has moved. If a file (but not a directory or symbolic link) lacks an explicit tag, arch will look for a tag in the file itself.

A tag within a file has one of two forms. It may be either:

        <punct><basename><spaces>-<spaces><tag>

where <punct> is an arbitrary string of punctuation and spaces, <basename> is the basename of the file, and <tag> is an inventory tag for the file. Or:

        <punct>tag:<spaces><tag>

In either case, <tag> should be unique among the files within a directory. A tag within a file must occur within the first 1024 bytes of the file.
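
To make the two forms concrete, here is a hedged Python sketch of a search for an embedded tag. It is not arch's own code. The function name `find_embedded_tag` is hypothetical, and several details are assumptions: punctuation is read as ASCII punctuation, the tag is taken to run to the end of its line, and at least one space is required on each side of the hyphen in the basename form.

```python
import re
import string

# A run of ASCII punctuation, spaces, and tabs (the "punct" part of both forms).
PUNCT = "[" + re.escape(string.punctuation) + r" \t]*"


def find_embedded_tag(data, basename):
    """Return the raw embedded tag found in `data` (bytes), or None.

    Only the first 1024 bytes are examined, as described above.
    """
    head = data[:1024].decode("latin-1")
    tag_form = re.compile("^" + PUNCT + r"tag:[ \t]*(.*)$")
    name_form = re.compile(
        "^" + PUNCT + re.escape(basename) + r"[ \t]+-[ \t]+(.*)$"
    )
    for line in head.splitlines():
        for pattern in (tag_form, name_form):
            match = pattern.match(line)
            if match and match.group(1).strip():
                return match.group(1).strip()
    return None


if __name__ == "__main__":
    text = b"/* tag: joe.hacker@gnu.org Thu Nov 29 17:25:15 PST 2001\n"
    print(find_embedded_tag(text, "hello.c"))
```

Note that in this sketch a comment terminator on the same line would become part of the tag, so it is simplest to keep the tag line free of trailing text.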

A handy convention for source files is to add a comment to the top of every file, briefly stating the purpose of the file:

        /* hello.c - `main' for the hello world program
        ...

or:

        /* tag: `main' for the hello world program
        ...

Another possible convention is to use a string identifying the author and the time the file was first created (or first tagged):

        /* tag: joe.hacker@gnu.org Thu Nov 29 17:25:15 PST 2001
        ...

If you use the basename form of an implicit tag, and actually rename a file (rather than simply move it between directories), you do need to remember to update the tag line to reflect the new basename.

When you use implicit tagging, it is ok if a file lacks any tag at all, either explicit or implicit. In that case, if you rename the file, arch will think you've deleted the old file and added a new one -- but aside from that, everything will work normally.

CAUTION: Leading and trailing spaces around an inventory tag are not considered part of the tag. Within a tag, every non-graphical character is replaced by _. For example, if you write the tag:

        `main' for the hello    world program

the actual inventory tag is:

        `main'_for_the_hello____world_program

It is possible that a future release of arch will slightly change the rule -- so that multiple spaces and tabs are replaced by a single _.
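
The normalization rule can be sketched in a few lines of Python. The function name `normalize_tag` is hypothetical, "graphical" is read here as the printable, non-space ASCII characters, and the `collapse` option only illustrates the possible future rule mentioned above; none of this is taken from arch's source.

```python
import re


def normalize_tag(raw, collapse=False):
    """Turn the text written in a file into the actual inventory tag.

    Leading and trailing whitespace is dropped, then every character
    outside the ASCII range "!" through "~" is replaced by "_".
    With collapse=True, each run of spaces and tabs becomes a single "_"
    (the possible future rule, not the current one).
    """
    trimmed = raw.strip(" \t\r\n")
    if collapse:
        trimmed = re.sub(r"[ \t]+", "_", trimmed)
    return "".join(ch if "!" <= ch <= "~" else "_" for ch in trimmed)


if __name__ == "__main__":
    print(normalize_tag("  `main' for the hello    world program  "))
```

Under the current rule, two tags that differ only in the amount of internal whitespace are different tags, which is worth remembering when editing a tag line by hand.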


Recognizing Renames -- Inventory Tags

If you are using naming conventions only to recognize source files, then if you rename a directory or file, arch will conclude that you have deleted the old file, and created a new file.

If you are using an explicit source inventory, arch will always recognize when a directory is renamed (presuming that the .arch-ids subdirectory is preserved), and it will recognize when a file is renamed if you use move (rather than delete and add). Of course, arch can be fooled if you swap two files without swapping their inventory tags.

If you are using an implicit inventory, arch will never recognize when an untagged file is renamed (it will think "delete" and "add"). If a file is tagged explicitly, arch will recognize when the file is added, deleted, or renamed -- just as when using an explicit inventory. If a file is not tagged explicitly, but has an embedded tag, arch will recognize when the file is added, deleted or moved.


Keeping Things Neat and Tidy

The command:

        % larch tree-lint

is useful for keeping things neat and tidy.

If you use explicit tagging, it will tell you of any tags for which the corresponding file does not exist. It will tell you of any files that pass the naming conventions, but for which no explicit tag exists.

If you use implicit tagging, it will tell you of any files for which no tag can be found -- either explicit or implicit. It will tell you of any explicit tags for which the corresponding file does not exist.

In either case, or if you are using naming conventions only, tree-lint will tell you of any files that don't fit the naming conventions at all.

Finally, if you use explicit or implicit tagging, tree-lint will check for cases where multiple files use the same tag. If any two files do have the same tag, you must correct that, either by editing the tag (if it is in the file itself) or by using delete and add to replace a duplicated explicit tag.
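
The duplicate-tag check is easy to picture as a grouping of paths by tag. The following Python sketch assumes the inventory is already available as a dictionary from path to tag; the function name `duplicate_tags` and the sample tags are made up for illustration and say nothing about how tree-lint is actually written.

```python
from collections import defaultdict


def duplicate_tags(inventory):
    """Return {tag: [paths]} for every tag shared by more than one path.

    `inventory` maps each path (relative to the tree root) to its tag.
    """
    paths_by_tag = defaultdict(list)
    for path, tag in inventory.items():
        paths_by_tag[tag].append(path)
    return {
        tag: sorted(paths)
        for tag, paths in paths_by_tag.items()
        if len(paths) > 1
    }


if __name__ == "__main__":
    sample = {
        "hello.c": "hello-main",          # hypothetical tags
        "library/hello.c": "hello-main",  # copied file, tag line not edited
        "hello.h": "hello-header",
    }
    print(duplicate_tags(sample))
```

A common way to end up with a duplicate is copying a file that carries an embedded tag and forgetting to edit the tag line in the copy.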


Avoiding Accidental Omissions and Additions

A manifest is an explicit list of the files you believe are supposed to be in a project tree. arch allows you to maintain a manifest, and to compare it to the actual contents of a tree.

The command set-manifest sets the manifest to the current contents of the project tree:

        % larch set-manifest

Note that only regular source files, not arch control files, are included in the manifest. To replace an existing manifest, you must provide the -f flag (or --force) to set-manifest.

You can retrieve the manifest with:

        % larch manifest

Each line of the manifest is of the form:

        <path>\t<tag>

and the list is sorted by the <tag> field.
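
The manifest format is simple enough to read and write from a script. The Python sketch below assumes an inventory held as a dictionary from path to tag and assumes plain string comparison for the sort; the exact collation arch uses is not stated here. Both function names are hypothetical.

```python
def format_manifest(inventory):
    """Render {path: tag} as manifest text: one path, a tab, and the tag
    per line, with the lines sorted by tag."""
    rows = sorted(inventory.items(), key=lambda item: item[1])
    return "".join(f"{path}\t{tag}\n" for path, tag in rows)


def parse_manifest(text):
    """Parse manifest text back into a {path: tag} dictionary."""
    entries = {}
    for line in text.splitlines():
        if line:
            path, tag = line.split("\t", 1)
            entries[path] = tag
    return entries


if __name__ == "__main__":
    inventory = {"hello.c": "tag-b", "hello.h": "tag-a"}  # made-up tags
    print(format_manifest(inventory), end="")
```

Splitting on the first tab only is safe because source file names never contain tabs under the naming conventions described earlier.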

You can look for missing, added, or renamed files with:

        % larch check-manifest

which will compare the project tree inventory to the manifest and print a report describing divergences (if there are any).


The Inventory Tag Abstraction in Detail

When arch considers the files and directories in a working directory, it builds a one-to-one index mapping path names (relative to the root of the working directory tree) to inventory tags.

The inventory tag of a file is its "logical identity". The path is the position of that identity within the particular working directory.

You can see the inventory tag for each source file with the command:

        % larch inventory --source --tags

When arch compares two project trees, it bases the comparison on logical identities. If both trees have a file with a particular inventory tag, but the files are in different positions, then arch considers the file to have been moved or renamed. Similarly, if an inventory tag is present in one tree, but missing in the other, then arch considers the file to have been added or deleted.
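
The comparison just described can be sketched in Python as set operations on inventory tags. This is an illustration of the idea rather than arch's implementation; the function name `compare_trees` and the sample tags are invented, and each tree is assumed to be given as a dictionary from tag to path.

```python
def compare_trees(old, new):
    """Compare two trees given as {tag: path} dictionaries.

    Returns three sorted lists: paths added in `new`, paths deleted
    from `old`, and (old_path, new_path) pairs for moved or renamed files.
    """
    added = sorted(new[tag] for tag in new.keys() - old.keys())
    deleted = sorted(old[tag] for tag in old.keys() - new.keys())
    renamed = sorted(
        (old[tag], new[tag])
        for tag in old.keys() & new.keys()
        if old[tag] != new[tag]
    )
    return added, deleted, renamed


if __name__ == "__main__":
    old = {"id-1": "foo.c", "id-2": "util.c", "id-3": "old.h"}
    new = {"id-1": "bar.c", "id-2": "util.c", "id-4": "new.h"}
    print(compare_trees(old, new))
```

If every tag is simply the file's path, a rename shows up in this scheme as one deletion plus one addition, which is the limitation of naming conventions described at the start of the chapter.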

If you use naming conventions only, the inventory tag of each file is the same as its path. Thus, when using the names tagging method, arch never recognizes that a file has been moved or renamed.

When you use the explicit tagging method, inventory tags are stored in the .arch-ids directories. There is a file in .arch-ids for each tagged file (and one file for the directory containing .arch-ids), and those files contain the tags.

When you use the implicit tagging method, tags in .arch-ids directories take precedence (if they exist). If a file is not explicitly tagged, arch searches for the inventory tag in the file itself (as described earlier in the chapter). Finally, if a file is not tagged at all, then its path is used as the inventory tag.


A Warning About Changing Tagging Methods

Be cautious when changing tagging methods for directories already checked in to an arch revision control archive.

For example, if you change from the tagging method names to explicit, then the inventory tag for every file will change. arch will think that you've deleted all of the files in the old tree, and added all of the files in the new tree.

However, there is a work-around for this problem, described in a later chapter.


Other Ways to Tag Files

In some situations, it isn't convenient to explicitly tag every file or to add an implicit tag to every file.

You can supply a default tag for every file that doesn't have an explicit tag with the command:

        % larch explicit-default TAG-PREFIX

After that, every file in that directory which lacks an explicit tag will have the tag:

        TAG-PREFIX__BASENAME

where BASENAME is the basename of the file. Default tags created in this way take precedence over implicit tags embedded in files. You can find out the default tag for a directory with:

        % larch explicit-default
        TAG-PREFIX

and remove the default with:

        % larch explicit-default --delete

You can also specify a default tag which has lower precedence than implicit tags:

        % larch explicit-default --weak TAG-PREFIX

and view that default:

        % larch explicit-default --weak

or delete it:

        % larch explicit-default --weak --delete
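
Putting the precedence rules together, tag lookup for a file under the implicit method might be pictured as the Python sketch below. The function `resolve_tag` is hypothetical. It assumes that a weak default produces a tag of the same TAG-PREFIX__BASENAME form as an ordinary default, which this section does not state outright, and it falls back to the path for a file with no tag at all, as described for the implicit method earlier in this chapter.

```python
import posixpath


def resolve_tag(path, explicit=None, default_prefix=None,
                embedded=None, weak_prefix=None):
    """Pick the inventory tag for `path`, highest precedence first.

    explicit        tag stored with larch add, if any
    default_prefix  TAG-PREFIX set with explicit-default, if any
    embedded        tag found inside the file, if any
    weak_prefix     TAG-PREFIX set with explicit-default --weak, if any
    """
    basename = posixpath.basename(path)
    if explicit is not None:
        return explicit
    if default_prefix is not None:
        return f"{default_prefix}__{basename}"
    if embedded is not None:
        return embedded
    if weak_prefix is not None:
        # Assumed to use the same form as the ordinary default.
        return f"{weak_prefix}__{basename}"
    return path


if __name__ == "__main__":
    print(resolve_tag("docs/intro.txt", weak_prefix="docs"))
    print(resolve_tag("docs/intro.txt", embedded="intro", weak_prefix="docs"))
```

The practical difference between the two kinds of default is whether a tag line inside a file can override the directory-wide default.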


Telling tree-lint to Shut Up

When using implicit tags, you may sometimes have a directory with many files that have no tag (either explicit or implicit), but not want those files to appear in a report of untagged files generated by tree-lint. There are two ways to tell tree-lint to shut up about such files:

One is to provide a default explicit tag or weak default explicit tag using larch explicit-default, as described above.

The second method is to label the directory as a "don't care" directory -- which means that tree-lint shouldn't complain about untagged files. You can do that with:

        % larch explicit-default --dont-care set

or remove the "don't care" flag with:

        % larch explicit-default --delete --dont-care

You can find out whether the "don't care" flag is set in a given directory with:

        % larch explicit-default --dont-care


Which Tagging Method Should You Use?

Given the choice of the names, explicit, and implicit tagging methods, which one should you choose?

The names method is best for project trees that you don't control, and for which the maintainer does not include file tags (either explicit or implicit). For such trees, the names method will always work, but if you want to use the explicit or implicit method, you'll have to add file tags yourself.

The implicit method is, in my opinion, by far the most convenient. It is easy to get in the habit of adding a tag: line to the top of each new file and doing a single larch add for each directory. After those steps, you can rename files and directories freely -- without having to remember to tell arch in a separate command.

On the other hand, the implicit method has two limitations. One limitation is that you must accept the possibility of accidentally adding new files to the inventory. Any file you create that passes the naming conventions counts as source. The other, closely related, limitation is that if you use implicit inventories, you will never want to compile a program in its own source directory. When you compile a program, that creates intermediate files and executables. Many of those files will almost certainly pass the naming conventions for source -- so arch will wrongly include them in a source inventory. I use the implicit method, but my configure scripts have a safeguard that causes them to refuse to compile my programs in the source tree.

Finally, the explicit method is the only choice left if you want the benefits of real file tags (therefore you can't use the names method) but either insist on compiling in the source tree or can't risk accidentally adding the occasional unintended file (so you shouldn't use implicit).


Altering the Naming Conventions

Note: this is a relatively new feature, so the documentation is not yet well integrated with the rest of the manual.

The file {arch}/=tagging-method defines the naming conventions used for a particular project tree. By editing that file, you can establish naming conventions that are different from the defaults, which are described above.

That file can contain blank lines, comments (lines beginning with #), and directives, one per line. The permissible directives are:

        implicit
        explicit
        names
                specify the tagging method to use for this tree

        exclude RE
        junk RE
        backup RE
        precious RE
        unrecognized RE
        source RE
                specify a regular expression to use for the indicated
                category of files.

Regular expressions are specified in POSIX ERE syntax (the same syntax used by egrep, grep -E, and awk) and have default values which implement the naming conventions described above.
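
For readers who want to inspect such a file from a script, here is a rough Python sketch of a reader for the directive format described above. It is not part of arch, and the function name `parse_tagging_method` is hypothetical. Two assumptions are worth stating: leading whitespace before a directive or # is tolerated, and the patterns are compiled with Python's re module, which is not a POSIX ERE engine but accepts simple patterns of the kind used for the defaults.

```python
import re

METHODS = {"implicit", "explicit", "names"}
CATEGORIES = {"exclude", "junk", "backup", "precious", "unrecognized", "source"}


def parse_tagging_method(text):
    """Parse directive text into (method, {category: compiled pattern})."""
    method = None
    patterns = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # blank line or comment
        parts = line.split(None, 1)
        keyword = parts[0]
        if keyword in METHODS and len(parts) == 1:
            method = keyword
        elif keyword in CATEGORIES and len(parts) == 2:
            patterns[keyword] = re.compile(parts[1])
        else:
            raise ValueError(f"unrecognized directive: {line!r}")
    return method, patterns


if __name__ == "__main__":
    sample = "# example\nimplicit\njunk ^(,.*)$\n"
    method, patterns = parse_tagging_method(sample)
    print(method, sorted(patterns))
```

Raising an error on an unknown directive is a choice made for this sketch; how arch itself reacts to a malformed line is not covered here.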

The exclude pattern should match a subset of files matched by the source pattern. Files which match exclude are printed by:

        % larch inventory --source --all

but not printed by:

        % larch inventory --source

Although you can define your own naming conventions, there are some minor limitations:

The file names "." and ".." are always ignored by inventory.

File names which contain non-printing characters, spaces, or any of the globbing characters (*, [, ], \, ?) are always placed in the category unrecognized. This is so that tools which operate on project trees can safely presume that no source file has a name that includes these characters.

File names which begin with ,, (two commas) are always placed in the category junk. This is so that tools which operate on a project tree can safely destroy or create files whose names begin with two commas.

The default naming conventions are given by:

       exclude ^(.arch-ids|\{arch\})$
       junk ^(,.*)$
       backup ^.*(~|\.~[0-9]+~|\.bak|\.orig|\.rej|\.original|\.modified|\.reject)$
       precious ^(\+.*|\.gdbinit|=build\.*|=install\.*|CVS|CVS\.adm|RCS|RCSLOG|SCCS|TAGS)$
       unrecognized ^(.*\.(o|a|so|core)|core)$
       source ^([_=a-zA-Z0-9].*|\.arch-ids|\{arch\}|\.arch-project-tree)$

arch: The arch Revision Control System