INTRODUCTION

TBX (TermBase eXchange) is a family of XML-based languages for the
interchange of terminological information. Each language in the family
is called a TML (Terminological Markup Language), or informally a
"dialect" of TBX. All TMLs share a core structure, in which information
is represented on one of three structural levels: concept, language, and
term. Concept entries contain language entries, which in turn contain
entries for individual terms. The core structure also provides a set of
generic elements for attaching descriptive and administrative information
to these entries. These generic elements can be employed differently in
different TMLs.

TBX-Glossary is one such TML, designed to support the interchange of
glossary data among several formats: UTX-Simple, GlossML, the TBX family,
and OLIF. Its expressive capacities are intentionally limited; it is
designed to express only such essential data as can be unambiguously
represented in all of these formats. This design goal is the main point
differentiating TBX-Glossary from other standard TMLs such as TBX-Basic
(intended to serve the most common needs in localization) or TBX-Default
(intended to provide a broad array of terminological data categories
taken from ISO 12620).

The convert_glossary program performs this interchange by converting
glossary files among these various formats.

INSTALLING

The converter is written in Perl, so you will need a Perl interpreter
to run it. If you are not certain whether you have one, enter this command:

perl -v

You will either see a message stating the version of Perl on your system
(we require 5.8 or greater), or an error message (in which case you do
not have Perl). Windows users can download Perl from strawberryperl.com --
the download is a .msi file which will install Perl.

In addition to Perl itself, the converter depends on a program module
called XML::Rules, which you will need to have installed. To find out
whether you have it, enter this:

perl -MXML::Rules -e 1

You will either see a new prompt immediately, or an error
message. As before, the error message means you need to install
something. Fortunately, Perl makes this easy. Enter

cpan

and then, at the next prompt,

install XML::Rules

The program may ask you for permission to download XML::Rules from the
Internet. Enter

yes

and, when the installation is finished, enter

quit

to leave the installer program.

(Users of ActiveState Perl will probably have to install XML::Rules via
ActiveState's Perl Package Manager, rather than via CPAN.)

The convert_glossary program itself can be placed wherever you find it
convenient. This completes your software installation.
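
The checks described in this section can be combined into one small
shell function. This is only a sketch: check_prereqs is a hypothetical
helper name, not part of the converter, and it assumes a POSIX-style
shell (on Windows, run the individual commands shown above instead).

```shell
# check_prereqs [PERL]: report whether Perl >= 5.8 and XML::Rules are usable.
# PERL defaults to "perl"; the function name is illustrative only.
check_prereqs() {
    perl_cmd=${1:-perl}
    if ! command -v "$perl_cmd" >/dev/null 2>&1; then
        echo "Perl not found: install it first"
        return 1
    fi
    # 'require 5.008' makes Perl itself enforce the minimum version.
    if ! "$perl_cmd" -e 'require 5.008' >/dev/null 2>&1; then
        echo "Perl is older than 5.8"
        return 2
    fi
    # Same module test as shown above: silent on success, error if missing.
    if ! "$perl_cmd" -MXML::Rules -e 1 >/dev/null 2>&1; then
        echo "XML::Rules is missing: install it with cpan"
        return 3
    fi
    echo "Perl and XML::Rules are available"
}
```

Calling check_prereqs with no argument tests whichever perl is first on
your PATH.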

RUNNING

The command line for this program is of the form

perl convert_glossary input_file output_file

where input_file is the name of your existing file and output_file is
where you would like the conversion stored. For example,

perl convert_glossary data.utx output.tbx
perl convert_glossary data.olf output.gml

The program inspects the three-letter suffixes of the filenames to
determine which conversions to run: .utx, .gml, .tbx, and .olf suffixes
work as expected, while .txt activates our quick-input variant of UTX (for
input only -- you cannot output it). Suffixes are case-insensitive and,
indeed, need not be separated with a dot. If all goes well, the program
will run and exit without printing any messages, and you can proceed
to inspect the output file. Messages from the program indicate errors,
such as input that cannot be converted.
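
To make the suffix rule concrete, here is a rough shell sketch of the
behavior described above. It is not the converter's own code;
detect_format is a hypothetical name, and the sketch assumes that only
the last three characters of the filename matter, since the dot is
optional.

```shell
# detect_format FILENAME: print the format implied by the last three letters.
# Illustrative only; mirrors the description, not the converter's source.
detect_format() {
    # Take the final three characters and lowercase them (case-insensitive).
    suffix=$(printf '%s' "$1" | tail -c 3 | tr '[:upper:]' '[:lower:]')
    case "$suffix" in
        utx|gml|tbx|olf) echo "$suffix" ;;
        txt) echo "quick-input UTX (input only)" ;;
        *) echo "unknown"; return 1 ;;
    esac
}
```

Under these assumptions, data.utx, DATA.UTX, and a file named simply
mydatautx would all be treated as UTX.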

A word of caution: If you supply the name of an existing file for
output_file, the program will overwrite it without hesitation.

You may wish to experiment with the program without creating a multitude
of output files. If your command shell can handle UTF-8, you can view
the converter output directly on-screen, by providing an output filename
that is only a suffix. For example,

perl -C7 convert_glossary data.gml .tbx

will convert a GlossML file to TBX and display it without saving it. (Note
the -C7 flag, which tells Perl to treat its standard input, output, and
error streams as UTF-8.)

CONVERTIBLE INPUTS

Each of the four formats can represent kinds of data that some of the
other three cannot. Therefore, not every file in one of these formats can
be converted to another format. Moreover, most of the formats require
at least one kind of data that another format does not require. To be
fully convertible among all four formats, a file must contain every kind
of data that any of the formats requires, and it must not contain any
kind of data that any of the formats cannot represent.

The convertible data categories are as follows, where 'src' means 'in
the source language' and 'tgt' means 'in the target language'.

Glossary-wide
	mandatory
		source and target language
		subject field
	optional
		glossary note
Per-entry
	mandatory
		src and tgt term
		src and tgt part of speech
	optional
		src note
		src and/or tgt definition
		src and/or tgt definition source citation
		src and/or tgt contextual example
		src and/or tgt contextual example source citation

Data placed in these categories should be in plain text, without XML-like
markup or tab characters. For details on how these data categories
are represented in each format, see the corresponding file in the
Convertibility directory.
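
If you want to screen individual field values before conversion, a
check along the following lines may help. It is a heuristic sketch only:
is_plain_field is a hypothetical helper, and "XML-like markup" is
approximated here as any "<" followed later by ">", which is an
assumption rather than the converter's own test.

```shell
# is_plain_field VALUE: succeed if VALUE has no tab and no <...> markup.
# Heuristic sketch; the converter's real checks may differ.
is_plain_field() {
    tab=$(printf '\t')
    case "$1" in
        *"$tab"*) return 1 ;;      # tab characters are not allowed
        *'<'*'>'*) return 1 ;;     # looks like an XML-style tag
        *) return 0 ;;
    esac
}
```

For example, is_plain_field "machine translation" succeeds, while a
value such as "<b>machine</b> translation" is rejected.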

If the input file violates these requirements, the converter program
will emit a warning. It may then stop the conversion process, or it
may proceed with a best-effort attempt, so production of an output file
should not be taken as evidence of success: The only such evidence is
freedom from warnings.
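
Because silence is the only sign of success, it can be convenient to
wrap the converter in a function that fails whenever anything is
printed. The sketch below makes two assumptions: run_checked is a
hypothetical helper name, and messages may arrive on either standard
output or standard error, so both are captured. Use it only with a real
output filename, since a suffix-only output name prints the converted
glossary itself.

```shell
# run_checked COMMAND [ARGS...]: succeed only if COMMAND prints nothing
# and exits with status 0. Illustrative wrapper, not part of the converter.
run_checked() {
    status=0
    messages=$("$@" 2>&1) || status=$?
    if [ -n "$messages" ] || [ "$status" -ne 0 ]; then
        [ -n "$messages" ] && printf '%s\n' "$messages" >&2
        echo "conversion reported problems; do not trust the output file" >&2
        return 1
    fi
    echo "no messages: conversion succeeded"
}
```

A typical call would be

run_checked perl convert_glossary data.utx output.tbx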

(By "kinds of data" above we mean both data categories and broader,
structural qualities of the glossary. The four formats embody different
models of what a glossary is, and conversion requires common ground on
these modeling concerns just as it requires agreement on required and
permitted data categories. Thus the seemingly vague phrase.)
