GENERAL TEXT FILTERS/ FORMATTERS
Listed below are both "all-in-one" multi-filter programs and packages containing multiple, single-purpose filters.
LM- Multi-purpose text file formatter, search-replace, and more.
* * * * 1/2
[added 1997 updated 10-11-00] LM is described as a text file "line manipulator," yet this description seriously understates its uses. On first inspection, the command line syntax of this program will likely be unintelligible to most casual users- and the included documentation may be difficult to decipher. LM's syntax does not resemble the "standard" syntax of most DOS programs. I include LM in this list because it combines small size (37K) with the versatility to perform a wide variety of general filtering and formatting chores. Thankfully, some sample commands and batch files are provided which the novice user can easily modify to his or her needs. The option list is nearly endless (it uses all 26 letters as options -upper and lower case- and then resorts to using symbols!).
A short list of LM's capabilities includes:
Info for power users (from the documentation):
"The main operations supported are grip/non-grip, search/replace, synchronised line appendage from other files, input/output line selection by line numbers or passwords, spaces/empty lines absorption, filewise update or renaming, line width imposition and etc. Input lines can also be taken from only the command line. Long command parameters may be taken from files"
Author: Zhuhan Jiang, Australia (1996). Home Page.
10-11-00: v2.13 (06-2000) available. Source included.
download lm213.zip (117K)
AWK, GAWK, MAWK- Powerful text processor.
unrated [added 3-7-99 updated 09-19-00]
reviewed by Howard Schwartz 2-28-99
Unix comes standard with a set of programs that can do just about anything imaginable with a text (i.e., ASCII) file. In rough order of complexity they are:
By default, awk reads files, a line at a time, checks each line to see if it matches a pattern, and processes each matching line according to a script of commands. The pattern can be a word, phrase, regular expression, or complex expression. Commands are similar to the C programming language, and have the typical form:
/regular expression/ {one or more commands}
By default, awk keeps track of the line number of each line, counts the number of words in each line, and numbers the words so they can be referred to, like "positional parameters" in a DOS batch file. Thus, awk easily rearranges columns, or words in a line. For instance, the command:
/^[A-Z]/ {print $2, $1, $3}
will print the first three words of each line whose first letter is a capital letter, with the first and second words swapped (any words after the third are dropped). This might, for instance, reverse the first and last names on lines that are part of an address book. Other features of awk that may be useful to general users:
There are quite a number of freeware versions of awk. Among the best are the GNU versions, usually called "gawk". They have been under revision and development for quite some time, and come with several very useful extensions. Gawk comes in both 16- and 32-bit versions for DOS.
Happily, GNU has put out a free awk manual, available in a number of formats, including html. This manual is both comprehensive and easy to understand. The typical "man page" that comes with awk (even gawk) appears to contain English words, but is all but a secret code to most "non-Unix fluent" human beings.
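For readers who don't have awk at hand, here is a rough Python sketch of what the sample command /^[A-Z]/ {print $2, $1, $3} does. The function name and sample data are made up for illustration, and the sketch assumes awk's default behavior of splitting on whitespace and treating missing fields as empty strings. It is not a substitute for awk, only a way to see the pattern/action idea spelled out.

```python
import re


def swap_first_two(lines):
    """Roughly mimic the awk program: /^[A-Z]/ {print $2, $1, $3}"""
    out = []
    for line in lines:
        # Pattern part: only lines that begin with a capital letter match.
        if re.match(r"[A-Z]", line):
            fields = line.split()  # whitespace splitting, like awk's default
            # awk treats fields that don't exist as empty strings.
            fields += [""] * (3 - len(fields))
            # Action part: print word 2, word 1, word 3, separated by spaces.
            out.append(" ".join([fields[1], fields[0], fields[2]]))
    return out


if __name__ == "__main__":
    sample = ["John Smith 555-1234", "jane doe 555-0000", "Ann Lee Boston MA"]
    for row in swap_first_two(sample):
        print(row)
```

Note that the second sample line is skipped (it starts with a lower case letter) and the fourth word of the last line is dropped, just as the awk command would do.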
09-19-00: 1) v2gnu GAWK v3.06 (08-00, DJGPP) available. Bug fixes. 2) Gnuish GAWK 3.04 (02-00) available.
Pick an AWK variety to download:
sed- Versatile text filter.
A versatile, yet relatively compact text filter from the Unix world- See extended description.
FU- Multi-purpose text filter.
unrated [added 8-16-98]
Small (11K) yet versatile and simpler syntax than LM. Works as a filter, but can also use infile and outfile.
Usage: FU :option parameters ...
Select Lines          :CHOP/:CHOPH str [n]    Page
:COPY                 :DECTRL [HLD]*          :PAGE [len [beg [end]]]
:DEROFF [chs]         :NUM [width [swidth]]   :VMARG [top [bot]]
:FIND/:FIND0 s        :PRINTF [n]             :ODD/:EVEN
:LINES [1st [last]]   :PREFIX/:SUFFIX s       :HEADER [s [lines]]
:NULL0                Change Spaces           :FOOTER [s [lines]]
:UNIQUE [B]           :DETAB [n]              Misc.
:BEGSTR s [n]         :JUST [LRC [rm]]        :COUNT [@LWC]*
:ENDSTR s [n]         :LEFT [n]               :FILE [n1 [n2]]
:SURSTR s1 s2 [n]     :RIGHT [n [ch]]         :{ infiles
:OUTSTR s1 s2 [n]     :STRIP/:STRIPH          :} outfile
Remap Characters      :TRUNC [col]            :}} outfile
:ASCII                :UNJUST [skip [col]]    :TEE [fname [AW]]
:DEBOX [box_s]        Columns                 If
:ENC [key]            :COL [n1 [n2]]          :BREAK
:LOWER/:UPPER         :DELCOL [n1 [n2]]       :IFIN/:IFOUT beg [end [inc]]
:TRANS chs [new_s]    :ADDCOL [n1 [n2 [s]]]   :IFSTR/:IFSTR0 str [inc]
Change Lines          :FILCOL [n1 [n2 [s]]]   :IFCHR/:IFCHR0 chs [inc]
:CHANGE s [new]
Limitations: Line length restriction of 255 characters. Notes: The string option for HEADER and FOOTER seems to require quotation mark delimiters to output the string correctly (other options with strings don't seem to need delimiters). Documentation is sparse.
Author: David Lo. (1990). Suggested by Robert Bull.
download fu.zip (18K)
Filter- Multipurpose text filter can also remove ANSI sequences.
unrated [updated 03-26-00]
Filter is a multi-purpose command line text filter like LM. It has fewer features but a more comprehensible syntax. Author: Bob Ferguson, Netherlands (2000).
usage : FILTER [[<]in] [>out] [/option[...]] [...]] [txtopt [...]]
option: C[n,s,d] Copy n characters from position s to d.
D[n,p] Delete n characters at position p.
E[+/-][n] Expand tabs ([+]) or replace spacegroups by tabs (-),
where n [8] is tab field length.
F[n,m] Fill nonblank lines with dots to width n [70],
skipping first m [0] lines. Implies /T.
H Send this help text to (redirected) output.
? Send this help text to screen (page by page)
I[n,p] Insert n spaces at position p.
J[+/-] Add/remove Carriage Return before Line Feed [+].
L[+/-] Add/remove Line Feed after Carriage Return [+].
M[n,s,d] Move n characters from position s to d.
N[n] Number lines, use n [4] digits,
O[n,s,d] Overwrite n chars from position s to d.
P Reset parity bit. Implied by /W.
R[n] Remove n trailing lines after processing /S and /X.
S[n,m] Skip m lines starting at line n.
T Trim trailing blanks. Implied by /F.
U[+/-] Convert to upper/lower case [+].
V[n,s] Reverse n [all] characters starting at position s.
W Wordstar document ==> ASCII textfile. Implies /P.
X[n,m] Extract m lines starting at line n.
Z[+] Remove NULLs. Z+: also ANSI screen control sequences.
txtopt: /A[+/-][I][p][*] text Include lines after specified text only.
/B[+/-][I][p][*] text Include lines before specified text only.
/G[I][p][*] text Include lines with the specified text only.
+ : Include matching line.
- : Do not include the matching line (this is the default).
I : Ignore upper/lower case.
p : Search for text starting at column p. Default p=1.
* : Text may be found at any column at or after p.
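To give a feel for what a few of these options do, here is a small Python sketch in the spirit of /S (skip lines), /X (extract lines) and /T (trim trailing blanks). The function names are invented for this illustration, and the sketch assumes lines are numbered from 1. It is not Filter's actual implementation and makes no claim to match its behavior in edge cases.

```python
def skip_lines(lines, n, m):
    """Drop m lines starting at 1-based line n (in the spirit of /S[n,m])."""
    return lines[: n - 1] + lines[n - 1 + m:]


def extract_lines(lines, n, m):
    """Keep only m lines starting at 1-based line n (in the spirit of /X[n,m])."""
    return lines[n - 1: n - 1 + m]


def trim_trailing(lines):
    """Remove trailing spaces and tabs from each line (in the spirit of /T)."""
    return [line.rstrip(" \t") for line in lines]


if __name__ == "__main__":
    text = ["a  ", "b", "c\t", "d", "e"]
    print(skip_lines(text, 2, 2))
    print(extract_lines(text, 2, 3))
    print(trim_trailing(text))
```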
download filter40.zip (32K)
Txtutils (GNU)- Collection of Unix text utilities (DJGPP).
unrated [added 9-13-98 updated 01-26-00]
An all-in-one toolkit of text filters ported from the Unix world. Don't overlook any of these utilities- they may get the job done quicker and handle larger files than their 16-bit counterparts. They support Win9x long file names. The separately zipped documentation is helpful. Requires a 386+ and a DPMI provider under plain DOS (cwsdpmi). (2000)
Concise summary of contents:
01-26-00: v2.0 (1-00) available. "This release is the first one that supports DJGPP in the official GNU distribution."
download txt20b.zip (1.04MB)
Older (1995) 16-bit binaries and docs here:
ftp://gatekeeper.dec.com/pub/micro/pc/simtelnet/gnu/gnuish/dos_only/tut111ax.zip (452K)
TS Filters- Special purpose text filters.
* * * * [updated 08-18-00]
The following individual filters perform specialized tasks not easily accomplished with most text editors. Author: Timo Salmi, Finland (2000)
In package TSFLTCx...
In package TSFILT2x...
08-18-00: tsfltc22.zip available; DETAB.EXE fixed for fast PCs
download tsfltc22.zip (98K) and tsflt24.zip (127K)
Paginate- Format ASCII documents (tables/headers/footers/indent/wrap).
* * * [updated 07-05-99]
Paginate is one of the few programs I'll probably never use -but which I can still highly recommend to a specific audience. Paginate is best described as a comprehensive command line ASCII document formatter. As its name suggests, it can paginate a document for printing. But Paginate can also add page headers and footers, indent paragraphs, produce tables, wrap text at defined margins, and more. In order to generate a formatted document, one has to insert instruction codes within the document to be processed. Frankly, a word processor requires much less work and time for most tasks- and I suspect most home users will have little need for Paginate. But others will undoubtedly love it. Well designed for its purpose. From Bruce Guthrie/ Wayne Software (http://www.geocities.com/SiliconValley/Lakes/2414)
11-23-99: v911 released 11-99.
Get pagin911.zip from download page
CHARACTER TRANSLATION
1. FixText- Character translation - DOS/MAC/UNIX conversion - more.
* * * [updated 07-05-99]
FixText is a command line utility that performs two general functions. It can convert among DOS, UNIX and MAC text formats. It can also translate (replace) characters or strings within a text. For example, it can convert uppercase letters to lowercase letters- or it can convert ASCII characters to their ANSI (Windows) equivalents. There is only one hitch to translation: the user must write and maintain separate translation tables. While this permits a great deal of flexibility and customization, it also requires effort to create and edit the translation tables (not difficult, but time-consuming). A few specific operations such as trimming leading/trailing line spaces and expanding tabs are hard coded as command line switches. From Bruce Guthrie/ Wayne Software.
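The DOS/UNIX/MAC conversion comes down to line endings: DOS text files end each line with CR+LF, Unix files with LF alone, and (classic) Mac files with CR alone. The Python sketch below shows the general idea. The function name and the "dos"/"unix"/"mac" labels are made up for this illustration, and this is not how FixText itself is implemented.

```python
def convert_line_endings(data: bytes, target: str) -> bytes:
    """Convert text in any of the three formats to the target format.

    target is one of "dos" (CR+LF), "unix" (LF) or "mac" (CR).
    """
    endings = {"dos": b"\r\n", "unix": b"\n", "mac": b"\r"}
    newline = endings[target]
    # Normalize everything to LF first. CR+LF must be handled before
    # lone CR, otherwise each DOS line ending would become two line breaks.
    normalized = data.replace(b"\r\n", b"\n").replace(b"\r", b"\n")
    return normalized.replace(b"\n", newline)


if __name__ == "__main__":
    mixed = b"dos line\r\nmac line\runix line\n"
    print(convert_line_endings(mixed, "unix"))
```

Working on bytes rather than text keeps the sketch from touching anything other than the line ending characters.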
11-23-99: v911 released 11-99. See Wayne Software
Get fixtx911.zip from download page.
2. XLAT- Create custom character translation programs.
* * * * [added 3-21-98]
XLAT is an old but good alternative to FixText. One feature unique to XLAT is its ability to clone itself into multiple programs, each containing a custom translation table. You won't need to keep track of separate translation tables. In addition, a memory resident version can perform "on the fly" translation when sending strings to a printer. Fast. From the docs...
"It is often useful to have a little utility that translates certain characters within a file to certain others; e.g., if you have received an EBCDIC file, or if you have to deal with ISO-646 or ISO-8859 representations of national characters, which typically differ from the PC's....[I]t would be nice if it were easily customizable.... One solution to this would be to have the programme read a translation table at run time; but then, you have to remember about these tables, their format etc., ...and you have to remember to take them along when you move between different machines.
The Xlat package is a different: there is just one programme for each type of conversion. Additional flexibility is gained by providing two flavours: a filter flavour, which can be used for disk files and for inter-programme data exchange, and a resident flavour, which is specifically designed for serving a printer....For customizing...there is a companion programme, ConfXlat, meaning 'configure xlat'. You feed it one version of an Xlat file, and you'll have a full screen menu that allows you to change the mappings and create a new incarnation of an Xlat programme...Your versions may have names like 'EBC2ASC', or 'GERM-646', or whatever."
Included sample filters perform these translations: replace non-ASCII characters by near-equivalents; replace non-ASCII characters by blanks; convert EBCDIC to ASCII; convert ROT13 Usenet-style encryption; replace German umlauts (IBM style) by ISO-646 equivalents; and vice versa.
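The idea behind all of these filters is a fixed character-to-character translation table. As a rough illustration, the Python sketch below builds a ROT13 table with the standard str.maketrans and applies it with str.translate. The function names are invented here, and the sketch only shows the table-lookup principle, not how Xlat stores or applies its own tables.

```python
import string


def make_rot13_table():
    """Build a translation table that rotates each ASCII letter by 13 places."""
    lower = string.ascii_lowercase
    upper = string.ascii_uppercase
    source = lower + upper
    target = lower[13:] + lower[:13] + upper[13:] + upper[:13]
    return str.maketrans(source, target)


def translate(text, table):
    """Replace every character found in the table; leave the rest alone."""
    return text.translate(table)


if __name__ == "__main__":
    table = make_rot13_table()
    coded = translate("Hello, World!", table)
    print(coded)
    # ROT13 is its own inverse, so a second pass restores the original.
    print(translate(coded, table))
```

Swapping in a different table (EBCDIC to ASCII, say) changes the conversion without changing the code, which is what makes the table approach so easy to customize.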
Author: Gisbert W. Selke (1990). Suggested by Robert Bull.
download xlat11.zip (59K)
3. CWATCH- TSR character translator with 4 built-in translation tables.
unrated [added 4-5-98]
CWATCH is a TSR program that translates characters "on the fly." CWATCH monitors the keyboard, screen and printer. Notable features include four built-in commonly-used translation tables. Supported languages are English (active by default), French and Spanish. You can also create your own external tables using a very simple table format.
CWATCH [options]
-? or -h.........[h]elp
-u...............[u]ninstall
-t<tables>.......select tables to use (from 0 to 3)
-n<d>............[N]o monitoring for this device
-f <file>........Get translation table in this [f]ile
-l <language>....select [l]anguage
Internal translation tables include:
0: All accentuated characters become non-accentuated.
1: All graphic box characters become text characters (+,-,|).
2: Simple border characters become double.
3: Double border characters become simple.
Author: Vincent Penquerc'h, France (1997); Suggested by M. Van Erp.
download charw110.zip (10K)
TEXT SORTING
Also see: 32-bit SORT included with TXTUTILS.
1. RPSORT- Sorts large files extremely fast.
* * * * * [added 3-21-98 updated 6-29-98]
A super-fast sort program which handles large files. "RPSORT supports numerous sort key types including regular text keys, C language strings, Turbo Pascal strings, signed and unsigned binary integers of any length and several types of binary floating point numbers."
From a reader:
"I tested many of the sort programs in the SimtelNet repository on text files. Most are limited somehow (like DOS sort), or choke, or take a long time to sort, or plainly produce a wrong output (missing or extra records, etc.). The final two survivors were msort and rpsort. I tested both on very long text files (tens of megabytes: the collated complete works of Shakespeare, Project Gutenberg). Msort took several tens of minutes, rpsort did the same in *seconds* (I thought it hadn't run at all.) Given that, there was nothing else to say about DOS sort programs, in my opinion."
Author: Bob Pirko (1992); Suggested by Joao Magalhaes.
download rpsrt102.zip (87K)
2. PCSORT- Full screen text sort program supports block, word, and multi-line sorting.
unrated
PCSORT (9K) runs as a full screen, interactive program by default but can also function in the role of command line filter. Although source file size is limited by available conventional memory, PCSORT offers an easy-to-use interface and can sort multi-line records (up to 9 lines) and blocks simultaneously. Results can be viewed before being written to disk. Author: Michael J. Mefford/ PC Magazine (1991). Suggested by Robert Bull.
Options: (edited for formatting)
/Sn n=size of record in lines (1-9)
/Pn n=sort priority (1-9)
/R Sort current priority in reverse order
/N Numeric sort current priority
/C Case sensitive sort
/L[n] Line sort:
n=record sort line (1-9)
/[B][+] nn [xx [y]] Block or column sort:
nn=start column
xx=width
y=sort line (1-9)
/W [+|-] n Word sort:
n=word count
minus = count from end of record
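Multi-line record sorting may be unfamiliar, so here is a rough Python sketch of the idea behind /Sn combined with /L, /R and /C: the file is cut into records of a fixed number of lines, and whole records are reordered according to one chosen line within each record. The function and parameter names are made up for this illustration, and the sketch does not attempt to reproduce PCSORT's priorities, block sorts or word sorts.

```python
def sort_records(lines, record_size, key_line=1, reverse=False,
                 case_sensitive=False):
    """Sort fixed-size multi-line records by one line within each record.

    record_size -- lines per record (compare /Sn)
    key_line    -- 1-based line within the record to sort on (compare /L[n])
    reverse     -- sort in descending order (compare /R)
    """
    # Cut the flat list of lines into records of record_size lines each.
    records = [lines[i:i + record_size]
               for i in range(0, len(lines), record_size)]

    def key(record):
        # A short final record may not contain the key line at all.
        text = record[key_line - 1] if key_line <= len(record) else ""
        return text if case_sensitive else text.lower()

    records.sort(key=key, reverse=reverse)
    # Flatten the reordered records back into a list of lines.
    return [line for record in records for line in record]


if __name__ == "__main__":
    book = ["Smith", "12 Oak St", "adams", "9 Elm Rd", "Jones", "3 Pine Ave"]
    for line in sort_records(book, record_size=2):
        print(line)
```

In the sample, each name stays attached to its address line while the two-line records are sorted by name.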
Screen menu commands:
F1 Displays all sort fields
Alt-F1 Resets all the sort variables to their defaults
F2 Save file
F3 New file
F4 Sort text
F5 Increase lines per record (1-9)
Shift F5 Decrease lines per record
F6 Select next key priority (1-9)
Shift F6 Select previous key priority
F7 Sort order (de/ascending)
F8 Alphanumeric or Numeric sort
F9 Select next Field type: Line, block, word or none
Shift F9 Select previous Field type
F10 Mark the record line for line sort or mark block sort field or select sort word count
Shift F10 Reverse selection of word count
Update 3-2-98: The v. 1.1 update of PCSORT was originally published in 1991 but apparently is not widely distributed on the Net. The pcsort11.zip archive contains the asm source code, the doc file and the com program for PCSORT as updated 4/18/91 to fix a problem with a form feed at the end of the data file. It also contains the PCSORT article published in PC Mag: see the included *.xyw (XyWrite) docs.
download pcsort11.zip (40K)
© 1994-2001. Rich Green