Advanced, broad-function text processing programs such as LM, SED, or AWK can perform most of the specialized tasks described in this section, but the tools listed below may be better suited to the casual user or may include special options not available elsewhere. For HTML filters and converters, see the HTML UTILS page.
DOS <-> UNIX TEXT CONVERTERS
unrated
1. RUM- Convert a file between UNIX and DOS text formats.
2. FLIP- Convert file(s) between UNIX and DOS text formats.
3. UX2DOS- Convert UNIX format text files to DOS (handles mixed formats).
If you're looking for a MAC text converter, see REMOVE.
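For readers curious about what these converters do under the hood, here is a minimal Python sketch. It is not the code of RUM, FLIP, or UX2DOS; the function names are made up for illustration, and treating a lone CR as an old-style MAC line ending is an assumption. DOS text ends lines with CR+LF, UNIX text with LF alone.

```python
def to_unix(data: bytes) -> bytes:
    """Convert DOS (CR+LF) and lone-CR line endings to UNIX (LF)."""
    # Replace CR+LF first so the CR in a pair is not converted twice.
    return data.replace(b"\r\n", b"\n").replace(b"\r", b"\n")


def to_dos(data: bytes) -> bytes:
    """Convert any mix of line endings to DOS (CR+LF)."""
    # Normalizing to LF first keeps mixed-format input from
    # ending up with doubled carriage returns (CR CR LF).
    return to_unix(data).replace(b"\n", b"\r\n")


mixed = b"line one\r\nline two\nline three\r\n"
print(to_unix(mixed))
print(to_dos(mixed))
```

Working on bytes rather than text avoids any automatic newline translation by the language runtime.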
DUPLICATE LINE FILTERS
(also see: UNIQ in TS Filters or Txtutils packages)
1. REMDUP- Remove duplicate lines from a sorted file.
2. UNIQ- Remove or display duplicate lines from sorted file.
1. REMDUP: unrated. Comments from a user: "How do I use this util? Occasionally I download my bookmark file, combine it with a previous one, sort the combined file, and run remdup. The result is a compact archive of sites. Then I ruthlessly prune my bookmark file. It gathers more bookmarks. Note that the file must be sorted. That is, duplicate lines must be next to each other to be found. This can work to one's advantage. One can sort only a section of a file. Duplicate lines would be removed from that section only. Case sensitive and case insensitive versions are included. There may be a maximum line length restriction, but it handles 451 characters just fine." Author: Joao Magalhaes. Portugal. Suggestion and description by M. Van Erp.
Usage:
RMDUPS < sortedfile [ > outputfile]
SORT < infile | RMDUPS [ > outputfile]
download rmdup0.zip (11K) link fixed 11-23-98
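The reason the input must be sorted is easy to see in a small Python sketch. This is an illustration of the general adjacent-duplicate technique, not the program's actual source; the function name and the `ignore_case` parameter are hypothetical.

```python
from itertools import groupby


def remove_adjacent_duplicates(lines, ignore_case=False):
    """Keep the first line of every run of identical neighbouring lines."""
    # groupby only groups lines that sit next to each other, which is
    # why unsorted input can still contain duplicates afterwards.
    key = str.lower if ignore_case else None
    return [next(group) for _, group in groupby(lines, key=key)]


bookmarks = ["b.example", "a.example", "b.example"]
print(remove_adjacent_duplicates(bookmarks))          # nothing is adjacent yet
print(remove_adjacent_duplicates(sorted(bookmarks)))  # sort first, then filter
```

Because only neighbours are compared, one pass over the file is enough and no more than one line needs to be remembered at a time.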
2. UNIQ: unrated. [added 4-18-98] Similar to REMDUP0 but offers more options. Besides removing adjacent duplicate lines from output, UNIQ offers a "reverse" option: display a single representative of just the duplicate lines. The -c option outputs a count of how many copies exist of each line. In addition, one can designate which fields on lines to compare (a field being text separated by tabs or spaces). UNIQ is case-sensitive only. It can be used either as a filter or with input and output filenames. Author: Jason Mathews (1995).
Usage: uniq [ -cdu ] [ +|-n ] [ inputfile [ outputfile ] ]
-c Precede each line with a count of the number of times it occurred
-d Write one copy of duplicate lines
-u Copy only lines not repeated in the original file
+n Skips over the first n characters
-n Skips over the first n fields
download uniq12.zip (10K)
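The -c, -d, and -u behaviours described above can be approximated in a few lines of Python. This is only a sketch: the function name and keyword arguments are hypothetical, the exact layout of the count column in the real program's output is an assumption, and the +n/-n skipping options are left out.

```python
from itertools import groupby


def uniq(lines, count=False, only_duplicates=False, only_unique=False):
    """Collapse runs of identical adjacent lines, loosely like uniq."""
    out = []
    for line, group in groupby(lines):
        n = sum(1 for _ in group)      # length of this run of equal lines
        if only_duplicates and n < 2:  # like -d: keep repeated lines only
            continue
        if only_unique and n > 1:      # like -u: keep non-repeated lines only
            continue
        out.append(f"{n} {line}" if count else line)
    return out


data = ["apple", "apple", "pear", "plum", "plum", "plum"]
print(uniq(data, count=True))
print(uniq(data, only_duplicates=True))
print(uniq(data, only_unique=True))
```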
TEXT JUSTIFY
1. Just- ASCII text justifying filter.
* * *
This 24K utility can justify paragraphs (introduces spaces to remove ragged margins). Left, right, and center justification supported. It can also automatically draw boxes around justified paragraphs.
just [options] [infile] [ >outfile]
-w: Specify the desired output page width, in characters.
-m: Specify the line length below which justification should not be attempted.
-l: Specify left justification mode.
-c: Specify centre justification mode.
-r: Specify right justification mode.
-p: Specify padding justification mode.
-xC: Use character C to make a box-surround for justified paragraphs.
2. Justify- Flexible text justifying filter.
unrated [added 11-10-98]
From the docs: "Justify will reformat already formatted text. It will ignore titles and other header information and reformat paragraphs to any desired style...The input text must be stripped of all tab characters. JUSTIFY must be able to determine what constitutes a paragraph. It is important that the input text be consistently formatted."
Also useful for formatting e-mail (see 'e' and 'q' options). Author: Tom Almy (1997). Suggested by Robert Bull.
justify columns [bflditohsrweq] [indent] [body] <source >dest
b - input file paragraph is hanging indented
f - input file paragraph is fully indented
l - input file paragraphs are single lines
d - delete blank line after paragraph read
i - insert blank line after paragraph read
t - indent first paragraph line by indent spaces
o - indent other paragraph lines by body spaces
h - remove hyphens across line boundaries
s - double space after . ? ! ." ?" or !"
m - process m-dash adjacent to words
w - output for word processors
r - ragged right margin (otherwise full justification)
e - EMAIL input -- don't format quotes or headers
q - EMAIL output -- add '>' to non-blank lines
The mandatory "columns" argument is the number of columns of text to output. The [indent] and [body] arguments are associated with the 't' and 'o' options.
download justfy15.zip (18K)
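To make the justification modes concrete, here is a rough Python sketch of how a single line might be justified. It is not taken from either program: the function names and mode strings are hypothetical, and reading "padding" (or "full") justification as spreading extra spaces between words is an assumption based on the descriptions above.

```python
def pad_justify(line: str, width: int) -> str:
    """Spread extra spaces between words so the line is exactly width wide."""
    words = line.split()
    if len(words) < 2:
        return line.strip()            # nothing to pad between
    gaps = len(words) - 1
    spaces = width - sum(len(w) for w in words)
    if spaces < gaps:
        return " ".join(words)         # line is already too long; leave it
    base, extra = divmod(spaces, gaps)
    parts = []
    for i, word in enumerate(words[:-1]):
        # The leftmost gaps absorb the leftover spaces, one each.
        parts.append(word + " " * (base + (1 if i < extra else 0)))
    parts.append(words[-1])
    return "".join(parts)


def justify_line(line: str, width: int, mode: str = "pad") -> str:
    """Justify one line; mode is 'left', 'right', 'center', or 'pad'."""
    text = " ".join(line.split())
    if mode == "left":
        return text
    if mode == "right":
        return text.rjust(width)
    if mode == "center":
        return text.center(width).rstrip()
    return pad_justify(text, width)


for mode in ("left", "right", "center", "pad"):
    print("|" + justify_line("the quick brown fox", 30, mode).ljust(30) + "|")
```

A real filter would also wrap paragraphs to the target width first and would usually leave the last line of each paragraph unpadded.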
OTHER
REMOVE- Remove carriage returns in ASCII documents.
* * *
Many documents on the Web are ASCII text and lines are often broken by carriage returns at 60-80 columns. If you import this text into many word processors the carriage returns are retained and interpreted as paragraph marks. This usually fouls your attempts to apply special paragraph styles in your word processor because each line is now considered a paragraph. To get rid of these paragraph marks you need to run the text file through a filter prior to importing into your word processor. Some word processors may include such a filter, but mine (WinWord 2.0) doesn't.
The REMOVE package contains two executables, one for DOS and one for Windows (3.1). It strips single carriage returns/line feeds while preserving actual paragraph boundaries. It can also optionally preserve tab formatting. Although you can use nearly any search/replace tool to remove CR's and LF's, REMOVE is a friendlier alternative. [Bug: Seems to merge the first two words of paragraphs which start with the word "I" ("I am" becomes "Iam")]. This package also contains Convert, which can detect and convert among DOS, MAC, and UNIX text formats.
download remove30.zip (159K) link adjusted 4-17-98
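The basic idea of stripping single line breaks while keeping paragraph boundaries can be sketched in Python as follows. This is an illustration only, not REMOVE's algorithm: the function name is hypothetical, a paragraph is assumed to be separated by at least one blank line, and tab formatting is not preserved.

```python
import re


def unwrap(text: str) -> str:
    """Join hard-wrapped lines into one line per paragraph."""
    # Normalize DOS and lone-CR line endings to LF.
    text = text.replace("\r\n", "\n").replace("\r", "\n")
    # One or more blank lines mark a paragraph boundary.
    paragraphs = re.split(r"\n\s*\n", text.strip())
    # Joining with a single space keeps neighbouring words apart.
    return "\n\n".join(" ".join(p.split()) for p in paragraphs)


sample = "I\nam a wrapped\nparagraph.\n\nSecond\nparagraph here."
print(unwrap(sample))
```

Replacing each removed line break with a space, rather than deleting it outright, is what keeps the last word of one line from running into the first word of the next.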
Xray- Extract plain text present in binary files.
* * *
Xray extracts plain text from binary files. One use: it can show text contained in executables, DLLs, etc. It can also be used as a crude means of getting plain text from any word processor file, although formatting is lost in the process.
Also see the similar program ReadText: ftp://gatekeeper.dec.com/pub/micro/pc/simtelnet/msdos/fileutil/rt101.zip
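A common way to pull text out of a binary file is to scan for runs of printable characters. The Python sketch below shows that general approach; it is not how Xray or ReadText is implemented, and the function name and the minimum run length of 4 are assumptions.

```python
import re


def extract_text(data: bytes, min_len: int = 4):
    """Return runs of printable ASCII at least min_len characters long."""
    # 0x20-0x7e covers space through tilde, the printable ASCII range.
    pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_len)
    return [run.decode("ascii") for run in pattern.findall(data)]


blob = b"\x00\x01Hello\x00ab\x00World!\xff"
print(extract_text(blob))
```

To try it on a real file, read the file in binary mode (for example with `open(path, "rb").read()`) and pass the result to `extract_text`. Raising `min_len` cuts down on random byte sequences that merely happen to look like text.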
© 1994-2001. Rich Green