TEXT FILE COMPARE/ DIFFERENCE PROGRAMS
Text file compare programs are frequently used by programmers for version maintenance- but they can also be used by us common folk to compare two versions of an ASCII document (e.g., different versions of those autoexec.* files that accumulate in your root directory, file lists, etc.). The programs below may use different "difference" algorithms- each with its own strengths and limitations. This is not usually a major issue for simple uses, but you may wish to try them all to determine which best suits your needs and style. The programs listed below may not be the best picks for programming needs.
12-15-98: Need to add: Double Lister, dual window text comparer (Steven S. Bates, 1989, dl103.zip). Sugg. by Robert Bull.
1. Visual Compare- Feature-rich file comparer.
* * * * *
A favorite for general use because of the flexible display options. Interactive and command line modes; possesses an internal viewer with scrolling capability; by default it colorizes new/old/changed text which makes for easy comprehension of differences. Split window (horiz. or vertical), dual file display option. Flexible output options. Understands UNIX formatted text.
Limitations:
Command line usage: VCOMP fileone filetwo [options]
Options:
/B....Monochrome display.
/Tn...Tab width. Range is 2-64. Default is 8.
/25...Display 25 lines if you have either an EGA or a VGA.
/43...Display 43 lines if you have an EGA.
/50...Display 50 lines if you have a VGA.
/S[-].Write edit script to standard output.
/C....Write composite file to standard output.
/D....Write difference file to standard output.
/En...Maximum edit distance. Range is 0-32736. Default is 32736.
/I....Ignore leading space and tab characters.
/K....Consider upper-case and lower-case letters equivalent.
/Z....Consider all characters significant.
Author: John R. Whitney (1993)
download vc154.zip (38.4K)
2. @COMPARE- Text file comparer for very large files.
* * * [updated 05-25-99]
(aka "ATCOMPARE", "ACOMPARE"). Comprehensible output to screen is color coded- but you can't scroll back through the output as in VCOMP. Easily digestible report-to-text file output with side-by-side comparisons (unfortunately, broken word fragments can result from the program's wrapping of text when generating side-by-side comparisons).
Limitations:
Usage: @Compare [options] [<filename1> [<filename2>]]
where [options] begins with / or -, and is a combination of the following:
P -directs output to the printer
F -directs output to a file
M -suppresses colors for monochrome monitors
T -suppresses the title header
H -suppresses highlighting in unequal lines
A -replaces graphics characters with standard Ascii codes
R -prints a report of discrepancies by field to a file
C -disables breaks after every screenful of output
L -allows for long and E -extra long searches; not usually necessary
B -suppress direct video writes; use BIOS instead
Q -quits
Author: Brian C. Madsen (1994-98); Suggested by Marianna Van Erp.
05-25-99: New in v 1.8 (5-99): bug fix for fast Pentiums.
download atcomp18.zip (23K)
3. jDif- Fast file difference utility.
unrated
Don't have much experience with this one. Fast, color-coded output to screen, or send results to report file.
Syntax: jdif oldfile newfile [options]
/a...370 Assembler (columns 1 to 72)
/c...COBOL (columns 7 to 72)
/f...DOS FC style output
/h...Help (this is it)
/r...output Report
/v...do not buffer output
Limitations:
"Private persons are hereby licensed to use the software at home for non-commercial purposes at no charge." Jonathan Rosenne. Israel. (1996) Suggested by M. Van Erp.
download jdif01.zip (36.5K)
4. diff (Diffutils)- Unix file difference package.
unrated [added 8-16-98 updated 01-26-00]
This 32-bit package includes 4 programs: diff, cmp, diff3, and sdiff. Diff seems best suited to tracking differences among versions of code/ documents through time (i.e., version maintenance) rather than offering quick, intuitive recognition of differences as with VCOMP. But as noted from a reader: "[diff] has long had a very readable format, that does not require ansi color etc. called the context format:
diff -C 1 file1 file2
this prints different lines (changed, deleted, or added) with one line that is the same in each file at the beginning and end, something like this:
first line is same
2C -- second line of file is different
---------------------------------------
2C -- second line of this file is different
third line is same
Where the first and third lines are the identical ``context'' lines. Very readable format for changed lines, a little less for lines added or deleted." Also, for hard copy, double column, portrait-mode visual comparison of content (i.e., ignoring layout) you may want to re-wrap copies of your two text files to a narrow width (e.g., 35, if that doesn't mangle the readability too much) and then play with the --width=NUM parameter (e.g., =80) in combination with the -y switch. Diff is capable of handling large files. Requires a 386+ PC and, under plain DOS, a DPMI provider (cwsdpmi.exe). LFN compatible under Win9x. (2000)
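For readers without diff at hand, the context format described above can be approximated with Python's standard difflib module. This is only an illustrative sketch, not how diff itself is implemented: the helper name context_diff_text and the sample lines are made up for the example, and the exact header lines difflib prints differ slightly from the reader's example. Changed lines are flagged with "!", with one unchanged line of context on either side.

```python
import difflib

def context_diff_text(old_lines, new_lines, context=1):
    """Return a context-format diff of two lists of lines as one string.

    Hypothetical helper for illustration; 'context' plays the role of
    the number given to diff's -C switch.
    """
    diff = difflib.context_diff(
        old_lines, new_lines,
        fromfile="file1", tofile="file2",
        n=context, lineterm="",
    )
    return "\n".join(diff)

old = [
    "first line is same",
    "second line of file is different",
    "third line is same",
]
new = [
    "first line is same",
    "second line of this file is different",
    "third line is same",
]

# Changed lines are prefixed with "! ", unchanged context lines with two spaces.
print(context_diff_text(old, new))
```

Identical inputs produce an empty string, which mirrors diff printing nothing for identical files.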
From the docs:
`diff' outputs differences between files line by line in any of several formats, selectable by command line options. This set of differences is often called a "diff" or "patch". For files that are identical, `diff' normally produces no output; for binary (non-text) files, `diff' normally reports only that they are different. You can use the set of differences produced by `diff' to distribute updates to text files (such as program source code) to other people....The `cmp' command shows the offsets and line numbers where two files differ. `cmp' can also show all the characters that differ between the two files, side by side. ...Use the `diff3' command to show differences among three files...`diff3' can report the differences between the original and the two changed versions, and can produce a merged file that contains both persons' changes together with warnings about conflicts...Use the `sdiff' command to merge two files interactively.
diff "...provides ways to suppress certain kinds of differences that are not important to you. Most commonly, such differences are changes in the amount of white space between words or lines...differences in alphabetic case or in lines that match a regular expression that you provide. "
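The idea of suppressing unimportant differences can be sketched in a few lines of Python: normalize each line first, then compare the normalized copies. This is an assumption-laden illustration rather than diff's actual method; the function names normalize and significant_changes are invented for the example, and the behavior is only roughly in the spirit of diff's -b (whitespace amount) and -i (case) switches.

```python
import difflib
import re

def normalize(line, ignore_case=True, squeeze_space=True):
    """Return a copy of the line with 'unimportant' differences removed."""
    if squeeze_space:
        # Collapse runs of blanks/tabs to one space and trim both ends.
        line = re.sub(r"\s+", " ", line).strip()
    if ignore_case:
        line = line.lower()
    return line

def significant_changes(old_lines, new_lines, **options):
    """Return (tag, i1, i2, j1, j2) tuples for regions that still differ
    after normalization. Indexes are 0-based positions in the line lists."""
    a = [normalize(text, **options) for text in old_lines]
    b = [normalize(text, **options) for text in new_lines]
    matcher = difflib.SequenceMatcher(None, a, b, autojunk=False)
    return [op for op in matcher.get_opcodes() if op[0] != "equal"]

# The first pair differs only in spacing and case, so only the second
# line is reported as changed.
print(significant_changes(["Hello   World", "abc"], ["hello world", "abd"]))
```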
01-26-00: diffutils v2.7.2 (01-00, stable beta) released. "a couple of new options for `diff' and a new interactive command for `sdiff'. See the file NEWS for the full list."
download dif272b.zip (294K)
Older (1993) 16-bit diff binaries (no sdiff):
ftp://gatekeeper.dec.com/pub/micro/pc/simtelnet/gnu/gnuish/dos_only/diff23x.zip (139K)
5. FINTRSCT - Compare 2 files; outputs shared/ unique lines to 3 report files.
unrated [added 7-5-98]
File Intersection takes a slightly different approach to the task of file comparing. FINTRSCT compares two (smaller) files and outputs three files: one file listing lines unique to file 1; a second file containing lines unique to file 2; and a third file containing paired shared lines. Lines are numbered to allow easy location in the original files. The order of lines in the input files is not relevant, and comparisons are case insensitive. Useful for comparing different versions of win.ini, autoexec.bat, etc.; also useful for comparing updated file lists (e.g., to easily determine "what's new"). 16 and 32-bit versions included. Author: Paul Trout (1996). Suggested by Marianna Van Erp.
Remarks: This tool acts similarly to line uniqifiers- but unlike the latter it doesn't require a manual merge of the two text files and post-merge sorting. The 16-bit version handles smaller files. I tested two 200K files (about 20,000 one-word lines each) and the program locked my machine. I then tested two 100K files (about 10,000 one-word lines- with one shared line) and it worked, but took about two minutes to process on a P-60. 32-bit version untested.
USAGE: fintrsct file1 file2
Creates :
unique1 - lines unique to file1
unique2 - lines unique to file2
common - lines common to file1 and file2
download fintrsct.zip (25K) link adjusted 05-01-99
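The file-intersection idea is simple enough to sketch in Python. The following is not FINTRSCT's code, just a minimal illustration of the behavior described above (order-independent, case-insensitive, numbered lines). The function name intersect_lines is invented, and two details are assumptions of this sketch: surrounding whitespace is ignored, and a duplicated line is reported at its first occurrence only.

```python
def intersect_lines(lines1, lines2):
    """Split two line lists into (unique1, unique2, common).

    unique1/unique2 hold (line_number, text) pairs; common holds
    (line_number_in_file1, line_number_in_file2, text) triples.
    Matching ignores case and line order.
    """
    def index(lines):
        seen = {}
        for number, text in enumerate(lines, start=1):
            cleaned = text.strip()
            # Key on the lowercased text; keep only the first occurrence.
            seen.setdefault(cleaned.lower(), (number, cleaned))
        return seen

    first, second = index(lines1), index(lines2)
    unique1 = [first[key] for key in first if key not in second]
    unique2 = [second[key] for key in second if key not in first]
    common = [(first[key][0], second[key][0], first[key][1])
              for key in first if key in second]
    return unique1, unique2, common

old_autoexec = ["PATH=C:\\DOS", "prompt $p$g"]
new_autoexec = ["PROMPT $P$G", "SET TEMP=C:\\TEMP"]
unique1, unique2, common = intersect_lines(old_autoexec, new_autoexec)
print(unique1, unique2, common)
```

Because the lookup uses dictionaries, each line is checked once rather than against every line of the other file, so lists of tens of thousands of lines should not be a problem for this approach.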
MISC. TEXT UTILS
HTML converters are listed on the HTML utils pages.
For a good beginner's intro to the desktop publishing package TeX, see Scott Nesbitt's article TeX: The DTP Alternative.
catdoc- Convert MS Word files to plain text (or other user-customized output formats).
unrated [added 08-06-99 updated 08-13-99]
From the docs: "catdoc reads MS-Word file and produces human-readable text on standard output. Optionally it can use latex escape sequences for characters which have special meaning for LaTeX. It also makes some effort to recognize MS-Word tables, although it never tries to write correct headers for LaTeX tabular environment. Additional output formats, such [a]s HTML can be easily defined....catdoc comes with two [output] format [options] - ascii and tex but nothing prevents you from writing your own format (set two map files - special character map and replacement map). Catdoc doesn't attempt to extract formatting information other than tables from MS-Word document, so different output modes means mainly that different charac[t]ers should be escaped and different ways used to represent characters, missing from output charset...Catdoc uses internal unicode representation of text, so it is able to convert texts when charset in source document doesn't match charset on target system."
Known limitations: "Can produce garbage, if file contains embedded illustrations. Doesn't handle "fast-saved" Word docs properly. Prints footnotes as separate paragraphs at the end of file, instead of producing correct latex commands. Cannot distinguish between empty table cell and end of table row."
Notes: unzip with "create directories" option. GPL, source included. Source-only distribution is also available (Unix and DOS); An Excel spreadsheet to CSV text converter and an rtf to HTML converter are also in development and should be included in future releases.
Author: V.B.Wagner, Russia (1999). Home Page.
download catdoc-0.90.3.zip (183K)
HelpDC- Convert Win 3.x/95 HLP files to RTF format.
* * * *
This DOS program is a Windows 3.x/95 *.HLP file decompiler. Useful to the non-programmer because it has an option (/r) to convert HLP files to RTF format (which can be further converted to plain text [by word processors] or to HTML [e.g., see Martha]). A large program (237K), but fast. Documentation bilingual (German/English); may be difficult to follow. Author: M. Winterhoff. Germany. (1996)
download helpdc21.zip (222K)
PSX- Convert postscript documents to plain text.
*
PSX is a small (16K) and simple command line postscript document-to-text converter that I found somewhere on a BBS. It does a very inconsistent job of translation (sometimes good, sometimes very poor)- but if you just want to browse the contents of a postscript text file you downloaded off the Net, this program may suffice as a disk-saving alternative to Ghostscript. The main eye-sores resulting from conversion are loss of paragraph formatting and some split words. PSX is donationware. I suspect you won't find the latest version (psx102e) anywhere on the Net except here.
syntax: PSX [PostScriptfile] [textfile] [/option]
Both the input (PostScript) and output (text) file names may be
optionally entered at the command line. If no text file name is
specified PSX creates an ASCII file using the same name as the
PostScript file, but with ".TXT" as the DOS filename extension.
If no PostScript filename is specified, PSX will ask for one.
options: /HELP (or /?) displays this text
/WIDTH=n (n is a number between 40 and 132 controlling output)
download psx102e.zip (15K)
Text2PDF- Convert text files to PDF.
unrated [added 12-8-98]
Text2PDF is a small (20K), versatile utility that converts plain ASCII files to Adobe PDF. Text2pdf makes a 7-bit clean PDF file (version 1.1) from any input file. It reads from standard input or a named file, and writes the PDF file to standard output. Limitations: You cannot produce hypertext links- either to bookmarks, within the file, or to external content. You cannot add styles to headings or body elements, nor does the program reformat bullets and numbered lists. Text is formatted as is. You will probably have to tweak your text files to ensure that the word wrapping is correct.
text2pdf [options] [filename]
-h show this message
-f<font> use PostScript <font> (must be in standard 14, default:
Courier)
-I use ISOLatin1Encoding
-s<size> use font at given pointsize (default 10)
-v<dist> use given line spacing (default 12 points)
-l<lines> lines per page (default 60, determined automatically
if unspecified)
-c<chars> maximum characters per line (default 80)
-t<spaces> spaces per tab character (default 8)
-F ignore formfeed characters (^L)
-A4 use A4 paper (default Letter)
-A3 use A3 paper (default Letter)
-x<width> independent paper width in points
-y<height> independent paper height in points
-2 format in 2 columns
-L landscape mode
Author: Phil Smith (v1.1, 1996) Home page. Suggestion/ notes by Scott Nesbitt.
download text2pdf.zip (12K)
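Since Text2PDF formats text as is, one way to "tweak" word wrapping beforehand is to re-wrap each paragraph to the intended line width before conversion. The sketch below uses Python's standard textwrap module; the function name rewrap is made up for the example, the default width of 80 simply mirrors the -c default listed above, and treating blank lines as paragraph breaks is an assumption about the input.

```python
import textwrap

def rewrap(text, width=80):
    """Re-wrap each blank-line-separated paragraph to at most `width` columns."""
    paragraphs = text.split("\n\n")
    wrapped = [
        # Join the paragraph's words with single spaces, then wrap it.
        textwrap.fill(" ".join(paragraph.split()), width=width)
        for paragraph in paragraphs
    ]
    return "\n\n".join(wrapped)

sample = (
    "This is a long paragraph that was typed on a single line and would "
    "otherwise run past the right margin of the page.\n\n"
    "A second, shorter paragraph."
)
print(rewrap(sample, width=40))
```

The same helper with a narrow width (e.g., 35) could also prepare files for the side-by-side diff comparison suggested in the diff entry.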
Xpdf- Toolkit for extracting text/information/images from Adobe PDF files.
unrated [added 02-09-00 updated 01-20-01]
A suite of command line tools for extracting data from Adobe PDF files. Most notably, there's a pdf to text converter included. From the docs:
Remarks: (DJGPP) Requires 386+ PC with DPMI provider, and a math coprocessor. File names and zip archive directory names do not all conform to DOS 8+3 conventions. Special requirements for pdftotext: gzip in path. These programs are quite large (~1MB each; compressible to ~190K with UPX) and may not be well-suited to low resource hardware. Here's a quick test I ran on PDFTOTEXT to measure performance for different PDF file sizes on different hardware (no embedded images):
Time needed to complete processing (v0.90):
HARDWARE                386sx/20 w/387         PII/400
RAM                     8MB                    96MB
OS                      MSDOS 6.22/SMARTDRV    Win98 DOS box
PDF file size = 700K    10m                    10.2s
PDF file size = 47K     35s                    0.6s
(m= minutes, s=seconds)
Author: Derek B. Noonburg, xpdf Home Page; DOS port by Michael Richmond. Added on tip by Bob Williams (Surv-PC forum). Also available for Win32, Linux, OS/2, and more. Source available, GPL.
01-20-01: v0.92 (12-00) available.
download xpdf-0.92-dos6-djgpp.zip (1.2MB)
Acrobat Reader for DOS.
* * *
Why use an old DOS version of Acrobat? Good question. You probably shouldn't. It seems to do fine with simple pdf files such as tax forms but it does not support many of the latest PDF enhancements introduced over the past couple years. Hint: The "bitmap" printing option is useful if you lack the fonts required by a given pdf document.
Special requirements: Needs a 386+ PC, 4MB RAM (?), VGA, and may require some disk "acrobatics" if you're short on disk space (the 2.5MB zip below contains a 2.5MB self-extracting EXE which must be run to unpack the install files [2.5MB total], and then you must run the install [installed files require 3.6MB]). Author: Adobe Systems (1993)
download Acrodos.zip (2.6 MB)
© 1994-2001. Rich Green