DGREP - a fast egrep clone

Dgrep is a clone of the Unix egrep. The program runs at least on the
PC, OS/2 and BSD Unix. It is also fairly easy to port to other
environments that have an ANSI C compiler. Dgrep should be largely
compatible with egrep, apart from a few options.

Dgrep is quite fast. Its speed comes from choosing the best available
algorithm for the string being searched for. If the search string
contains no regular expression operators, the Boyer-Moore algorithm is
used. Otherwise dgrep builds a deterministic finite automaton for the
search. To speed up searches, dgrep tries to find in the regular
expression a string that must occur in every match. That string is
searched for first with the Boyer-Moore algorithm. The automaton uses
so-called lazy evaluation: state transitions are computed only when
they are needed. Both algorithms are linear.
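
The literal-string case can be illustrated with a short sketch. This is
not dgrep's own code: it uses the Horspool simplification of
Boyer-Moore (bad-character shifts only), and the function name
horspool_search is made up for this example. The simplified variant is
fast on typical input but, unlike the full algorithm, does not have a
linear worst case.

```c
#include <limits.h>
#include <stddef.h>

/* Hypothetical helper, not taken from the dgrep sources.
   Returns a pointer to the first occurrence of pat (m bytes) in
   text (n bytes), or NULL if pat does not occur. */
const char *horspool_search(const char *text, size_t n,
                            const char *pat, size_t m)
{
    size_t shift[UCHAR_MAX + 1];
    size_t i, pos;

    if (m == 0)
        return text;            /* the empty pattern matches at once */
    if (m > n)
        return NULL;

    /* A character that is not in the pattern allows a full-length skip. */
    for (i = 0; i <= UCHAR_MAX; i++)
        shift[i] = m;
    /* Characters in the pattern (last position excluded) allow a
       shorter skip that lines up their rightmost occurrence. */
    for (i = 0; i + 1 < m; i++)
        shift[(unsigned char)pat[i]] = m - 1 - i;

    pos = 0;
    while (pos <= n - m) {
        /* Compare the window with the pattern from right to left. */
        size_t j = m;
        while (j > 0 && text[pos + j - 1] == pat[j - 1])
            j--;
        if (j == 0)
            return text + pos;
        /* Skip according to the text byte under the last pattern position. */
        pos += shift[(unsigned char)text[pos + m - 1]];
    }
    return NULL;
}
```

Because the shift table lets the search skip over text without looking
at every byte, longer search strings generally make the search faster.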

The following compares the speed of dgrep and the system egrep on
BSD Unix. The table gives the user and sys times for a few tests with
/usr/dict/words as the input file:

		dgrep	system egrep
'u.*nix'	0.3u	2.4u
		0.4s	0.3s
'first'		0.2u	2.5u
		0.2s	0.2s
'first|second'	1.8u	2.2u
		0.3s	0.2s

Speed comparison of GNU egrep and dgrep on the PC. The test data was
the dgrep sources read twice, 32 files and 7544 lines. GNU egrep was
always run with the option -E (== use Egrep syntax).

			GNU egrep	dgrep
    Int			11.8		 8.1
-ic Int			13.4		 8.4
    first|second	28.6		16.9
-c  first|second	19.8		10.8	-- less output
    Unsigned		10.6		 7.9

Running plain dgrep prints a shorter help text, and running dgrep -h
prints the following help text:

Usage: dgrep [options] {-f expression file | [-e] expression} [file...]
Options: -An  n lines after the matching line are printed
         -Bn  n lines before the matching line are printed
         -b   filename is displayed only once before matches
         -d   only dfa is used for searching
         -c   only a count of matching lines is printed
         -i   case insensitive match
         -l   only names of files with matching lines are printed
         -n   each line is preceded by its relative line number in file
         -s   silent mode, nothing is printed except error messages
         -t   all files that contain matches are touched
         -v   all lines but those matching are printed
         -x   exact, all characters in expression are taken literally
         -1-9 1-9 lines before and after the matching line are printed
         -e expression, useful when expression begins with -
         -f file that contains expression
Regular expressions:                    .       any single character
*       zero or more repeats            (...)   grouping
+       one or more repeats             ^       beginning of line
?       zero or one repeat              $       end of line
[...]   any character in set            \c      quote special character c
[^...]  any character not in set        |       alternative ("or")

--
Changes from version 1.62 to version 1.71:
In this version it is possible to build a faster automaton by using
more memory.
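
The source does not say how the extra memory is used. One common way to
trade memory for speed, combined with the lazy evaluation described
earlier, is to keep a full table with one entry per state and input
byte and to fill in each entry the first time it is needed. The sketch
below assumes that approach and is not dgrep's code: all names are made
up, and to stay short the automaton recognises only a literal string,
whereas dgrep builds automata for full regular expressions.

```c
#include <stddef.h>
#include <string.h>

#define ALPHABET   256
#define MAX_STATES 64           /* pattern length limit for this sketch */
#define UNKNOWN    (-1)

/* Hypothetical structure. State s means "the last s bytes read are
   the first s bytes of the pattern"; state m is the accepting state. */
struct lazy_dfa {
    const char *pat;
    int m;
    int next[MAX_STATES][ALPHABET];  /* UNKNOWN until first use */
    long computed;                   /* transitions built so far */
};

int lazy_dfa_init(struct lazy_dfa *d, const char *pat)
{
    size_t len = strlen(pat);
    int s, c;

    if (len == 0 || len >= MAX_STATES)
        return -1;
    d->pat = pat;
    d->m = (int)len;
    d->computed = 0;
    for (s = 0; s <= d->m; s++)
        for (c = 0; c < ALPHABET; c++)
            d->next[s][c] = UNKNOWN;
    return 0;
}

/* Longest k such that the first k pattern bytes are a suffix of
   "first s pattern bytes followed by c". Called only for s < m. */
static int compute_transition(const struct lazy_dfa *d, int s, int c)
{
    int k;

    for (k = s + 1; k > 0; k--) {
        if ((unsigned char)d->pat[k - 1] != c)
            continue;
        if (memcmp(d->pat, d->pat + (s + 1 - k), (size_t)(k - 1)) == 0)
            return k;
    }
    return 0;
}

/* Returns the offset of the first match in text, or -1 if none. */
long lazy_dfa_search(struct lazy_dfa *d, const char *text, size_t n)
{
    int s = 0;
    size_t i;

    for (i = 0; i < n; i++) {
        int c = (unsigned char)text[i];
        if (d->next[s][c] == UNKNOWN) {      /* build on first use only */
            d->next[s][c] = compute_transition(d, s, c);
            d->computed++;
        }
        s = d->next[s][c];
        if (s == d->m)
            return (long)(i + 1) - d->m;
    }
    return -1;
}
```

Once a transition has been cached, each input byte costs a single table
lookup. The price is ALPHABET table entries per state, most of which are
never filled in for typical input.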

--

Dgrep was written by:

Jarmo Ruuth
f30932a@puukko.hut.fi
jruuth@otax.tky.hut.fi