We can distinguish two kinds of efficiency:
In the following section, we will discuss, in the context of the Fudgets system, how we obtain reasonable execution efficiency. We have not tried to quantify programmer efficiency in a way that permits further comparison or judgement.
Two tiny programs:
The Fudgets counter example from Section 9.4: 13s The Motif counter example from Figure 112: 14s Some small programs:
The Fudget calculator from Section 9.8 with 15 buttons: 13s The Fudgets calculator Cla with 28 buttons: 16s The calculator xcalc with 40 buttons, from X Windows distribution: 18s Some larger programs:
The Fudgets WWW browser from Chapter 32: 22s Mosaic 2.6: 75s Netscape 3.0: 137s Netscape 4.04j2: 313s Figure 101. Comparing the startup times of some programs when running via a 16.8kbps modem connection.
When it comes to event processing, we naturally wanted to to minimise the number of events that has to be handled by the functional program. Fortunately, the X Windows system can do a lot of event processing for the application. By setting event masks and button grabs appropriately, you can often eliminate all insignificant events, i.e., all events that are sent to the application program carry some meaningful information. In simpler window systems, the application has to deal with every little mouse move and button/key presses/releases by itself.
XGrabButton()
in
the Xlib manual) with an event mask that selects button
presses/releases and enter/leave window events (each GUI
element is a window), can be used to select exactly these events,
with only one small exception: if the mouse pointer enters the
button area, the mouse button is pressed, the pointer leaves
the button area and the mouse button is released, the program
will receive an insignificant button release event. The
important thing is that no unnecessary motion events will be
received.We also tried a third representation,
which was slightly less efficient than the continuation-based representation.data SP i o = StepSP [o] (i->SP i o)
In the discussion below, we assume the continuation-based representation (although some of the ideas can be carried over to other representations).
loopThroughRight
(Section 18.2) is a general way to
adapt an existing stream processor for use in a new
context. Another simple and common way to adapt a stream
processor is by mapping a function on the elements of the
input or output stream:
For example, tagged parallel composition of fudgets, >+<, can be defined likesp -==- mapSP g mapSP f -==- sp
wherefud1 >+< fud2 = mapSP post -==- (fud1 -+- fud2) -==- mapSP pre
pre
and post
are the appropriate re-tagging
functions. However, if implemented directly, such a
definition has a rather high overhead.
By transforming >+<
to a form not involving -==-
,
-+-
and mapSP
, but instead recursion and
pattern matching on the stream-processor constructors, a
more efficient solution can be obtained.
Programs transformations of this kind are tedious to do by
hand, but it could still be worthwhile if the resulting code
is to be included in a library. The above described
transformation has been done by hand in the Fudget library. We
measured the effect on a communication intensive fudget
program containing a parallel composition of 50 fudgets (the
Space Invaders program described in
Chapter 35). The transformation reduced the CPU
time consumption by over 35%. Encouraged by this result, we
also transformed some more combinators (for example
>^=<
and >=^<
discussed in
Section 13.1) in the same way.
It would of course be nicer to have these transformations
done automatically, especially when they are needed in
application programs. The kind of automatic transformation
that would be useful here is deforestation
[Wad90], which eliminates
intermediate data structures (applications of PutSP
and GetSP
in this case) by using certain unfold/fold
transformations.
The manual work required in this solution is located in the library. The application programmer need not be aware of it. The automatic work required is inlining (unfold) by the compiler. It actually works even without inlining, but the efficiency gain is not as big.
The expressions we wish to optimise are of the kind
illustrated above: a stream-processor combinator applied to
mapSP f
, for some f
.
The trick is to make mapSP
a constructor
in the stream-processor data type:
Since the type is abstract, adding constructors to it like this will not be visible to application programmers.data SP a b = PutSP b (SP a b) | GetSP (a -> SP a b) | MapSP (a -> b) | NullSP
Now that MapSP
is a constructor, the implementation of
serial composition (as shown in Figure 45) can be
extended to handle the case when one or both arguments are
applications of MapSP
in a more efficient way. This
means that an expression like
can evaluate toMapSP f -==- MapSP g
With inlining, this step can be taken by the compiler and the compositionMapSP (f . g)
f . g
can then be optimised
further. Without inlining, we have at least eliminated a use
of -==-
, and thereby reduced the number of
generated applications of the PutSP
and GetSP
constructors.
We have not tested the above ideas in the Fudget library.
The first test measures the efficiency of serial composition
and compares the operators >==<
, >^^=<
and -==-
. We measured the time it took to send
around 5000 messages through a serial composition of a
varying number of identity fudgets (or identity stream
processors). The program used is shown in
Figure 102. It was compiled with HBC and run on a
Pentium Pro 200Mhz under NetBSD 1.2 [Neta]. The
results are shown in Figure 103. We can
see that the time grows roughly linearly with the length of
the composition and that serial composition of fudgets is
much more expensive than serial composition of stream
processors.
The last table in Figure 103 shows the
performance of the function composition map id .
... . map id
. It is more efficient than the other
serial compositions.
import Fudgets main = fudlogue mainF mainF = nullF >==< tstF >==< concatSP >^^=< stdinF tstF = case argReadKey "comb" 1 of 1 -> nest (idF>==<) idF depth 2 -> nest (idSP>^^=<) idF depth 3 -> absF (nest (idSP -==-) idSP depth) nest f z 0 = z nest f z n = f (nest f z (n-1)) depth = argReadKey "depth" 0Figure 102. A program to measure the efficiency of serial composition.
time ./Internal <testinput1 -S -h8M - -depth n
-comb 1
(idF >==< idF >==< ... >==< idF
):
n User time GCs GC time Max heap Max stack 0 0.080u 0 0.00 50 4.173u 40 0.10 62432 836 100 8.475u 80 0.25 95840 1410 200 18.418u 163 0.95 159884 3042 400 44.521u 334 3.35 288020 3768
-comb 2
(idSP >^^=< idSP >^^=< ... >^^=< idF
):
n User time GCs GC time Max heap Max stack 0 0.100u 0 0.00 50 0.755u 8 0.01 35712 210 100 1.452u 15 0.03 42270 272 200 2.869u 30 0.08 53504 642 400 5.706u 59 0.18 76976 1218
-comb 3
(idSP -==- idSP -==- ... -==- idSP
):
n User time GCs GC time Max heap Max stack 0 0.076u 0 0.00 50 0.411u 4 0.01 32524 171 100 0.812u 7 0.00 35840 348 200 1.574u 15 0.04 41956 651 400 3.047u 30 0.07 54924 1260 map id . map id . ... . map id
n User time GCs GC time Max heap Max stack 0 0.000u 0 0.00 50 0.094u 2 0.00 3956 98 100 0.203u 4 0.00 5616 275 200 0.434u 8 0.00 9712 566 400 0.906u 16 0.01 17616 473 Figure 103. The efficiency of serial composition.
The second test measures the efficiency of parallel
composition. We measured the time it took to send about 70000
messages through one of the fudgets in a parallel
composition of identity fudgets. The program used is shown in
Figure 104. The parallel compositions were created
by listF
,
which internally constructs a balanced binary tree of parallel compositions and a table for translating the addresses of the fudgets to positions in the tree. The results are shown in Figure 105. The depth of the tree, and hence the time it takes to send a message to a particular fudget in tree grows logarithmically with the size of the parallel composition. However, since all that is known about the address type is that it is an instance of thelistF :: (Eq a) => [(a, F b c)] -> F (a, b) (a, c)
Eq
class, the table lookup has to be implemented as a linear
search and hence the lookup time varies linearly with the
position in the list. From the results we see that the time of
the table lookup soon becomes the dominating factor.
This suggests that it would be a good idea to
provide alternative combinators to listF
, which require
the address type to be an instance of the Ord
class, or
even the Ix
class, to reduce the time complexity of the
table lookup time to logarithmic or constant, respectively.
import Fudgets main = fudlogue mainF mainF = nullF >==< tstF >==< concatSP >^^=< stdinF tstF = listF [(i,idF) | i<-[1..size]] >=^< (,) sel size = argReadKey "size" 1 sel = argReadKey "sel" 1Figure 104. A program to measure the efficiency of parallel composition.
time ./Internal2 <testinput2 - -size n -sel k
k=1
n 2log n time GCs GC time max heap 1 0 1.352u 11 0.02 29908 32 5 2.179u 18 0.03 34924 256 8 2.457u 22 0.07 66744 2048 11 2.989u 27 0.31 317776 n=256
k 2log n time GCs GC time max heap 64 8 4.640u 33 0.11 127 8 7.001u 44 0.12 128 8 6.801u 44 0.16 256 8 11.126u 67 0.24 87880 Figure 105. The efficiency of parallel composition of fudgets.