Date: Mon, 11 May 1998 22:27:43 +1000 (EST)
Subject: categories: the FISh language
From: Barry Jay <cbj@socs.uts.edu.au>

 [apologies for multiple announcements]


	***********************************
	* Functional = Imperative + Shape *
	***********************************

FISh is a new array programming language that combines (and extends)
the
	       EXPRESSIVE POWER 

of functional programming with the 

	      EFFICIENT EXECUTION 

of imperative, or procedural, programming by performing

	     STATIC SHAPE ANALYSIS

on all programs. This shape computation reduces higher-order
functional programs to simple imperative forms, i.e. F - Sh = I.
Conversely, FISh works best when functions are constructed from
imperative procedures and shape functions, as recommended by the
slogan that gives the language its name.

	         F = I + Sh

FISh execution speeds on typical array problems are SEVERAL TIMES
FASTER than other higher-order, polymorphic languages.

This announcement, research papers, reference material and the
implementation are all available from

     http://linus.socs.uts.edu.au/~cbj/FISh/announcement.html

The main themes are introduced in the paragraphs below.


Barry Jay
http://linus.socs.uts.edu.au/~cbj


	***********************************



Expressive Power 
~~~~~~~~~~~~~~~~

FISh supports 

- the usual functional features

such as strong typing, higher-order functions and parametric
polymorphism. It also supports 

 - simple imperative features 

such as assignment, for- and while-loops, and local variables, using a
type of commands. Procedures are represented as functions into
commands, as in Algol-like languages. 

Unlike existing Algol-like languages, it also supports data types of
arrays, so that array programs may be polymorphic in the size of the
array. Now this has been extended to
support

 - poly-dimensional array programming.

If a is any array type then [a] is the type of all arrays with entries
in a that are

 - finite dimensional, and 
 - regular, 

such as vectors, matrices, three-dimensional arrays, etc. Thus,
poly-dimensional array programs can be written that act on a vector,
matrix, or higher-dimensional array. For example, the standard prelude
supports a polymorphic constant

              map : (a -> b) -> [a] -> [b] 

which will map a function over a vector, matrix, or higher-dimensional
array. Note that a matrix of integers, of type [int], has a different
type from a vector of vectors, of type [[int]]. Thus, given
sum_int : [int] -> int, which adds up an array of integers, we have

             map sum_int : [[int]] -> [int]

which will sum each entry of the outer array. This is useful when 

 - representing complex numbers as arrays of length 2, or
 - each entry in an array has an associated array of data, e.g.
      - each entry is given an array of nearest neighbours, or
 - decomposing an array into an array of blocks, e.g. 
      - a matrix into a matrix of matrices.
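
Returning to map sum_int above: as a rough illustration only
(hand-written C, not FISh compiler output), once shape analysis has
determined that the outer array has n entries and that every inner
array has length m, such a program can be specialised to flat,
loop-based code along the following lines. The names and layout are
assumptions made for the sketch.

/* Hypothetical C analogue of  map sum_int : [[int]] -> [int].
   Assumes a regular layout: n inner arrays, each of length m,
   stored contiguously in data.  These sizes are exactly the kind
   of information FISh infers by shape analysis. */
void map_sum_int(int n, int m, const int *data, int *result)
{
    for (int i = 0; i < n; i++) {
        int sum = 0;
        for (int j = 0; j < m; j++)
            sum += data[i * m + j];
        result[i] = sum;
    }
}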



Efficiency
~~~~~~~~~~


--------------------------------------------------------------
|               |Map   Reduce  Q'sort  FFT   MM-loops  MM-ip |
|---------------|---------------------------------------------|
|FISh           |1.04  0.37     1.93   3.57  4.36       7.05  |
|Ocaml(in-lined)|4.00  2.66     3.53                          |
|Ocaml          |5.99  4.59    15.47   8.16  7.71      60.61  |
|speedup        |5.8   12.4     8.0    2.3   1.8        8.6   |
--------------------------------------------------------------

	Benchmark user times (speedup = Ocaml time / FISh time)

See http://linus.socs.uts.edu.au/~cbj/FISh/Benchmarks for more
details.


Static Shape Analysis
~~~~~~~~~~~~~~~~~~~~~

Shape analysis is used to determine the number of dimensions, and the
size in each dimension, of every array appearing in a program. (A
program is a closed term of array or command type.) In particular, it
checks that every array is regular, i.e. is a hyper-cube whose
entries all have the same shape. The latter requirement is necessary
to ensure that every entry in a vector of vectors has the same length,
i.e. that a vector of vectors is a matrix. As well as detecting
irregularities, it is able to detect all other shape errors, such as
multiplying matrices of inappropriate sizes, or having incompatible
numbers of dimensions. It is the power of shape analysis that makes
poly-dimensional programming feasible.
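
To give a flavour of what the shape computation does, here is a
hypothetical C sketch of the shape-level check for matrix
multiplication; the names are illustrative, and FISh performs the
analogous computation at compile time, so the error is reported
before the program ever runs.

#include <stdio.h>
#include <stdlib.h>

/* A hypothetical matrix shape: number of rows and columns. */
typedef struct { int rows; int cols; } mat_shape;

/* Shape of a matrix product, or an error if the inner dimensions
   disagree.  FISh runs this kind of computation statically, so an
   incompatibility is a compile-time shape error. */
mat_shape matmul_shape(mat_shape a, mat_shape b)
{
    if (a.cols != b.rows) {
        fprintf(stderr, "shape error: %dx%d * %dx%d\n",
                a.rows, a.cols, b.rows, b.cols);
        exit(1);
    }
    return (mat_shape){ a.rows, b.cols };
}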

Knowledge of the shapes can be used to improve memory management
during compilation of the resulting imperative program. In particular,
FISh supports a clear stack discipline, and so does not require a
garbage collector. The existing FISh compiler translates programs
into simple C code, which is then compiled and executed in the usual
way. This approach is also being applied in the design and
implementation of a parallel version of FISh, called GoldFISh.


F = I + Sh
~~~~~~~~~~

The slogan is represented within the standard prelude by a function

     proc2fun : (var a -> var b -> comm) -> (#b -> #a) -> b -> a 

for converting procedures (and shapes) to functions. proc2fun pr sh x
uses the shape function sh and the shape #x of the array x to
determine the shape of the result. It then creates a local variable y
of this shape, invokes the procedure pr on y and a stored value of
x, and finally returns the value of y.
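
The following is a loose C analogue of this behaviour, for
one-dimensional arrays of doubles where a shape is just a length.
All names and signatures are hypothetical; real FISh procedures act
on variables of command type, not C pointers.

#include <stdlib.h>
#include <string.h>

/* Loose C sketch of proc2fun.  'shape_fn' plays the role of
   sh : #b -> #a, and 'proc' plays the role of the procedure
   pr : var a -> var b -> comm. */
double *proc2fun(void (*proc)(double *dst, const double *src,
                              int n_dst, int n_src),
                 int (*shape_fn)(int),
                 const double *x, int n)
{
    int m = shape_fn(n);                     /* shape of the result  */
    double *y = malloc(m * sizeof(double));  /* local variable y     */
    double *xcopy = malloc(n * sizeof(double));
    memcpy(xcopy, x, n * sizeof(double));    /* stored value of x    */
    proc(y, xcopy, m, n);                    /* invoke the procedure */
    free(xcopy);
    return y;                                /* value of y           */
}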


Semantics
~~~~~~~~~

Shape analysis is supported by a clean and powerful categorical
semantics, in which the shape-data (or shape-entry) decomposition is
represented by a pullback. For multi-dimensional arrays, this is

                            entries
                   [a] ---------------> List a
                    |_|                   |
                    |                     |
              shape |                     | map #
                    |                     |
                    v                     v
               List N x #a ----------> List #a
                                 c

where c takes a list of numbers ns and an a-shape sh, and produces a
list whose length is the product of the entries of ns and whose
entries are all sh. In other words, all entries of an array must have
the same shape.


Poly-dimensionality 
~~~~~~~~~~~~~~~~~~~

FISh supports the usual Hindley-Milner style of polymorphism. It also
supports polymorphism in array sizes (unlike earlier Algol-like
languages) and in the number of dimensions of an array. That is, FISh
supports polydimensional programming. For example, the map function
is defined to have type

                             map : (a -> b) -> [a] -> [b]


and so may act on a vector, matrix or higher-dimensional
array. However, the existence of the simple imperative features means
that this can be compiled into a sequence of nested for-loops
corresponding to the number of dimensions of the array argument. 
Polydimensionality is a form of *shape polymorphism*. 
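
As a sketch of the idea (not actual compiler output, and with
illustrative names), once shape analysis has fixed the rank and the
dimension sizes, an elementwise map over a regular array of doubles
can be emitted as one for-loop per dimension, or equivalently as a
single loop over the flat storage whose bound is the product of the
dimensions:

/* Rough sketch of code a compiler could emit for map f once the
   shape (rank and dims) of the argument is known. */
void map_double(int rank, const int *dims,
                const double *src, double *dst,
                double (*f)(double))
{
    int size = 1;
    for (int d = 0; d < rank; d++)
        size *= dims[d];            /* total number of entries */
    for (int i = 0; i < size; i++)
        dst[i] = f(src[i]);
}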


Array Programming
~~~~~~~~~~~~~~~~~

Poly-dimensionality means that many array programs, e.g. numerical
recipes, can be written for any number of dimensions while maintaining
static type and shape checking. For example, FISh supports
poly-dimensional stencilling and a poly-dimensional difference
equation solver (see Sample Programs).


Parallel Programming 
~~~~~~~~~~~~~~~~~~~~

Size and shape are important parameters in estimating the cost of
alternative parallel implementations. FISh has been designed to
support a parallel variant, called GoldFISh, currently under
development. GoldFISh will support additional parallel combinators for
some of the existing FISh functions that express the usual
second-order constants and also the usual array distributions (see
Sample Programs).


	***********************************


From: "Frode Odegard" <frode@mailserv.mta.ca>
Date: Mon, 11 May 1998 12:26:46 -0700
Subject: categories: Re: the FISh language

The right URL appears to be:

http://www-staff.socs.uts.EDU.AU:8080/~cbj/FISh/Announcement/

Best regards,


Frode Odegard

-- 
..........................................................
Frode Odegard - frode@odegard.com - http://www.odegard.com


Date: Wed, 28 Oct 1998 19:22:05 +1100 (EST)
Subject: categories: Update on FISh and shape
From: Barry Jay <cbj@socs.uts.edu.au>

The FISh homepage 

    http://www-staff.socs.uts.EDU.AU:8080/~cbj/FISh/

has a lot of new material. Here are a couple of pages. 

Barry Jay


	  What's New!
	  =========== 

Polymorphic quicksort in FISh is faster than C's standard library qsort (see below).

Download FISh-1.5. Check the list of new features. 
                             
The formal language definition for FISh is based on the original design. 
Versions since 1.4 have been checked for compliance. 

Source code for FISh is also available with the distribution. 
Now you can compile your own executables, 
or try out your own language design ideas. 
                             
Poly-dimensional array programming (revised 4/8/98) 
                             
Costing Parallel Programs as a Function of Shapes (revised 29/9/98) 
                             
A simple example of polydimensionality in solving difference equations. 
                             
A new function "selfmap" in standard_prelude.fsh gives more efficient 
mapping algorithms for idempotent functions. 
                             
Mail list for FISh users - sign up now! 


	  Sometimes FISh faster than C 
	  ============================

Shape analysis in FISh allows polymorphic functions to be specialised
to the exact shape of their arguments, so that it is never necessary
to box data structures. This can have a significant impact when the
cost of pointer-chasing is high compared to the operation being
performed.

This is illustrated by polymorphic quicksort. The FISh program looks
like a standard recursive functional program of type [a] -> [a]. The
corresponding C code uses C's standard polymorphic quicksort, called
qsort. FISh takes less than half the time on large arrays of
integers. Details of the tests can be found on the Benchmarks page.

The reason is that the C program achieves polymorphism by requiring
the comparison function to act on pointers, rather than values. The
actual comparison function for floats is

/* Comparison function for qsort: the values arrive through
   void* pointers and must be cast back to double before
   comparing. */
int comp(const void *i, const void *j) {
    int res;
    if (*(double *)i - *(double *)j > 0.0)
        res = 1;
    else
        res = -1;
    return res;
}

As well as slowing down the program, these pointers are a source of
program clutter, potential errors, and
type violations. 


From: "Ralph Loader" <ralph.loader@paradigm.co.nz>
Date: Thu, 29 Oct 1998 13:46:02 +0000
Subject: categories: Re: Update on FISh and shape


> 
> Polymorphic quicksort in FISh is faster than C's standard library qsort (see below).

Some comments.

You're simply not comparing like-with-like.  The differences in 
performance that you're seeing have nothing to do with the boxing/ 
unboxing issues that you claim are involved.

(1) Your C code compares using two floating point operations 
(subtraction, compare), while the fish program uses just one (no 
subtraction).

(2) The output of the fish-to-C translation uses statically allocated 
memory, and won't multi-thread correctly.  Sun Solaris does support 
multi-threading, so while your compiler uses statically allocated 
memory here, the Solaris C library can't.  [As a matter of interest, 
what happens if you call your fish quicksort with a comparison 
function that itself calls quicksort?].

(3) The C library qsort takes a three-valued comparison function, 
while your fish code uses a two-valued comparison.

> Shape analysis in FISh allows polymorphic functions to be specialised
> to the exact shape of their arguments, so that it is never necessary
> to box data structures. This can have a significant impact when the
> cost of pointer-chasing is high compared to the operation being
> performed.

["never"?  Show me an unboxed cyclic data structure.]

(4) As you note, your fish code gets optimised by specialising to the
datatype and comparison function used.  This is the main 
performance issue, and has nothing to do with boxing/unboxing issues.
Note that the C library qsort is a polymorphic function, is not 
specialised to particular arguments, and still takes an array of 
unboxed elements.

Specialising functions to their arguments is nothing new to FISh - 
some (but not all) C compilers, for instance, on inlining a function 
will also specialise it.  The reason why the qsort function is not
specialised to its arguments is because it's a precompiled library 
function, not because it's written in C.

> The reason is that the C program achieves polymorphism by requiring
> the comparison function to act on pointers, rather than values. The
> actual comparison function for floats is

The reality is that the machine code generated by your fish program 
ends up dealing with pointers also: on a typical RISC processor, to 
compare two doubles held in memory, you need to compute the addresses 
of the doubles, load them into registers and do the comparison.

Passing pointers to the comparison function just means that the 
values are loaded into registers in the function, rather than before 
the function call.  Whether it's faster to load the values into
registers before the function call or in the function will depend on
things like instruction scheduling in the compiler and CPU, and
register allocation in the caller; in any case the difference will be
tiny.

> As well as slowing down the program, these pointers are a source of

It's not very often that you want to sort an array of numbers.  If 
you want to sort some records that contain both a sort-key and some 
extra data, then using pointers is faster.

> program clutter, potential errors, and
> type violations. 

Ralph.

(not speaking on behalf of Paradigm)


Ralph Loader
Paradigm Technology, Level 13, Paxus House,
79 Boulcott Street, Wellington, New Zealand
Phone: +64 4 495 1004    Fax: +64 4 499 7762


Date: Mon, 2 Nov 1998 16:46:48 +1100 (EST)
Subject: categories: Re: FISh performance 
From: Barry Jay <cbj@socs.uts.edu.au>


Dear Ralph,

thank you for taking an interest in the FISh performance results for
quicksort. You raise two basic questions. Is the FISh program faster?
If so, why?


Is quicksort in FISh faster than qsort (on arrays of doubles)? 
==============================================================


> (1) Your C code compares using two floating point operations 
>(subtraction, compare), while the fish program uses just one (no 
>subtraction).


This makes a small difference to the overall performance of the
algorithm.  I re-ran the test for qsort with the following comparison
function

/* Revised comparison: a single floating point comparison,
   no subtraction. */
int comp(const void *i, const void *j) {
    return (*(double *)i > *(double *)j);
}

that has only one operation. The C results are better by about 10% but
FISh is still twice as fast.

	C benchmark1 	C benchmark2 	FISh benchmark

	3.59		3.35		1.65

>(3) The C library qsort takes a three-valued comparison function, 
>while your fish code uses a two-valued comparison.

I'm not sure what point you're making. The designers of qsort had a
free choice. I suspect that the three-valued comparison is used to
exploit equality of values, to obtain greater efficiency. In any
event, it is unlikely to make much difference in the light of (1)
above. 

Conclusion: The FISh program *is* faster. 


Why is quicksort in FISh faster?
=================================

>(4) As you note, your fish code gets optimised by specialising to the
>datatype and comparison function used.  


Yes. This is a given. The question is to pinpoint the source of the
speedup.


> ... This is the main 
>performance issue, and has nothing to do with boxing/unboxing issues.


Point taken. The data supplied to quicksort is not boxed, but this is
because (the length of the array and) the size of the array entries
are given as explicit parameters. In this sense qsort is *not*
polymorphic in its choice of data type. If this information were not
provided by the C programmer then boxing would add another layer of
indirection to the program. Such information about the size of
entries etc. is inferred by the FISh compiler.
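
For reference, those explicit parameters are visible in the prototype
of qsort, declared in <stdlib.h>:

/* The caller supplies the number of elements (nmemb), the size of
   each element in bytes (size), and a comparison function that
   works through void* pointers. */
void qsort(void *base, size_t nmemb, size_t size,
           int (*compar)(const void *, const void *));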

>... Whether [FISh is faster] will depend on 
>things like instruction scheduling in the compiler & CPU, & register 
>allocation in the caller & in any case the difference will be tiny.


Your conclusion does not match the performance figures above.

The FISh program avoids making a function call at all.  That is,
instead of passing the addresses to a function, a primitive comparison
is made. The key point is that this can only be done because
information like the length of the array and the size of the entries
(in bytes) can be inferred by the FISh compiler, instead of being
supplied by the user.
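
To make the point concrete (this is hand-written illustrative C, not
the code the FISh compiler actually produces), a partition step
specialised to arrays of doubles can compare values directly, with no
call through a function pointer and with the element size known
statically:

/* Illustrative only: a Lomuto-style partition specialised to
   double.  The comparison is a single inline operation; no
   function pointer is involved. */
static void swap(double *a, double *b)
{
    double t = *a; *a = *b; *b = t;
}

int partition(double *a, int lo, int hi)
{
    double pivot = a[lo];
    int i = lo;
    for (int j = lo + 1; j <= hi; j++)
        if (a[j] < pivot)           /* direct comparison, no call */
            swap(&a[++i], &a[j]);
    swap(&a[lo], &a[i]);
    return i;                       /* final position of the pivot */
}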


Yours,
Barry Jay


*************************************************************************
| Associate Professor C.Barry Jay, 					|
| Reader in Computing Sciences		Phone: (61 2) 9514 1814		|
| Head, Algorithms and Languages Group,	Fax:   (61 2) 9514 1807		|
| University of Technology, Sydney,	e-mail: cbj@socs.uts.edu.au	|
| P.O. Box 123 Broadway, 2007,	  http://www-staff.socs.uts.edu.au/~cbj	|
| Australia.			        FISh homepage ... ~cbj/FISh     |
*************************************************************************


Date: Tue, 03 Nov 1998 15:41:11 +1100
From: Robbie Gates <robbie@maths.usyd.edu.au>
Subject: categories: Re: FISh performance

Warning: there's no category theory in this letter - it's purely a
response to the discussion of execution speed of FISh quicksort.

Barry Jay wrote:
> Why is quicksort in FISh faster?
> =================================

I would like to propose another reason the FISh quicksort appears to be
faster than the library qsort - they aren't the same algorithm.  In
particular, the standard library qsort and the FISh sort are using
different methods to pick the pivot.

Choice of pivot in quicksort is important to ensure the worst case
doesn't degenerate to O(n^2).  The choice of pivot in the FISh code
is always the first element, which does give very poor performance
in the case the array is already sorted.  I'm not sure what the qsort
in the libc i have here does.  However, typical tricks include selecting
the median of three elements at the bottom, middle and top of the array,
choosing a random element, choosing the median of three random elements,
and so on.  Importantly, each of these choices requires some work,
and thus sacrifices some average-case speed in order to improve the
worst-case behaviour.
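
For comparison, a typical median-of-three selection looks something
like the following C sketch (one common scheme; what a particular
libc actually does may differ, and the function name is illustrative):

/* Median-of-three pivot selection for an array of doubles: return
   the index of the median of the first, middle and last elements.
   This costs a few extra comparisons per partition but guards
   against the worst case on already-sorted input. */
int median_of_three(const double *a, int lo, int hi)
{
    int mid = lo + (hi - lo) / 2;
    if ((a[lo] <= a[mid] && a[mid] <= a[hi]) ||
        (a[hi] <= a[mid] && a[mid] <= a[lo]))
        return mid;
    if ((a[mid] <= a[lo] && a[lo] <= a[hi]) ||
        (a[hi] <= a[lo] && a[lo] <= a[mid]))
        return lo;
    return hi;
}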

Caveat: I'm not a professional benchmarker/tester, and the tests below
are probably very flawed.  I have tried to give details on everything
that seems relevant, but your mileage may vary.  This data is more given
to suggest an alternative, plausible explanation for the observed speed
difference in the FISh and C benchmarks, than to claim i have identified
all the issues in the timing results found for FISh and C quicksort.

To test this hypothesis, i coded up quicksort in C++ using the same
pivot picking strategy as the FISh code.  I then modified my C++ code,
and the C benchmark code and C translated FISh code (CTFC) from the
FISh website (by CTFC i mean the contents of test01.c), by altering
the random number generator to allow specification of two command line
arguments giving the multiplier and step for the seeding algorithm.
i.e. rather than
seed = seed * 25173 + 17431
i used
seed = seed * mult + 17431
for varying mult.

This allowed me to test the extent to which the algorithm was sensitive
to the data being sorted.  What follows are timing values on a DGUX
AlphaStation 255 (233 MHz) with mult = 1,2,3, ... ,10 (on an array of
10000 values).  For each program, i give the total (user + system)
time reported by the builtin time function of bash version
2.02.0(1)-release (alpha-dec-osf4.0) (averaged over 10 trials in
each case). The time columns are libqsort (= the standard library
qsort), cppqsort (= my C++ qsort), fishqsort (= CTFC). They were
compiled with gcc, g++ and gcc respectively, with -O optimization.

NOTE: With these mult values, the data supplied to the qsort is
deliberately nonrandom to try and detect the effect proposed in
the second paragraph.

mult         libqsort        cppqsort     fishqsort
1              .0350          3.462          3.410
2              .0348          4.039          4.039
3              .1003          .0334          .0391
4              .0350          4.085          4.052
5              .1130          .0332          .0393
6              .0356          4.042          4.039
7              .1025          .0341          .0382
8              .0357          4.061          4.054
9              .1088          .0336          .0382
10             .0355          4.053          4.041

Observe that the libqsort has much better average performance (and
that its profile is different to the other two - one could
probably determine something about the choice of pivots with enough
tests ;-).  Also, the CTFC at best beats the library by
a factor of a bit under 3, whereas the library beats the CTFC
by a factor of over 110.

More importantly, the C++ and fish code run in very similar time,
neither consistently beating the other.  Also, their profiles are
almost identical.

I also ran one bigger test - in the case of a 90000 element array
consisting of 1 to 90,000 in order, the library took around .3
seconds, whereas the CTFC overflowed the stack and crashed
just under 200 seconds into the test (the C++ code took
315 seconds to overflow the stack and crash - i'm not sure what's
causing the difference here, but i suspect it has to do with
the fact my C++ code used parameters rather than a static struct).

Based on this, I'm proposing that the standard library quicksort is
trading off speed for the average cases to bring the worst cases down
to something acceptable. This is presumably a serious issue for a
library quicksort - it's better to run somewhat slower on average in
order to get consistency, as this makes applications easier to test
for market acceptance of speed, rather than the application apparently
hanging when the user manages to sort the worst case database with about
1 million entries in it (when your testers happily sorted their million
entry db's with no trouble).

One other point - compile-time polymorphism is not specific
to FISh (as Ralph Loader mentioned). You write:
> The FISh program avoids making a function call at all.  That is,
> instead of passing the addresses to a function, a primitive comparison
> is made. The key point is that this can only be done because
> information like the length of the array and the size of the entries
> (in bytes) can be inferred by the FISh compiler, instead of being
> supplied by the user.

C++ templates also give the ability to supply a compile-time comparison
function to sort routines; the code will likewise be specialised on
the given comparison and on the size of the entries.  One could write
sort templates that take the length as well; however, since the size
of the stuff to be sorted typically varies depending on the current
application state (i.e. in a manner unknown at compile time), this tends
not to be enough of a win to be worth doing.

 - robbie
-- 
*--------------------------------> + <---------------------------------*
|_|          robbie gates          |         pgp key available       |_|
V          category theorist       V  //cat.maths.usyd.edu.au/~robbie  V
*--------------------------------> + <---------------------------------*