Re: REDUCE
Richard Younger wrote:
>
> John-David Smith wrote:
> >
> > Kenneth Mankoff wrote:
> > >
> > > >The question, to all you C-programmers: is there a better way?
> > > [snip]
> > > >...the code logic to compute the maximum will be the same, both
> > > >symbolically for all types and, for many types, in the compiled code itself.
> > >
> > > Hi JD,
> > >
> > > hmmm... not 100% sure, but wouldn't c++ templates solve this problem?
> > >
> > > And for the cases where it is "symbolically" the same but not "compiled
> > > the same", I'm not sure what this means, but I'm guessing you would handle
> > > these cases with overloading your operators.
> > >
> > > Of course, C isn't C++, so this might not help.
> > >
> > > I can provide code examples and more info if you wish.
> >
> > Thanks for the suggestion. I had thought of that option, but I don't know much
> > about templates, nor about linking C++ to IDL. I wonder whether the templates
> > are just similar to my super macro for creating a different version for each
> > type. Can you frame the maximum function I suggested in terms of a skeleton
> > template which would operate on all the data types?
> >
> > My comment with respect to compiled and symbolic maybe wasn't clear. I really
> > just meant that you have this same code replicated over and over, with minor
> > changes in the types of the variables used, but otherwise logically and
> > symbolically intact. I can imagine the compiler emitting different code for,
> > e.g., multiplying two integers, vs. two floats, but I can also imagine other
> > types where the codes emitted are exactly the same. Obviously, you can't get
> > something for nothing, but if real repetition exists within the compiled code, you
> > should be able to eliminate it somehow.
> >
> > Thanks again,
> >
> > JD
>
> Hi JD and Ken,
>
> I agree with Ken that the most obvious and easiest C++ solution is
> templates and operator overloading. I have a little experience DLMing
> with C++, and it works just fine. The calling conventions of C and C++
> can be set exactly the same, so the limitations are exactly the same.
>
> What is the distinction between the different cases? Are you primarily
> worried about arithmetic, indirection, or member changes? E.g.:
>
> (float a * 2) vs (int a * 2) or
> (float*)a vs (int*)a or
> 2*value.f vs 2*value.i
>
> The overloading and template solutions work well on the first two
> problems, and not well at all on the last category, because AFAIK
> there's no good, compact way to make run-time distinctions with
> members. It's because different explicit symbols are used, as opposed
> to different implicit types. You end up using lots of switch/case
> statements. I suppose you could put the switch into an operator to
> extract the value of a data element, but then you end up switching every
> time you access an array element, instead of once at the beginning. I'd
> think it would be slower than your super-macro. Maybe someone else
> knows a better solution.
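
(Just to make your last point concrete: an accessor with the switch buried
inside it might look roughly like the sketch below. The type tags and struct
are invented for illustration; this isn't IDL's actual layout, and I haven't
compiled it against anything.)

  /* Sketch only: a value-extracting accessor with the type switch inside. */
  enum elem_type { T_BYTE, T_INT, T_FLOAT, T_DOUBLE };

  struct typed_array {
    elem_type   type;   /* runtime type tag                  */
    const void *data;   /* untyped pointer to the array data */

    /* Every access pays for one switch; that's the cost you describe. */
    double operator[](long i) const {
      switch (type) {
      case T_BYTE:   return ((const unsigned char *)data)[i];
      case T_INT:    return ((const short         *)data)[i];
      case T_FLOAT:  return ((const float         *)data)[i];
      case T_DOUBLE: return ((const double        *)data)[i];
      }
      return 0.0;       /* not reached */
    }
  };
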
The first two of your cases (arithmetic and indirection) are what I'm
concerned about, with indirection broadened to include declaration. Here's an
example of the post-preprocessor code for threading the "max" operation
(cleaned up a fair bit):
if(maxQ) {
  switch( type ) {
  case IDL_TYP_BYTE:
    {
      UCHAR *tin,*tout,tmp;
      tout=(UCHAR *)out;
      tin =(UCHAR *)arg[0]->value.arr->data;
      /* base steps from block to block; i indexes the output array */
      for(i=0,base=0;i<new_nel;base+=skip) {
        for(j=0;j<atom;j++) {
          tmp=tin[j+base];
          /* walk the reduced dimension: n_cdim elements, stride atom */
          for(ind=j+base;ind<j+base+atom*n_cdim;ind+=atom) {
            if(tin[ind]>tmp) tmp=tin[ind];
          }
          tout[i++]=tmp;
        }
      }
    }
    break;
  case IDL_TYP_INT:
    {
      short *tin,*tout,tmp;
      tout=(short *)out;
      tin =(short *)arg[0]->value.arr->data;
      for(i=0,base=0;i<new_nel;base+=skip) {
        for(j=0;j<atom;j++) {
          tmp=tin[j+base];
          for(ind=j+base;ind<j+base+atom*n_cdim;ind+=atom) {
            if(tin[ind]>tmp) tmp=tin[ind];
          }
          tout[i++]=tmp;
        }
      }
    }
    break;
  case IDL_TYP_LONG: {
    ....
And it goes on and on for all 9 types. "out" is the result of an
IDL_MakeTempArray() call, and needs to be cast correctly, as does the
input array data. A tmp variable of the correct size is also
initialized. These are the only differences in all 9 versions of the
generated code. Granted, this is a very simple example, but what I am
looking for is a solution which makes use of the redundancy in this code
to avoid generating most of it. I may be asking more out of compilers
than they can offer.
I think what C++ templates would do is basically the same thing I'm
doing, but in a much cleaner way (i.e. not using ugly nested macros).
That is, it would "instantiate" a different version of my looping
max-finding function 9 times, and the code would bloat just as much.
This isn't a big deal for this little function, but imagine a very large
template function being duplicated 9 (or 18) times. I'm beginning to
suspect there's no real way around this.
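For what it's worth, here is roughly what I imagine the templated version
would look like. This is just a sketch (untested, and reusing the variable
names from the generated code above), not something I've actually wired into
the DLM:

  /* One type-generic copy of the max-reduce loop. */
  template <class T>
  static void max_reduce(const T *tin, T *tout, long new_nel,
                         long atom, long n_cdim, long skip)
  {
    long i = 0;
    for (long base = 0; i < new_nel; base += skip) {
      for (long j = 0; j < atom; j++) {
        T tmp = tin[j + base];
        /* walk the reduced dimension: n_cdim elements, stride atom */
        for (long ind = j + base; ind < j + base + atom*n_cdim; ind += atom)
          if (tin[ind] > tmp) tmp = tin[ind];
        tout[i++] = tmp;
      }
    }
  }

  /* Inside the routine, the 9-way switch collapses to one call per type,
     but the compiler still instantiates a separate copy of max_reduce()
     for each type used here. */
  switch (type) {
  case IDL_TYP_BYTE:
    max_reduce((UCHAR *)arg[0]->value.arr->data, (UCHAR *)out,
               new_nel, atom, n_cdim, skip);
    break;
  case IDL_TYP_INT:
    max_reduce((short *)arg[0]->value.arr->data, (short *)out,
               new_nel, atom, n_cdim, skip);
    break;
  /* ... and so on for the remaining types ... */
  }

The source is certainly cleaner than nested macros, but the object code comes
out just as big, which is the bloat I was worried about.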
> On a side note, your REDUCE package seems to be very similar to a
> feature that I really would like RSI to implement; namely,
> Einstein-summation or dummy-index notation; something such that an
> operation like
>
> epsilon = fltarr(3, 6, 9)
> E_one = fltarr(9)
> E_two = fltarr(6)
>
> epsilon[%1, %2, %3]*E_one[%3]*E_two[%2]
>
> would multiply the elements of epsilon by E_one on the corresponding
> (3rd) index and sum over that index. Especially for those of us working
> with lots of fields in tensor notation, it would save lots of for loops,
> and I'm sure that a built-in facility would save time over the for loop.
>
I.e. an implicit double sum over the second and third indices? I
wouldn't do this with for loops. I would use rebin and total (here s
is the dimension vector of epsilon, i.e. [3,6,9]), like:
res=total(total(epsilon*rebin(reform(e_one,1,1,s[2]),s)* $
          rebin(1#e_two,s),3),2)
Admittedly, your notation is somewhat cleaner. If these are confusing,
see my tutorial (to be posted) on rebin+reform in action.
JD