Re: REDUCE
Richard Younger wrote:
>
> John-David Smith wrote:
> >
> > Kenneth Mankoff wrote:
> > >
> > > >The question, to all you C-programmers: is there a better way?
> > > [snip]
> > > >...the code logic to compute the maximum will be the same, both
> > > >symbolically for all types and, for many types, in the compiled code itself.
> > >
> > > Hi JD,
> > >
> > > hmmm... not 100% sure, but wouldn't c++ templates solve this problem?
> > >
> > > And for the cases where it is "symbolically" the same but not "compiled
> > > the same", I'm not sure what this means, but I'm guessing you would handle
> > > these cases with overloading your operators.
> > >
> > > Of course, C isn't C++, so this might not help.
> > >
> > > I can provide code examples and more info if you wish.
> >
> > Thanks for the suggestion. I had thought of that option, but I don't know much
> > about templates, nor about linking C++ to IDL. I wonder whether the templates
> > are just similar to my super macro for creating a different version for each
> > type. Can you frame the maximum function I suggested in terms of a skeleton
> > template which would operate on all the data types?
> >
> > My comment with respect to compiled and symbolic maybe wasn't clear. I really
> > just meant that you have this same code replicated over and over, with minor
> > changes in the types of the variables used, but otherwise logically and
> > symbolically intact. I can imagine the compiler emitting different code for,
> > e.g., multiplying two integers, vs. two floats, but I can also imagine other
> > types where the codes emitted are exactly the same. Obviously, you can't get
> > something for nothing, but if real repetition exists within the compiled code, you
> > should be able to eliminate it somehow.
> >
> > Thanks again,
> >
> > JD
>
> Hi JD and Ken,
>
> I agree with Ken that the most obvious and easiest C++ solution is
> templates and operator overloading. I have a little experience DLMing
> with C++, and it works just fine. The calling conventions of C and C++
> can be set exactly the same, so the limitations are exactly the same.
>
> What is the distinction between the different cases? Are you primarily
> worried about arithmetic, indirection, or member changes? E.g.:
>
> (float a * 2) vs (int a * 2) or
> (float*)a vs (int*)a or
> 2*value.f vs 2*value.i
>
> The overloading and template solutions work well on the first two
> problems, and not well at all on the last category, because AFAIK
> there's no good, compact way to make run-time distinctions with
> members. It's because different explicit symbols are used, as opposed
> to different implicit types. You end up using lots of switch/case
> statements. I suppose you could put the switch into an operator to
> extract the value of a data element, but then you end up switching every
> time you access an array element, instead of once at the beginning. I'd
> think it would be slower than your super-macro. Maybe someone else
> knows a better solution.
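
(Just to make your last point concrete: an accessor with the switch buried
inside it might look roughly like the sketch below. The type tags and struct
are invented for illustration; this isn't IDL's actual layout, and I haven't
compiled it against anything.)

  /* Sketch only: a value-extracting accessor with the type switch inside. */
  enum elem_type { T_BYTE, T_INT, T_FLOAT, T_DOUBLE };

  struct typed_array {
    elem_type   type;   /* runtime type tag                  */
    const void *data;   /* untyped pointer to the array data */

    /* Every access pays for one switch; that's the cost you describe. */
    double operator[](long i) const {
      switch (type) {
      case T_BYTE:   return ((const unsigned char *)data)[i];
      case T_INT:    return ((const short         *)data)[i];
      case T_FLOAT:  return ((const float         *)data)[i];
      case T_DOUBLE: return ((const double        *)data)[i];
      }
      return 0.0;       /* not reached */
    }
  };
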
The first two of your cases (arithmetic and indirection) are what I'm
concerned about, with indirection broadened to include declaration. Here's an
example of the post-preprocessor code for threading the "max" operation
(cleaned up a fair bit):
if(maxQ) {
  switch( type ) {
  case IDL_TYP_BYTE:
    {
      UCHAR *tin,*tout,tmp;
      tout=(UCHAR *)out;
      tin =(UCHAR *)arg[0]->value.arr->data;
      /* base steps from block to block; i indexes the output array */
      for(i=0,base=0;i<new_nel;base+=skip) {
        for(j=0;j<atom;j++) {
          tmp=tin[j+base];
          /* walk the reduced dimension: n_cdim elements, stride atom */
          for(ind=j+base;ind<j+base+atom*n_cdim;ind+=atom) {
            if(tin[ind]>tmp) tmp=tin[ind];
          }
          tout[i++]=tmp;
        }
      }
    }
    break;
  case IDL_TYP_INT:
    {
      short *tin,*tout,tmp;
      tout=(short *)out;
      tin =(short *)arg[0]->value.arr->data;
      for(i=0,base=0;i<new_nel;base+=skip) {
        for(j=0;j<atom;j++) {
          tmp=tin[j+base];
          for(ind=j+base;ind<j+base+atom*n_cdim;ind+=atom) {
            if(tin[ind]>tmp) tmp=tin[ind];
          }
          tout[i++]=tmp;
        }
      }
    }
    break;
  case IDL_TYP_LONG: {
    ....
And it goes on and on for all 9 types. "out" is the result of an
IDL_MakeTempArray() call, and needs to be cast correctly, as does the
input array data. A tmp variable of the correct size is also
initialized. These are the only differences in all 9 versions of the
generated code. Granted, this is a very simple example, but what I am
looking for is a solution which makes use of the redundancy in this code
to avoid generating most of it. I may be asking more out of compilers
than they can offer.
I think what C++ templates would do is basically the same thing I'm
doing, but in a much cleaner way (i.e. not using ugly nested macros).
That is, it would "instantiate" a different version of my looping
max-finding function 9 times, and the code would bloat just as much.
This isn't a big deal for this little function, but imagine a very large
template function being duplicated 9 (or 18) times. I'm beginning to
suspect there's no real way around this.
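For what it's worth, here is roughly what I imagine the templated version
would look like. This is just a sketch (untested, and reusing the variable
names from the generated code above), not something I've actually wired into
the DLM:

  /* One type-generic copy of the max-reduce loop. */
  template <class T>
  static void max_reduce(const T *tin, T *tout, long new_nel,
                         long atom, long n_cdim, long skip)
  {
    long i = 0;
    for (long base = 0; i < new_nel; base += skip) {
      for (long j = 0; j < atom; j++) {
        T tmp = tin[j + base];
        /* walk the reduced dimension: n_cdim elements, stride atom */
        for (long ind = j + base; ind < j + base + atom*n_cdim; ind += atom)
          if (tin[ind] > tmp) tmp = tin[ind];
        tout[i++] = tmp;
      }
    }
  }

  /* Inside the routine, the 9-way switch collapses to one call per type,
     but the compiler still instantiates a separate copy of max_reduce()
     for each type used here. */
  switch (type) {
  case IDL_TYP_BYTE:
    max_reduce((UCHAR *)arg[0]->value.arr->data, (UCHAR *)out,
               new_nel, atom, n_cdim, skip);
    break;
  case IDL_TYP_INT:
    max_reduce((short *)arg[0]->value.arr->data, (short *)out,
               new_nel, atom, n_cdim, skip);
    break;
  /* ... and so on for the remaining types ... */
  }

The source is certainly cleaner than nested macros, but the object code comes
out just as big, which is the bloat I was worried about.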
> On a side note, your REDUCE package seems to be very similar to a
> feature that I really would like RSI to implement; namely,
> Einstein-summation or dummy-index notation; something such that an
> operation like
>
> epsilon = fltarr(3, 6, 9)
> E_one = fltarr(9)
> E_two = fltarr(6)
>
> epsilon[%1, %2, %3]*E_one[%3]*E_two[%2]
>
> would multiply the elements of epsilon by E_one on the corresponding
> (3rd) index and sum over that index. Especially for those of us working
> with lots of fields in tensor notation, it would save lots of for loops,
> and I'm sure that a built-in facility would save time over the for loop.
>
I.e. an implicit double sum over the second and third indices? I
wouldn't do this with for loops. I would use rebin and total (here s
is the dimension vector of epsilon, i.e. [3,6,9]), like:
res=total(total(epsilon*rebin(reform(e_one,1,1,s[2]),s)* $
          rebin(1#e_two,s),3),2)
Admittedly, your notation is somewhat cleaner. If these are confusing,
see my tutorial (to be posted) on rebin+reform in action.
JD