Hi all, thanks very much for your replies :-)
Yes, all CPUs are utilized equally, but the GPU approach would be very expensive (I read earlier that the GPU subsystem alone could cost up to 50k€...). Raising the rule cache size gave a small improvement, but we have no idea what the best value would be... I expected all hardware resources to be used until one of them reached 100% and became the limiting factor, but as we have now learned, that is not the case.
Avoiding a huge amount of data and rules was my aim too, but the customer insisted that they really need the multi-step approach to calculate the reported values at this granularity at all -> maybe I really do need to propose the GPU solution; hopefully they will remember their priorities then ;-) We precalculate as much as possible in ETL, but then 3 cubes (up to 6 in the next stage) load the data and calculate the next interim results, and these are then combined in the master cube, which is the basis for the reports. Besides that, we have 2 lookup cubes for categorization purposes, one of which has 10K elements in its largest dimension. All in all, this project would be more of a case for TM1, but they prefer open source and wanted to give PALO a try...
Our frontend is Jedox Web (spreadsheets), not Excel via the add-in; I use the latter only for development purposes (mostly the Advanced Rule Editor).
The retrieved data are numbers, not strings. But many palo.data rules (accessing data/interim results from upstream stage cubes) use string formulas: I need to calculate a % value, round it to 1 decimal place and then match it against a lookup cube, which returns a category number. In two cases a >= and <= check produces a result stored as a string, or an attribute is pulled into the result cube alongside a dimension value, because later I want to display it in traffic-light style.
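To make the rounding-and-matching step concrete, here is a minimal Python sketch of the logic (not PALO rule syntax). The lookup table contents, the function names and the half-up rounding mode are assumptions for illustration only; the real lookup lives in the Category Lookup Cube.

```python
from decimal import Decimal, ROUND_HALF_UP

def percent_element(part, total):
    """Return part/total as a percentage string with one decimal, e.g. '12.3'.

    Half-up rounding is an assumption; the actual rule may round differently.
    """
    pct = Decimal(str(part)) / Decimal(str(total)) * 100
    return str(pct.quantize(Decimal("0.1"), rounding=ROUND_HALF_UP))

# Hypothetical stand-in for the Category Lookup Cube:
# element name (rounded percentage as a string) -> category number
category_lookup = {"12.3": 2, "50.0": 3}

def categorize(part, total, default=0):
    """Match the rounded percentage against the lookup, as the string rule does."""
    return category_lookup.get(percent_element(part, total), default)

print(percent_element(123, 1000), categorize(123, 1000))
```

The string only exists because the rounded value has to match an element name in the lookup dimension; the result that comes back is a number again.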
So 3 different calculations are done and each is matched to a category (Interim Calculation Cubes A-C, matched against the Category Lookup Cube). These 3 interim results are then referenced via palo.data rules in the Master Cube, which itself does a lookup in the Value Weight Cube and calculates a master value from the 3 weighted interim values. The Master Cube also contains one more measure, which is obtained by plain summing up, not by cube references.
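The data flow into the Master Cube can be sketched roughly as follows. This is only an illustration in Python: the weights are invented, and combining the three interim values as a weighted sum is an assumption, since the exact formula is not spelled out here.

```python
# Hypothetical weights, standing in for the Value Weight Cube
weights = {"A": 0.5, "B": 0.3, "C": 0.2}

def master_value(categories, weights):
    """Combine the interim category numbers of cubes A-C into one master value.

    A weighted sum is assumed; the real rule may combine them differently.
    """
    return sum(weights[name] * categories[name] for name in weights)

# Category numbers as they would arrive from Interim Calculation Cubes A-C
interim = {"A": 2, "B": 3, "C": 1}
print(master_value(interim, weights))
```

Each master cell therefore depends on three upstream cells plus one weight lookup per cube, which is where the chain of palo.data references comes from.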
Amount of retrieved data (values from Cube properties in the Modeller):

Cube / type of value                          Value
Size A-C (identical structure)                44460976290228000
Interim Calculation Cube A, filled cells      41230
Interim Calculation Cube B, filled cells      20457
Interim Calculation Cube C, filled cells      60604
Master Cube, size                             277881101813925000
Master Cube, filled cells                     92837
Category Lookup Cube, size                    1887510
Category Lookup Cube, filled cells            1851
Value Weight Cube, size                       170
Value Weight Cube, filled cells               38
There is a great deal of sparsity, so I am inclined to try markers now, but as I read in the advanced Excel manual, they are only useful when I have a calculation in which one component of the formula could be 0.
But how does this fit with a palo.data rule?
@palo.marker is also briefly described, but could I simply attach it to my various palo.data rules and use the complete set of dimension names as in the example in the manual?
Or do I need to put markers only on subsets of the source cubes? Does it work on exactly the same concept as TM1's feeders?
For the production phase planned for the near future, we intend to load more data, approx. 3 times the current amount, which will turn sparse space into filled space, because the dimensional structures are already complete.
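To put the sparsity in numbers, here is a small Python sketch that computes the fill ratio (filled cells divided by cube size) for each cube from the figures above, and the same ratio with 3 times the filled cells. It assumes the cube sizes stay unchanged, since the dimensions are already complete; the function name is made up for this example.

```python
# (filled cells, cube size) as reported in the Modeller's cube properties
cubes = {
    "Interim Calculation Cube A": (41230, 44460976290228000),
    "Interim Calculation Cube B": (20457, 44460976290228000),
    "Interim Calculation Cube C": (60604, 44460976290228000),
    "Master Cube": (92837, 277881101813925000),
    "Category Lookup Cube": (1851, 1887510),
    "Value Weight Cube": (38, 170),
}

def fill_ratio(filled, size, growth=1):
    """Share of cells that are filled, optionally after scaling the filled cells."""
    return filled * growth / size

for name, (filled, size) in cubes.items():
    now = fill_ratio(filled, size)
    later = fill_ratio(filled, size, growth=3)
    print(f"{name}: now {now:.2e}, at 3x data {later:.2e}")
```

Under that assumption the calculation cubes and the Master Cube remain extremely sparse even with 3 times the data; only the two small lookup cubes have a noticeable fill ratio.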
That is a lot of material, I apologize, but I hope it gives you the information needed for a helpful hint, which would be VERY much appreciated.
Best regards,
Stefan