pvanderm

Intermediate

  • "pvanderm" is male
  • "pvanderm" started this thread

Posts: 13

Date of registration: Apr 20th 2010

1

Friday, October 14th 2011, 7:35am

PaloJLib - Slow getChildren call on IElement and potential stability issues

The getChildren call can be very slow on the IElement object.

I'm attaching an example where I compare the speed of the PaloJLib to the old JPalo library.

When adding 5000 items in 500 groups, JPalo takes around 5s, whereas PaloJLib takes around 40s.
When adding 10000 items in 1000 groups, JPalo takes around 10s, whereas PaloJLib mostly takes around 150s (but could also be as bad as 1314s). Note that PaloJLib's times do not scale linearly with the number of items, whereas JPalo's do.
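
A quick way to quantify "not linear" is to estimate the exponent k in time ≈ c·n^k from the two measurements quoted above. The sketch below is only illustrative: the class and method names are made up for this example, and two approximate data points give a rough estimate at best.

```java
// Estimates the exponent k in "time ~ n^k" from two (size, time) measurements.
final class ScalingEstimate {

    /** Returns k such that t2 / t1 = (n2 / n1)^k. */
    static double exponent(double n1, double t1, double n2, double t2) {
        return Math.log(t2 / t1) / Math.log(n2 / n1);
    }

    public static void main(String[] args) {
        // Approximate timings from the post, in seconds.
        System.out.println("JPalo    k = " + exponent(5000, 5, 10000, 10));
        System.out.println("PaloJLib k = " + exponent(5000, 40, 10000, 150));
    }
}
```

With the figures above, JPalo comes out at k = 1 (linear), while PaloJLib lands at roughly k = 1.9, which is close to quadratic growth.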

My example is loosely based on how the Kettle plugin works. The script is a mock-up.

There also seems to be a stability issue with PaloJLib. Run the example a couple of times and you will see that JPalo's results are pretty constant. When testing with 10 000 rows, PaloJLib took 150s, 1314s, 150s, 570s and finally 150s again. If I run JPalo and PaloJLib together, my PC starts freezing up and the script never returns on PaloJLib. I'm not sure where the issue is: at the Palo end, in JPalo or in PaloJLib. Either way, if you can reproduce it, you may find it useful.

To run the example you need the following libraries:
jpalo.jar
palojlib-1.0.35.jar

log4j-1.2.14.jar
pvanderm has attached the following file:
  • SpeedTest.zip (1.64 kB - 182 times downloaded - latest: Yesterday, 5:37pm)
Kind Regards,
Pieter van der Merwe
De Bortoli Wines

kais

Master

Posts: 64

Date of registration: Mar 4th 2009

2

Friday, October 14th 2011, 10:45am

The speed of getElements depends on the parameter withAttributes, so performance depends on how you use palojlib.

To compare palojlib with Jpalo, it is better to run the tests through the ETL server, because this library was designed for the ETL server in the first place. In 3.2 SR2, there is the possibility to use both libraries (take a look at the whats_new pdf given in that version).

Regarding parallel runs: as you will see in version 3.3, which is coming soon, Palojlib will be used to run parallel loads in the Palo server, so Palojlib itself has no problem with them. Running Jpalo and palojlib together is a different issue, and one that we will not address.

One more thing: the attached SpeedTest.zip is empty. Maybe you can reattach your examples, and then I can take a look.

tish1

Sage

Posts: 761

Date of registration: Jul 13th 2009

Location: Vienna / Austria

Occupation: Senior Consultant @ Vector SW DV GmbH

3

Friday, October 14th 2011, 12:44pm

Hi,

> in 3.2 SR2, there is the possibility to use both libraries (take a look at the whats_new pdf given in that version).

How can I configure the ETL Server to make use of jPalo and PalojLib?

Regards.

pvanderm

Intermediate

  • "pvanderm" is male
  • "pvanderm" started this thread

Posts: 13

Date of registration: Apr 20th 2010

4

Friday, October 14th 2011, 2:05pm

Attachment

Sorry, something must have gone wrong with the upload, please try the attached file.

I'm trying to convert from jpalo to palojlib in the Kettle project (http://kettle.pentaho.com/) and so far everything is going well, except for the speed on calling any function involving children like getChildren and getChildCount. I've moved away from getChildCount because it is so slow, but I can't get away from calling getChildren.

If I need to change my logic to avoid a bug or a performance hit, that can be done. Any suggestions on the logic will be appreciated. At the moment I'm just replacing jpalo function calls with their palojlib equivalents. The same logic that you will see in the example is much faster with jpalo. A change in logic may fix it, but I can't see why it should not work as it is.

When I get elements I don't retrieve attributes (i.e. the parameter is set to false), but I can check whether it makes a difference.

I'm not trying to run jpalo/palojlib together. As you will see in the code, I'm running the one, closing the connection and then running the other one.

(EDIT: My zip file keeps being invalidated. Maybe it is removing the .java file from the zip since it isn't allowed? I've uploaded the java file as a text file.)
pvanderm has attached the following file:
  • SpeedTest.txt (6 kB - 160 times downloaded - latest: May 17th 2013, 11:33pm)
Kind Regards,
Pieter van der Merwe
De Bortoli Wines

kais

Master

Posts: 64

Date of registration: Mar 4th 2009

5

Friday, October 14th 2011, 5:16pm

OK, I will take a look at it next week. Just for info: the problem in jpalo was that it did not notice changes made outside jpalo during its execution. So if you check the children of a certain element, then change something in the OLAP server, and then check the children again, jpalo will not notice the change (this is easy to test when debugging). In palojlib, we always try to validate what is really in the server at that moment and invalidate our cache if needed. This may be the problem. But as I said, I will take a look as soon as possible.

@vector you can either use palojlib or jpalo in 3.2 SR2 and not both at the same time.
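
The caching trade-off described above can be illustrated with a toy model. This is a minimal sketch, not the actual jpalo or palojlib code: FakeServer, TrustingClient and ValidatingClient are hypothetical names, and the version counter is an assumed stand-in for whatever mechanism the real library uses to detect server-side changes.

```java
import java.util.ArrayList;
import java.util.List;

final class CacheStalenessDemo {

    /** Hypothetical stand-in for the OLAP server: children plus a change counter. */
    static final class FakeServer {
        private final List<String> children = new ArrayList<>();
        private int version = 0;

        void addChild(String name) {
            children.add(name);
            version++; // every modification bumps the version
        }

        int version() { return version; }

        List<String> fetchChildren() { return new ArrayList<>(children); }
    }

    /** Loads the children once and never asks the server again. */
    static final class TrustingClient {
        private final FakeServer server;
        private List<String> cached;

        TrustingClient(FakeServer server) { this.server = server; }

        List<String> getChildren() {
            if (cached == null) {
                cached = server.fetchChildren();
            }
            return cached;
        }
    }

    /** Checks the server version on every call and reloads if it changed. */
    static final class ValidatingClient {
        private final FakeServer server;
        private List<String> cached;
        private int cachedVersion = -1;

        ValidatingClient(FakeServer server) { this.server = server; }

        List<String> getChildren() {
            if (cached == null || cachedVersion != server.version()) {
                cachedVersion = server.version();
                cached = server.fetchChildren();
            }
            return cached;
        }
    }

    public static void main(String[] args) {
        FakeServer server = new FakeServer();
        server.addChild("A");

        TrustingClient trusting = new TrustingClient(server);
        ValidatingClient validating = new ValidatingClient(server);
        trusting.getChildren();
        validating.getChildren();

        // Simulate another client (e.g. an Excel add-in) changing the server.
        server.addChild("B");

        System.out.println("trusting sees   " + trusting.getChildren().size() + " children");   // 1 (stale)
        System.out.println("validating sees " + validating.getChildren().size() + " children"); // 2
    }
}
```

In this model the trusting client is cheaper per call but returns stale data after an outside change, while the validating client stays correct at the cost of a check (and possibly a reload) on every call.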

kais

Master

Posts: 64

Date of registration: Mar 4th 2009

6

Monday, October 17th 2011, 3:36pm

I checked your code. As I expected, the problem is that palojlib tries to validate its cache each time the API is used. jpalo only updates its own cache, which is faster but has the disadvantage that it will not see changes made on the OLAP server by other clients (e.g. the Excel add-in) while jpalo is running.

If I delete this from the 2 functions:

    if (slowDown)
        groupElem.getChildren();

then I get an average time of about 50 seconds with jpalo and about 55 seconds with palojlib. (I did not notice any large variation in palojlib's execution time; the runs took 57.192, 55.526 and 55.220 seconds.)
To solve the problem, you have to avoid interleaving operations like this:
R (read), W (write), R, W, R, W, R, W, R, W, R, W, ...

and instead group them, either as

R, R, R, R, R, R, R, W, W, W, W, W, W, W, W, ... or
W, W, W, W, W, W, R, R, R, R, R, R, R, R, R, ...
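
Why the ordering matters can be sketched with a toy cost model. This is an assumption-laden illustration, not palojlib's real behaviour: FakeDimension is a hypothetical class in which every write invalidates the client cache and the next read reloads the whole dimension, so the cost of a reload grows with the dimension size.

```java
final class AccessPatternDemo {

    /** Hypothetical dimension whose client cache is invalidated by every write. */
    static final class FakeDimension {
        private int size = 0;           // elements currently on the "server"
        private int version = 0;        // bumped on every write
        private int cachedVersion = -1; // version the client cache was loaded at
        long elementsTransferred = 0;   // total cost of all cache reloads

        void write() {
            size++;
            version++;
        }

        void read() {
            if (cachedVersion != version) {
                // Cache is out of date: reload every element.
                elementsTransferred += size;
                cachedVersion = version;
            }
        }
    }

    /** R, W, R, W, ...: every read follows a write and forces a reload. */
    static long interleaved(int n) {
        FakeDimension d = new FakeDimension();
        for (int i = 0; i < n; i++) {
            d.read();
            d.write();
        }
        return d.elementsTransferred;
    }

    /** W, W, ..., R, R, ...: only the first read has to reload. */
    static long batched(int n) {
        FakeDimension d = new FakeDimension();
        for (int i = 0; i < n; i++) {
            d.write();
        }
        for (int i = 0; i < n; i++) {
            d.read();
        }
        return d.elementsTransferred;
    }

    public static void main(String[] args) {
        System.out.println("interleaved: " + interleaved(1000)); // 499500
        System.out.println("batched:     " + batched(1000));     // 1000
    }
}
```

In this model the interleaved pattern transfers n(n-1)/2 elements in total, while the batched pattern transfers only n, which is the kind of gap that grouping reads and writes is meant to close.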

kais

Master

Posts: 64

Date of registration: Mar 4th 2009

7

Monday, October 17th 2011, 3:38pm

Here is your example, rewritten to use bulk operations and to avoid the R, W, R, W, ... pattern.
kais has attached the following file:
  • function.TXT (1.82 kB - 98 times downloaded - latest: May 12th 2013, 5:20am)

tish1

Sage

Posts: 761

Date of registration: Jul 13th 2009

Location: Vienna / Austria

Occupation: Senior Consultant @ Vector SW DV GmbH

8

Wednesday, October 26th 2011, 12:24pm

> you can either use palojlib or jpalo in 3.2 SR2 and not both at the same time

Thanks. That's what I thought, but that one made me wonder:

> If I run JPalo and PaloJLib together

But obviously I misunderstood something. ;-)

Regards.

pvanderm

Intermediate

  • "pvanderm" is male
  • "pvanderm" started this thread

Posts: 13

Date of registration: Apr 20th 2010

9

Wednesday, October 26th 2011, 2:16pm

@vector. Yes, sorry, I should have said in the same program. I run the one, close the connection and then run the other one. I didn't run them "together" as in parallel, but I used them both in one program, sequentially.

@kais, I changed the logic in the Kettle Plugin to do RRRR, and then WWW and PaloJLib now performs as expected. Loading 14 000 items with 4 levels of consolidation (25 000 items if you take the consolidations into account) now takes around 30s to load.
Kind Regards,
Pieter van der Merwe
De Bortoli Wines

tish1

Sage

Posts: 761

Date of registration: Jul 13th 2009

Location: Vienna / Austria

Occupation: Senior Consultant @ Vector SW DV GmbH

10

Wednesday, October 26th 2011, 3:24pm

> Loading 14 000 items with 4 levels of consolidation (25 000 items if you take the consolidations into account) now takes around 30s to load.

Sounds great!
