• Multiwfn forum

    Multiwfn official website: http://www.shanxitv.org/multiwfn. Multiwfn forum in Chinese: http://bbs.keinsci.com/wfn. E-mail of admin: sobereva[at]sina.com

    #1 2022-12-15 13:31:45

    i.s.ger
    Member
    Registered: 2020-12-01
    Posts: 55

    Patch: omp collapse(2) in grid.f90

    Dear Tian,

    As I mentioned in the topic http://www.shanxitv.org/wfnbbs/viewtopic.php?id=732, I found a way to get a speed-up.

    The patch is attached. It matters mainly on machines with a large number of threads. A similar change can probably be applied throughout the code.

    Multiwfn_collapse.patch.txt
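
    For readers unfamiliar with the clause, here is a minimal sketch of what `collapse(2)` does, written in C rather than the Fortran of grid.f90; it is not the attached patch. The names `point_value` and `fill_grid`, the flattened grid layout, and the `schedule(dynamic)` clause are assumptions made for illustration. Without `collapse`, only the iterations of the outermost loop are distributed among threads; with `collapse(2)`, the two outer loops are merged into a single iteration space of nz*ny items, which gives many more work units to share out when the thread count is large.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for an expensive per-point function
   (for example an electron density evaluation). */
static double point_value(int i, int j, int k) {
    return (double)(i + 10 * j + 100 * k);
}

/* Fill a flattened nx*ny*nz grid, with i as the fastest index.
   collapse(2) asks OpenMP to treat the k and j loops as one loop of
   nz*ny iterations; the two loops must be perfectly nested for this. */
void fill_grid(double *grid, int nx, int ny, int nz) {
    assert(grid != NULL && nx > 0 && ny > 0 && nz > 0);
    #pragma omp parallel for collapse(2) schedule(dynamic)
    for (int k = 0; k < nz; k++) {
        for (int j = 0; j < ny; j++) {
            /* The innermost loop stays serial inside each (k, j) work item. */
            for (int i = 0; i < nx; i++) {
                grid[((size_t)k * ny + j) * nx + i] = point_value(i, j, k);
            }
        }
    }
}
```

    With GCC this would be compiled with `-fopenmp`; without an OpenMP flag the pragma is ignored and the loops run serially with the same result. In Fortran the same clause is attached to the `!$OMP PARALLEL DO` directive.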

    I tested the effect of the patch on 704atoms.wfn. The plot shows the speed-up for different numbers of cores; the black line is ideal scaling. With the patch, scaling stays ideal up to 26 cores, whereas before it held only up to about 19 (?).

    collapse.png

    To probe scalability further I would probably need a larger system (or a slower computer): near 32 cores the run time is already only about 5 seconds without collapse and about 3 seconds with `collapse(2)`.

    Best regards,
    Igor

    #2 2022-12-16 06:02:06

    sobereva
    Tian Lu (Multiwfn developer)
    From: Beijing
    Registered: 2017-09-11
    Posts: 1,468

    Re: Patch: omp collapse(2) in grid.f90

    Dear Igor,

    Thanks, I'll check and test shortly. I have just been infected with COVID-19 and my productivity has been greatly affected, so it may take me longer than usual to respond.

    Best regards,

    Tian

    #3 2022-12-18 09:38:33

    sobereva
    Tian Lu (Multiwfn developer)
    From: Beijing
    Registered: 2017-09-11
    Posts: 1,468

    Re: Patch: omp collapse(2) in grid.f90

    Dear Igor,

    collapse(2) is really fantastic! Your patch has been merged into the official source code.

    I tested 704atoms.wfn on my dual AMD EPYC 7R32 server (96 physical cores). With the new version, calculating high quality grid data of electron density and ELF takes 2 s and 6 s, respectively, versus 5 s and 20 s with the old version, i.e. roughly 2.5x and 3.3x faster. The speed-up from collapse(2) on a server with a large number of cores is surprisingly high!

    However, I removed "if(mod(ifinish,256)==0)", because otherwise I see the following after the calculation:

    Calculation of grid data took up wall clock time         2 s-]   99.89 %     /

    That is, the progress bar does not reach 100%. My brief test showed that removing "if(mod(ifinish,256)==0)" does not detectably hurt performance, at least on my 8-core notebook and 96-core server.
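
    One possible reading of the incomplete progress bar, sketched in C as an assumption rather than taken from grid.f90: if the display is refreshed only when the finished-iteration counter is a multiple of 256, the last refresh happens at the largest multiple of 256 that does not exceed the total, so the bar stops short whenever the total is not itself a multiple of 256. The function `last_reported_count` and its parameters are hypothetical names.

```c
#include <assert.h>

/* Simulate a progress counter that is displayed only every `stride`
   finished iterations. Returns the last count that was displayed.
   If `always_report_final` is nonzero, the final iteration is also
   displayed, which guarantees the bar ends at total/total. */
int last_reported_count(int total, int stride, int always_report_final) {
    assert(total >= 0 && stride > 0);
    int shown = 0;
    for (int ifinish = 1; ifinish <= total; ifinish++) {
        if (ifinish % stride == 0 ||
            (always_report_final && ifinish == total)) {
            shown = ifinish;  /* this is where the bar would be redrawn */
        }
    }
    return shown;
}
```

    For example, with 1000 iterations and a stride of 256 the last refresh happens at 768 (76.8%), while also refreshing on the final iteration reports 1000 (100%). Keeping the throttle and adding such a final refresh would be an alternative to removing the condition altogether.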

    Best regards,

    Tian
