(Day 1) Setup and CrayPat

This is John Levesque - is this being posted?

Just wasn't sure I was adding to the hackpad; guess I am.

By "posted," what do you mean? I was thinking we could clean this up and make a PDF of the crowdsourced notes.

Here is the link to the tutorial files: 

https://github.com/olcf/OpenACC_workshop_072013

We can use this as a sort of notes/chat room so feel free to type!

Google Hangout is a little nicer... :)  This is simple enough.

Haters! :)  I’ll set up a Google Hangout then? Well, this can be notes too...

http://goo.gl/UZ1Fw  I didn’t know about Knights Landing. Interesting.

I just joined. What did I miss in the first meeting? I had big trouble setting up WebEx on Fedora 18 Linux. Is there a schedule?

Hi, here’s the info: https://www.olcf.ornl.gov/training-event/openacc-tutorials/

Is this more of an "informal" tutorial/hands-on session with no preset timetable? OK. I will check out the files.

Possible questions for the OpenACC instructors:

Can we see the terminal screen in the webinar view?

I second this...

Let me get on that... I’ll post some things here in the meantime.

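Script for running on Chester:

#!/bin/bash
#PBS -N himeno
#PBS -l nodes=16
#PBS -l walltime=1:00:00
#PBS -j oe
#PBS -A TRN001
# Run from the directory the job was submitted from
cd "$PBS_O_WORKDIR"
date
# 64 MPI ranks in total, 4 per node on the 16 nodes requested above
aprun -n 64 -N 4 ./himeno_orig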

Commands for running CrayPat

module swap PrgEnv-pgi PrgEnv-cray   (this was successful)

cd into /lustre/scratch (this was successful)

mkdir username (not needed: permission denied because the username directory already exists)

cd username (successful)

copy himeno_workshop.tgz into this directory (cp: successful)

untar with tar -xzf himeno_workshop.tgz (successful)

cd himeno_step1 (successful)

module load perftools   (successful)

ftn -rm -eF himeno_BMTxpr.f -o himeno_orig (successful, but received warning)

(WARNING: CrayPat is saving object files from a temporary directory into directory ’/ccs/home/username/.craypat/himeno_orig/1118’)

pat_build -u -g mpi <full path>/himeno_orig (successful; writes the instrumented executable himeno_orig+pat)
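If you have to repeat the build steps above (new login, new code version), they can be wrapped in a small script. This is a hypothetical helper, not something from the workshop materials: the names run and build_instrumented and the DRY_RUN switch are made up here, and it assumes a Cray login node with the module command and the PrgEnv-cray and perftools modules, run from the himeno_step1 directory.

```shell
# Hypothetical helper (not from the workshop materials): replays the
# Day 1 build + CrayPat instrumentation steps. Assumes a Cray login node
# with the module command, PrgEnv-pgi/PrgEnv-cray and perftools modules,
# and himeno_BMTxpr.f in the current directory.
# Set DRY_RUN=1 to print the commands instead of running them.

run() {
  if [ -n "${DRY_RUN:-}" ]; then
    echo "$*"          # dry run: only show the command
  else
    "$@"               # really run it
  fi
}

build_instrumented() {
  local src=${1:-himeno_BMTxpr.f}
  local exe=${2:-himeno_orig}
  run module swap PrgEnv-pgi PrgEnv-cray || return 1
  run module load perftools || return 1
  run ftn -rm -eF "$src" -o "$exe" || return 1
  # As in the notes, give pat_build the full path; by default it
  # writes the instrumented binary as ${exe}+pat
  run pat_build -u -g mpi "$PWD/$exe"
}
```

Try DRY_RUN=1 build_instrumented first to see the exact commands, then run build_instrumented for real.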

Debugging CrayPat

Helpful CrayPat commands

username@chester-login1:/lustre/scratch/username/himeno_step1> pat_build -u -g mpi /lustre/scratch/csep27/himeno_step1/himeno_orig  (successful with warnings)

WARNING: Tracing small, frequently called functions can add excessive overhead.

WARNING: To set a minimum size, say 800 bytes, for traced functions, use:

        -D trace-text-size=800.

WARNING: A total of 9 user-defined functions were traced.

vi runit (successful): change the account VEN008 to TRN001, change mppwidth=64 to nodes=16, and remove the mppnppn and mppdepth lines:

#!/bin/bash
#PBS -N himeno
#PBS -l mppwidth=64   (change to -> -l nodes=16)
#PBS -l mppnppn=      (remove line)
#PBS -l mppdepth=4    (remove line)
#PBS -l walltime=1:00:00
#PBS -j oe
#PBS -A VEN008        (change to TRN001)
cd $PBS_O_WORKDIR
date
aprun -n 64 -N 4 ./himeno_orig

qsub runit (successful; job ID 159553)

Add -hprofile_generate to the compile line. Reports from the CrayPat data file (.xf):

        - pat_report -T ........xf > profile

        - pat_report -Oca ......xf > profile_ca

        - pat_report -Oct ......xf > profile_ft
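Since the .xf file names are long, here is a hypothetical helper (the function names newest_xf and make_reports are made up) that picks the newest .xf file in the current directory and runs the three pat_report commands above with the same options and output file names. It assumes perftools is loaded so pat_report is in PATH, and that the data landed in a single .xf file in the current directory.

```shell
# Hypothetical helper (not from the workshop materials) wrapping the
# three pat_report runs above. Assumes pat_report is in PATH
# (module load perftools) and the .xf data file is in the current dir.

newest_xf() {
  # ls -t lists newest first; keep only the first .xf file
  ls -t -- *.xf 2>/dev/null | head -n 1
}

make_reports() {
  local xf=${1:-$(newest_xf)}
  if [ -z "$xf" ] || [ ! -f "$xf" ]; then
    echo "make_reports: no .xf data file found" >&2
    return 1
  fi
  # Same options and output names as in the notes above
  pat_report -T "$xf" > profile || return 1
  pat_report -Oca "$xf" > profile_ca || return 1
  pat_report -Oct "$xf" > profile_ft
}
```

Call it as make_reports to use the newest .xf file, or make_reports <file>.xf to pick one explicitly.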

        

Tue Jul 16 12:56:30 EDT 2013

 Sequential version array size

  mimax= 1025  mjmax= 1025  mkmax= 1025

 Parallel version  array size

  mimax= 259  mjmax= 259  mkmax= 259

  imax= 257  jmax= 257  kmax= 257

  I-decomp=  4  J-decomp=  4  K-decomp=  4

 

  Start rehearsal measurement process.

  Measure the performance in 3 times.

   MFLOPS: 78872.31760109021   time(s): 1.3804740905761719,  1.147200164E-4

 Now, start the actual measurement process.

 The loop will be excuted in 130  times.

 This will take about one minute.

 Wait for a while.

  Loop executed for  130  times

  Gosa : 1.128552685E-4

  MFLOPS: 78939.087665233732   time(s): 59.76994514465332

  Score based on Pentium III 600MHz : 952.910339

 STOP  


 STOP  

Application 499391 resources: utime ~3982s, stime ~80s, Rss ~959816, inblocks ~158075, outblocks ~397838        

        

(Day 2) Reveal notes - questions

Slides will be posted online after the workshop. Some slides (John’s and Jeff’s) are already up in the repo.

https://www.olcf.ornl.gov/wp-content/uploads/2013/02/Cray_Reveal-HP1.pdf

(older version, but same info)

I found that Reveal is not on Titan itself. Can it be made available so those of us who are off-site can try it out too?

ANSWER: you have to load the perftools module (module add perftools). Reveal is a binary inside that module.


Reveal is on Titan; you have to load perftools to get it.

You cannot use Reveal on non-Cray platforms; however, the code you generate with it can be moved to other platforms.

OpenACC does have reductions as well; what it does not have is an equivalent to !$omp critical / !$omp end critical, or to !$omp threadprivate.

I will talk about these issues tomorrow morning.

(QUESTION) Not related to Reveal, but I was wondering: does the OpenACC loop directive support arrays as private, or reduction operations on arrays?

(ANSWER)

Arrays can be private; declare them as such with the private clause. (As an aside, scalar variables are automatically private in OpenACC, so you don’t have to declare them unless you wish to. This goes further than OpenMP, where only loop variables are automatically private.)

Reduction operations are not supported on arrays (or array elements) in OpenACC v1. I’ll need to check if this changes in v2. So you may need to either refactor your code (perhaps using a loop-private temporary scalar) or arrange the loop schedule so that the reduction operation is over non-partitioned loops.

Perhaps saving intermediate results and then using the sum(:) intrinsic may help; just a suggestion.

Running Reveal

username@chester-login1:/lustre/scratch/username/himeno_step1> reveal &

[1] 30073

username@chester-login1:/lustre/scratch/username/himeno_step1> 

(reveal:30073): Gtk-WARNING **: cannot open display: localhost:0.0

To fix this Reveal display error, make sure you connect with X11 forwarding by using the "-X" switch:

ssh -X username@hostname

On a Mac, install XQuartz: http://xquartz.macosforge.org/landing/

For Windows users, Xming works for X11 forwarding as well: http://sourceforge.net/projects/xming/
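A quick way to catch the display problem before launching Reveal is to check DISPLAY first. This is a hypothetical sketch (have_display and start_reveal are made-up names): it requires DISPLAY to be set and, only if xdpyinfo happens to be installed, also tests that the X server can actually be reached. With ssh -X, sshd normally sets DISPLAY itself (usually localhost:10.0 or higher), so seeing localhost:0.0 as in the error above may mean something in your shell startup files is overriding it.

```shell
# Hypothetical pre-flight check before starting Reveal in the background.
# Assumes reveal is in PATH (module load perftools).

have_display() {
  [ -n "${DISPLAY:-}" ] || return 1
  # If xdpyinfo is installed, also confirm the display is reachable;
  # if it is not installed, a non-empty DISPLAY is all we can check.
  if command -v xdpyinfo >/dev/null 2>&1; then
    xdpyinfo >/dev/null 2>&1
  fi
}

start_reveal() {
  if ! have_display; then
    echo "No usable X display; reconnect with: ssh -X username@hostname" >&2
    return 1
  fi
  reveal "$@" &
}
```

Use start_reveal in place of reveal &.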

(Day 3) OpenACC Examples

http://openacc.org/

http://www.openacc.org/sites/default/files/OpenACC_API_QuickRefGuide.pdf

http://openmp.org/wp/

https://computing.llnl.gov/tutorials/openMP/#ParallelRegion

OpenACC 2.0 Specification

http://www.openacc-standard.org/node/297

Set the environment variable CRAY_ACC_DEBUG to 1, 2 or 3 to get more verbose output from the Cray OpenACC runtime when the program runs.
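Because the variable has to be in the environment of the running program, it goes in the job script rather than on the compile line. A minimal sketch, assuming only that levels 1, 2 and 3 are the valid values as noted above; run_acc_debug is a hypothetical name.

```shell
# Hypothetical wrapper: run a command with CRAY_ACC_DEBUG set to 1, 2 or 3
# for that command only (the calling shell's environment is unchanged).

run_acc_debug() {
  local level=${1:-}
  case $level in
    1|2|3) shift ;;
    *) echo "run_acc_debug: first argument must be 1, 2 or 3" >&2; return 2 ;;
  esac
  CRAY_ACC_DEBUG=$level "$@"
}
```

In a job script this would look like run_acc_debug 2 aprun -n 1 ./a.out; putting export CRAY_ACC_DEBUG=2 before the aprun line does the same thing for the whole script.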

Here are the examples I’ll be using in my interoperability talk: https://github.com/jefflarkin/openacc-interoperability

NVIDIA Visual Profiler

Jeff Larkin’s trick to get one consolidated file:

See nvprof.sh, profile.sh, or profile_csv.sh in the GitHub repository, depending on your needs. Simply put the appropriate script in front of your executable on the aprun line:

aprun ./nvprof.sh ./a.out

If you don’t want a profile from every single process, you can do the following:

aprun -n 1023 -N 1 ./a.out : -n 1 -N 1 ./nvprof.sh ./a.out
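For anyone wondering how such a wrapper can work, here is a hypothetical minimal version. It is NOT the nvprof.sh from the repository, and unlike Jeff's scripts it writes one profile file per rank instead of one consolidated file. It assumes nvprof is in PATH on the compute nodes and that ALPS sets ALPS_APP_PE to each process's rank; the file name nvprof_rank.sh and the function profile_file are made up.

```shell
#!/bin/bash
# nvprof_rank.sh: hypothetical per-rank nvprof wrapper, a sketch only
# (not the repository's nvprof.sh). Assumes nvprof is in PATH and that
# ALPS sets ALPS_APP_PE to the rank of each process.
# Usage: aprun -n 4 ./nvprof_rank.sh ./a.out

profile_file() {
  # One output file per rank, e.g. a.out.3.nvprof for rank 3
  echo "$(basename "$1").${ALPS_APP_PE:-0}.nvprof"
}

# Only act when executed as a script with a program to run;
# sourcing this file just defines profile_file.
if [ "${BASH_SOURCE[0]}" = "$0" ] && [ $# -gt 0 ]; then
  exec nvprof -o "$(profile_file "$1")" "$@"
fi
```

With the MPMD line above, only the last rank would go through the wrapper, so you would get a single file named after that rank.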

Vampir Performance Optimization 

http://www.vampir.eu/