# ANALYZER

**Intel(R) VTune(TM) Fabric Profiler  
VERSION 2.0.0  
ONLY FOR EVALUATION USE**

0) Introduction
1) Analyzer Requirements (MATLAB Runtime Install)
2) Installation
3) Obtaining Fabric Profiler trace files
4) Trace File Contents
5) Analyzers Available
6) Command Line Options (args)
7) Analyzer On Startup (GUI)
8) Analyzers (GUI)
9) Analysis Report (r)
10) Important Analyzer Tips

**************************************************************

0) __Introduction__

    Five analyzers read the trace files that the Fabric Profiler data collector
    produces while an instrumented application runs. All five are packaged in a single
    Linux binary, "fpro_analyzer", located in the release package under bin/analyzer.

1) __Analyzer Requirements__

    * [MATLAB Runtime R2021b (9.11)](https://www.mathworks.com/products/compiler/matlab-runtime.html) for Linux.

    * Run your SHMEM application with Fabric Profiler's Collector to generate required trace files.

    * Four Trace Files:

          {trace-file-prefix}.uc1.put
          {trace-file-prefix}.uc1.profile
          {trace-file-prefix}.uc1.hfi
          {trace-file-prefix}.uc1.func

      Traces are found under the $ESP_TRACE_PATH directory.

      When opening a trace in the analyzer(s) the directory must contain ALL FOUR traces.
      Feel free to move this directory to a more accessible location.
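
    Before opening a trace directory, it can help to confirm that all four files are
    present. The sketch below is a hypothetical helper (the name check_traces is not
    part of Fabric Profiler); it relies only on the file naming shown above.

    ```shell
    # Hypothetical helper, not shipped with Fabric Profiler.
    # Usage: check_traces <trace-dir> <trace-file-prefix>
    # Returns 0 if all four required traces exist, 1 otherwise.
    check_traces() {
        local dir="$1" prefix="$2" ext missing=0
        for ext in put profile hfi func; do
            if [ ! -f "$dir/$prefix.uc1.$ext" ]; then
                echo "missing: $dir/$prefix.uc1.$ext" >&2
                missing=1
            fi
        done
        return "$missing"
    }

    # Example (not executed here):
    #   check_traces "$ESP_TRACE_PATH" sanity && fpro_analyzer fbla "$ESP_TRACE_PATH"
    ```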

2) __Installation__

    Add the location of the "fpro_analyzer" binary to your PATH variable:

        export PATH=$PATH:$ESP_ROOT/bin/analyzer
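
    If the export line is placed in a shell startup file, the directory can end up on
    PATH more than once. A minimal sketch that appends it only once and then checks that
    the binary resolves; it assumes ESP_ROOT is already set to the release package root,
    and the helper name add_to_path is made up for illustration.

    ```shell
    # Hypothetical helper: append a directory to PATH only if it is not already there.
    add_to_path() {
        case ":$PATH:" in
            *":$1:"*) ;;                 # already present, nothing to do
            *) PATH="$PATH:$1" ;;
        esac
        export PATH
    }

    # Assumes ESP_ROOT points at the Fabric Profiler release package.
    if [ -n "${ESP_ROOT:-}" ]; then
        add_to_path "$ESP_ROOT/bin/analyzer"
        command -v fpro_analyzer >/dev/null 2>&1 || echo "fpro_analyzer not found on PATH" >&2
    fi
    ```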

3) __Obtaining Fabric Profiler trace files__

    Trace files are generated when you run an application with Fabric Profiler instrumentation. You can also find sets of trace files in the release package under examples/sample_traces.

    Run Fabric Profiler Examples:

        There are example SHMEM applications in the release package.
        Each has a Makefile to build the application.

        Run the SHMEM application with Fabric Profiler's Collector by using the "fpro.sh" script.
        > See doc/README.OVERVIEW for "fpro.sh" usage

        Output contains trace files that the analyzer will use (.func, .hfi, .put, .profile).
        Traces are located under $ESP_TRACE_PATH/

4) __Trace File Contents__

    While the Fabric Profiler-instrumented application is running, the data collector
    library linked with the application gathers data about application and network
    behavior. When the application calls **ishmem_finalize** or **shmem_finalize**
    (depending on the SHMEM version used), the data collector writes trace files
    containing this data. Four trace files are binary files, and one is a text file.

      Trace Files:

      a. {trace-file-prefix}.uc1.func
        
        The function trace file contains information about every profiled SHMEM and
        MPI function call. Each process writes out a separate function trace file.
        These separate traces will be auto-merged by the fpro.sh script.

      b. {trace-file-prefix}.uc1.hfi
      
        We monitor send and receive counters on the host fabric interface card while
        the application is running. The HFI file contains those time-stamped counter
        values.

      c. {trace-file-prefix}.uc1.profile
        
        We monitor system performance counters and gather other system information
        while the application is running. That information is written to the profile
        file. Each process writes out a separate profile trace file. These separate traces
        will be auto-merged by the fpro.sh script.

      d. {trace-file-prefix}.uc1.put
      
        We monitor the amount of data being injected into the network with each
        shmem_put call and the destination node for each put. The put file contains
        these values. (Note that at the present time the data collector does not monitor
        data movement of shmem_get calls, atomics, reduction operations, or collectives.)
        Each process writes out a separate put trace file. These separate traces will be
        auto-merged by the fpro.sh script.

      e. {trace-file-prefix}.uc1.ev.txt

        The only trace file which is a text file rather than a binary file, the environment
        file is a list of all environment variables defined at application run-time.

    > NOTE: The collector outputs unmerged func, profile, and put trace files.
      However, the "fpro.sh" script will *auto-merge* them into a single trace of
      each type (.func, .profile, .put). These merged trace files are required by the analyzers.
    
    > **If** the *auto-merge* fails, use the **mergeFuncFile**, **mergeProfileFile**, and **mergePutFile**
      executables located under $ESP_ROOT/bin/collector. Follow their usage instructions to create
      the merged trace files.

        ------- Specifically, Cray-Aries Fabric -------
        We have begun porting trace files from our home-grown format to OTF2. You may
        see a directory with OTF2 files as well. Eventually, all of the trace data will
        be contained in OTF2 files.

        When the instrumented application is run on a Cray Aries network, there are
        two additional trace files.  Their use is experimental at this point and the
        currently released analyzers will not use them.

          f. {trace-file-prefix}.uc1.topo contains the network topology of the
          parallel application.

          g. {trace-file-prefix}.uc1.hfs contains time-stamped congestion data read
          from the Cray Aries network switches directly connected to the blades on
          which the application resided.

        When the instrumented application is built with an OpenSHMEM implementation that
        supports the SHMEMX Pcntr interface (currently only SOS 1.4), another trace file
        will be written.  Again, use of the pcntr data is experimental and the currently
        released analyzers will not use this file.

          h. {trace-file-prefix}.uc1.pcntr contains internal performance counters maintained
          by the OpenSHMEM implementation.
        ------- Specifically, Cray-Aries Fabric -------

    The "trace-file-prefix" is set to your application name.
   
    Example:

       User App: sanity.c
       Traces:
           sanity.uc1.func
           sanity.uc1.profile
           sanity.uc1.put
           sanity.uc1.hfi
           sanity.uc1.ev.txt
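
    The naming rule can be expressed as a small shell function. This is an illustrative
    sketch (the name trace_names is hypothetical); it assumes the prefix is the
    application's file name with any directory and extension removed, as in the
    sanity.c example.

    ```shell
    # Hypothetical helper: print the trace files expected for an application.
    # Assumes the prefix is the application name without directory or extension.
    trace_names() {
        local app prefix ext
        app=$(basename "$1")
        prefix="${app%.*}"
        for ext in func profile put hfi ev.txt; do
            echo "$prefix.uc1.$ext"
        done
    }

    trace_names sanity.c
    ```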

5) __Analyzers Available__

   Fabric Backlog Analyzer (fbla) is the default analyzer view.

        > Fabric Backlog Analyzer (fbla)
        > Barrier Analyzer (ba)
        > Latency Analyzer (la)
        > Message Straggler Analyzer (msa)
        > Analysis Report (r) [New Feature - No GUI Associated]

6) __Command Line Options (args)__

        Intel(®) VTune(™) Fabric Profiler, v2.0.0
        usage:

        fpro_analyzer --help | -help - display usage
        fpro_analyzer --version | -v | -V - display version and build information
        fpro_analyzer - start fabric profiler backlog analyzer
        fpro_analyzer {ba|fbla|la|msa|r} - start with ba, fbla, la, msa, or r
        fpro_analyzer <trace file> - open fbla with <trace file>
        fpro_analyzer {ba|fbla|la|msa|r} <trace file> - open selected analyzer with <trace file>
        fpro_analyzer {ba|fbla|la|msa|r} <fabric select> <trace file> - open selected analyzer, select fabric, with <trace file>

        Additional Notes:
        <trace file> Provide full path to trace OR simply the directory path containing the traces
          Ex: fpro_analyzer fbla /path/to/traces/
          Ex: fpro_analyzer fbla /path/to/traces/my_trace.uc1.put
        <fabric select> Select the fabric used {Cray-Slingshot11, Cray-Aries} OR {1, 2} respectively
          Ex: fpro_analyzer fbla Cray-Slingshot11 /path/to/traces/my_trace.uc1.put
          Ex: fpro_analyzer fbla 2 /path/to/traces/my_trace.uc1.put
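
    For scripted use, the argument forms above can be checked before the analyzer is
    launched. The following is a sketch only: fpro_cmd is a hypothetical wrapper, not
    part of the release. It prints the command line it would run, using the analyzer
    names and fabric values listed in the usage text.

    ```shell
    # Hypothetical wrapper: validate arguments and print the fpro_analyzer command line.
    # Usage: fpro_cmd <analyzer> [<fabric select>] <trace file or directory>
    fpro_cmd() {
        local analyzer="${1:-}" fabric="" trace=""
        case "$analyzer" in
            ba|fbla|la|msa|r) ;;
            *) echo "unknown analyzer: $analyzer" >&2; return 1 ;;
        esac
        if [ "$#" -eq 3 ]; then
            fabric="$2"; trace="$3"
            case "$fabric" in
                Cray-Slingshot11|Cray-Aries|1|2) ;;
                *) echo "unknown fabric: $fabric" >&2; return 1 ;;
            esac
            echo "fpro_analyzer $analyzer $fabric $trace"
        elif [ "$#" -eq 2 ]; then
            trace="$2"
            echo "fpro_analyzer $analyzer $trace"
        else
            echo "usage: fpro_cmd <analyzer> [<fabric select>] <trace>" >&2
            return 1
        fi
    }

    fpro_cmd la 1 /path/to/traces/
    ```

    Printing instead of executing keeps the sketch safe to try on a machine without
    the analyzer installed.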

7) __Analyzer On Startup (GUI)__

    When started, the analyzer displays a splash screen for a few moments and a fabric-select popup menu.

        Fabric(s) supported by Analyzer:
          > Cray-Slingshot11
          > Cray-Aries
    
    Once the fabric is selected, a window appears with "File", "View", and "Help" menus in the upper left corner.

    Use the "View" menu to choose one of four analyzers.  Use the "File" menu to choose
    the trace file to be visualized.  Then wait for the analyzer to process and
    display results.

        GUI Outline:
          > File:   # Select Trace File
          > View:   # Change Analyzer
          > Help:   # Display Splash Screen

8) __Analyzers (GUI)__

    There are four Analyzers that contain GUI controls.

    * fbla (Default View) - The Fabric Backlog Analyzer reads the put trace file and
      correlates that with the hfi trace file to visualize fabric backlog at any
      point in time. Key features available:

          i. If the application defined Fabric Profiler code regions,
                select "Show Region Bounds" and choose regions of interest.
                The temporal regions will be highlighted on the graph of network
                backlog as a function of time.
          
          ii.  Select an individual node to display backlog for that node only.
          
          iii. View injection and/or ejection backlog (requested less actual):
                o injection requested, data sent off-node by this node in the application
                o injection actual, data sent into network by the HFI
                o ejection requested, data sent by other nodes in application to this node
                o ejection actual, data received from network according to HFI
          
          iv. Experiment with zooming and panning to bring into focus areas of interest.
          
          v.  Try offset adjustment modes.
          
          vi. Toggle between volume and rate displays.

          vii. Try the data cursors. The data-cursor widget is the 4th in the row below
                the "File" and "Help" menus; it looks like a plus sign and a text window
                on a curve. Click on the widget and then anywhere on the plot to get data
                values for that point.

    * ba - The Barrier Trace Analyzer reads the function trace file and
        displays barrier wait times for each barrier call in the source code for
        each PE. Key features available:

          i. Five different measures:
                > PE wait time
                > PE arrival volume
                > node wait density
                > PE percent late
                > PE outlier late

          ii.  Vary the thresholding control.

          iii. Narrow results to a specific lexical occurrence
                (a particular source code line containing a barrier).

    *  la - The Function (latency) Trace Analyzer reads the function trace
        file and displays function latency for all instrumented SHMEM and MPI calls. Be
        prepared to wait for the results to appear in the case of trace files
        containing hundreds of thousands of function calls. The default display
        shows composite PE wait time for all calls at each point in time.
        Key features available:

            i. Select individual function calls to display latency hot spots for that call.

            ii. Click on "View Regions" if the application defined Fabric Profiler regions.
                Choose regions to highlight temporal spans on the graph which represent
                those regions of code.

            iii. Switch to the communications matrix, which visualizes the volume of
                data sent from each PE to every other PE.

            iv. Try the zoom, pan and data cursor widgets under the "File" and "Help"
                menus to drill down into the data represented by the display.

            v. Experiment with the thresholding controls for frequency, high value
              and low value.

    *  msa - The Message Straggler Analyzer reads the function trace file and
        correlates the activity there with the network activity in the HFI trace
        file.

            i. Do not put 'barrier' calls inside of a loop. This will slow down analysis.


9) __Analysis Report (r)__

    This is a NEW analyzer type that generates an HTML report.
    This .html report file contains all of the same graphs found in the analyzers.

      1) Run:

              ./fpro_analyzer r

      2) Select Fabric

      3) Select Trace - File Explorer will open
      
      4) Waitbar appears

              > Note the waitbar appears to hang on "Writing HTML Report" (this is expected).
              > Please allow the Analysis Report Analyzer to run for several minutes.
              > You may notice the graphs being generated.
      
      5) Once complete, the Analyzer prints the directory of the written HTML file.
      
      6) Navigate to that file and open it in a browser.

    Other Run Options:

        ./fpro_analyzer r <trace file>
        ./fpro_analyzer r <fabric select> <trace file>

    Report generation takes some time because it renders the complex graphs found in each analyzer. Do not close the application if it appears to hang for a few minutes.

    If you want to stop report generation, you must kill the process. Closing the waitbar does not stop the program.
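
    One way to keep control of a long report run is to start it in the background and
    remember its process ID. The sketch below is an assumption about workflow, not a
    documented feature: start_job and stop_job are hypothetical names, the command to
    run is passed in as arguments, and this document does not confirm that a plain
    "kill" (SIGTERM) is sufficient for the analyzer process.

    ```shell
    # Hypothetical helpers for running a long job in the background and stopping it.
    JOB_PID=""

    start_job() {
        "$@" &                 # launch the given command in the background
        JOB_PID=$!
        echo "started PID $JOB_PID"
    }

    stop_job() {
        # Closing the waitbar is not enough, so signal the process itself.
        if [ -n "$JOB_PID" ] && kill -0 "$JOB_PID" 2>/dev/null; then
            kill "$JOB_PID"
            wait "$JOB_PID" 2>/dev/null || true
        fi
        JOB_PID=""
        return 0
    }

    # Example (not executed here):
    #   start_job fpro_analyzer r /path/to/traces/
    #   stop_job
    ```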

10) __Important Analyzer Tips__

    Here are a few things to keep in mind:
    
    * If you run "fpro_analyzer" and select the wrong fabric, close and reopen the software.
    * Some of the Analyzers may have issues with very small workloads.
    * At least 2 nodes must be used (i.e. there MUST be HFI traffic).
    * Every **Node** must have at least **one PE** that issues a **put** operation (i.e. there MUST be communication between nodes).
    * HFI counters have a 1 second resolution (i.e. the counters are updated every 1 second). This is not controlled by the Profiler. See README.OVERVIEW.md.
    * Region markers must have the same entry & exit names.
    * Do NOT call a shmem_barrier_all function inside of a loop.
    * The Message Straggler Analyzer assumes that at least 2 instances of shmem_barrier_all are used.
    * Every PE must call the same shmem_barrier_all instance to synchronize.  
      Do NOT conditionally call shmem_barrier_all:
      
          /* WRONG: only PE 0 reaches the barrier; the other PEs never call it. */
          if (pe == 0)
              shmem_barrier_all();
