Linux memory bandwidth read/write performance benchmark: STREAM 5.10
Resource description
-------------------------------------------------------------------------
Revisions as of Thu, Jan 17, 2013 3:50:01 PM

Version 5.10 of stream.c has been released.
This version includes improved validation code and will automatically
use 64-bit array indices on 64-bit systems to allow very large arrays.

-------------------------------------------------------------------------
Revisions as of Thu Feb 19 08:16:57 CST 2009

Note that the codes in the "Versions" subdirectory should be
considered obsolete -- the versions of stream.c and stream.f
in this main directory include the OpenMP directives and structure
for creating "TUNED" versions.

Only the MPI version in the "Versions" subdirectory should be
of any interest, and I have not recently checked that version for
errors or compliance with the current versions of stream.c and
stream.f.

I added a simple Makefile to this directory. It works under Cygwin
on my Windows XP box (using gcc and g77).

A user suggested a sneaky trick for "mysecond.c" -- instead of using
the #ifdef UNDERSCORE to generate the function name that the Fortran
compiler expects, the new version simply defines both "mysecond()"
and "mysecond_()", so it should automagically link with most Fortran
compilers.

-------------------------------------------------------------------------
Revisions as of Wed Nov 17 09:15:37 CST 2004

The most recent "official" versions have been renamed "stream.f" and
"stream.c" -- all other versions have been moved to the "Versions"
subdirectory.

The "official" timer (was "second_wall.c") has been renamed "mysecond.c".
This is embedded in the C version ("stream.c"), but still needs to be
externally linked to the FORTRAN version ("stream.f").

-------------------------------------------------------------------------
Revisions as of Tue May 27 11:51:23 CDT 2003

Copyright and License info added to stream_d.f, stream_mpi.f, and
stream_tuned.f

-------------------------------------------------------------------------
Revisions as of Tue Apr 8 10:26:48 CDT 2003

I changed the name of the timer interface from "second" to "mysecond"
and removed the dummy argument in all versions of the source code (but
not the "Contrib" versions).

-------------------------------------------------------------------------
Revisions as of Mon Feb 25 06:48:14 CST 2002

Added an OpenMP version of stream_d.c, called stream_d_omp.c. This is
still not up to date with the Fortran version, which includes error
checking and advanced data flow to prevent overoptimization, but it is
a good start.

...

-------------------------------------------------------------------------
Revisions as of Tue Jun 4 16:31:31 EDT 1996

I have fixed an "off-by-one" error in the RMS time calculation in
stream_d.f. This was already corrected in stream_d.c. No results are
invalidated, since I use minimum time instead of RMS time anyway.

...

-------------------------------------------------------------------------
Revisions as of Fri Dec 8 14:49:56 EST 1995

I have renamed the timer routines to:
    second_cpu.c
    second_wall.c
    second_cpu.f

All have a function interface named 'second' which returns a double
precision floating point number. It should be possible to link
second_wall.c with stream_d.f without too much trouble, though the
details will depend on your environment.

If anyone builds versions of these timers for machines running the
Macintosh O/S or DOS/Windows, I would appreciate getting a copy.

To clarify:
    * For single-user machines, the wallclock timer is preferred.
    * For parallel machines, the wallclock timer is required.
    * For time-shared systems, the cpu timer is more reliable,
      though less accurate.

-------------------------------------------------------------------------
Revisions as of Wed Oct 25 09:40:32 EDT 1995

(1) NOTICE to C users: stream_d.c has been updated to version 4.0 (beta),
    and should be functionally identical to stream_d.f

    Two timers are provided --- second_cpu.c and second_wall.c
    second_cpu.c measures cpu time, while second_wall.c measures
    elapsed (real) time.

    For single-user machines, the wallclock timer is preferred.
    For parallel machines, the wallclock timer is required.
    For time-shared systems, the cpu timer is more reliable,
    though less accurate.

(2) cstream.c has been removed -- use stream_d.c

(3) stream_wall.f has been removed --- to do parallel aggregate
    bandwidth runs, comment out the definition of FUNCTION SECOND
    in stream_d.f and compile/link with second_wall.c

(4) stream_offset has been deprecated. It is still here
    and usable, but stream_d.f is the "standard" version.
    There are easy hooks in stream_d.f to change the
    array offsets if you want to.

(5) The rules of the game are clarified as follows:

    The reference case uses array sizes of 2,000,000 elements
    and no additional offsets. I would like to see results
    for this case.

    But, you are free to use any array size and any offset
    you want, provided that the arrays are each bigger than
    the last-level of cache. The output will show me what
    parameters you chose.

    I expect that I will report just the best number, but
    if there is a serious discrepancy between the reference
    case and the "best" case, I reserve the right to report
    both.

    Of course, I also reserve the right to reject any results
    that I do not trust.

...

--
John D. McCalpin, Ph.D.
john@mccalpin.com
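
The 2009 entry describes defining both "mysecond()" and "mysecond_()" so that the same object file links from C and from Fortran compilers that append an underscore to external names. The following is a minimal sketch of that idea, not the shipped mysecond.c. It assumes a POSIX system with gettimeofday() and assumes the Fortran compiler uses either the plain or the single-trailing-underscore naming convention.

```c
#include <stddef.h>
#include <sys/time.h>

/* Wall-clock time in seconds as a double.
 * Assumption: POSIX gettimeofday() is available. */
double mysecond(void)
{
    struct timeval tp;
    gettimeofday(&tp, NULL);
    return (double)tp.tv_sec + (double)tp.tv_usec * 1.0e-6;
}

/* Same timer exported under the trailing-underscore name, so a
 * Fortran call to MYSECOND() resolves whichever convention the
 * compiler uses (assumed: "mysecond" or "mysecond_"). */
double mysecond_(void)
{
    return mysecond();
}
```

Because this is a wall-clock timer, it matches the guidance above for single-user and parallel machines. On a time-shared system a CPU timer would be the alternative the history notes mention.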