Slide 1





Common C++ Performance Mistakes in Games
Pete Isensee
Xbox Advanced Technology Group

Slide 2





About the Data
ATG reviews code to find bottlenecks and make perf recommendations
50 titles per year
96% use C++
1 in 3 use “advanced” features like templates or generics

Slide 3





Why This Talk Is Important
The majority of Xbox games are CPU bound
The CPU bottleneck is often a language or C++ library issue
These issues are not usually specific to the platform

Slide 4





Format
Definition of the problem
Examples
Recommendation
For reference
A frame is 17 or 33 ms (60fps / 30fps)
Bottlenecks given in ms per frame

Slide 5





Issue: STL
Game using std::list
Adding ~20,000 objects every frame
Rebuilding the list every frame
Time spent: 6.5 ms/frame!
~156K overhead (2 pointers per node)
Objects spread all over the heap

Slide 6





std::set and map
Many games use set/map as sorted lists
Inserts are slow (O(log N) each)
Memory overhead: 3 ptrs + color
Worst case in game: 3.8 ms/frame

Slide 7





std::vector
Hundreds of push_back()s per frame
VS7.1 expands vector by 50%
Question: How many reallocations for 100 push_back()s?
Answer: 13!  (1,2,3,4,5,7,10,14,20,29,43,64,95)
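
The count of 13 can be reproduced with a small simulation. This is only a sketch of a 50% growth policy (new capacity = old + old/2, or exactly the required size when that is still too small); the function name ReallocationPoints is made up for illustration and the real VS7.1 implementation may differ in detail.

```cpp
#include <cstddef>
#include <vector>

// Simulates a vector whose capacity grows by 50% on overflow.
// Returns the element counts at which a reallocation would happen.
std::vector<std::size_t> ReallocationPoints(std::size_t pushes) {
    std::vector<std::size_t> points;
    std::size_t capacity = 0;
    for (std::size_t size = 1; size <= pushes; ++size) {
        if (size > capacity) {
            // Grow by half; if that is still too small, grow to exactly `size`.
            const std::size_t grown = capacity + capacity / 2;
            capacity = grown < size ? size : grown;
            points.push_back(size);
        }
    }
    return points;
}
```

For 100 push_back()s the returned list has 13 entries, the same sequence as on the slide, ending at 95.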

Slide 8





Clearly, the STL is Evil

Slide 9





Use the Right Tool for the Job
The STL is powerful, but it’s not free
Filling any container is expensive
Be aware of container overhead
Be aware of heap fragmentation and cache coherency
Prefer vector, vector::reserve()
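
A minimal sketch of what reserve() buys, assuming the element count is known (or can be bounded) before the frame's push_back()s start. CountReallocations and WithReserve are illustrative helpers, not part of any library.

```cpp
#include <cstddef>
#include <vector>

// Counts how many push_back calls changed the capacity, i.e. reallocated.
std::size_t CountReallocations(std::vector<int>& v, std::size_t pushes) {
    std::size_t reallocations = 0;
    for (std::size_t i = 0; i < pushes; ++i) {
        const std::size_t before = v.capacity();
        v.push_back(static_cast<int>(i));
        if (v.capacity() != before) ++reallocations;
    }
    return reallocations;
}

// Same fill, but with the storage reserved up front: one allocation, no regrowth.
std::size_t WithReserve(std::size_t pushes) {
    std::vector<int> v;
    v.reserve(pushes);
    return CountReallocations(v, pushes);
}
```

WithReserve(100) reports zero reallocations during the fill; the same loop on a default-constructed vector reports however many times that library's growth policy kicks in.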

Slide 10





The STL is Evil, Sometimes
The STL doesn’t solve every problem
The STL solves some problems poorly
Sometimes good old C-arrays are the perfect container
Mike Abrash puts it well:
“The best optimizer is between your ears”

Slide 11





Issue: NIH Syndrome
Example: Custom binary tree
Sorted list of transparent objects
Badly unbalanced
1 ms/frame to add only 400 items
Example: Custom dynamic array class
Poorer performance than std::vector
Fewer features

Slide 12





Optimizations that Aren’t
void appMemcpy( void* d, const void* s, size_t b )
{
   // lots of assembly code here ...
}
appMemcpy( pDest, pSrc, 100 ); // bottleneck

appMemcpy was slower than memcpy for anything under 64K

Slide 13





Invent Only What You Need
std::set/map more efficient than the custom tree by 10X
Tested and proven
Still high overhead
An even better solution
Unsorted vector or array
Sort once
20X improvement
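
The "unsorted vector, sort once" approach could look roughly like this. The Transparent struct, the back-to-front ordering and the BuildDrawList name are assumptions for illustration; the slide only says the game kept a sorted list of transparent objects.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical record for one transparent object.
struct Transparent {
    float depth;
    int   id;
};

// Function object so the comparison can be inlined into std::sort.
struct FartherFirst {
    bool operator()(const Transparent& a, const Transparent& b) const {
        return a.depth > b.depth;
    }
};

// Append everything unsorted, then sort the contiguous array a single time.
std::vector<Transparent> BuildDrawList(const float* depths, int count) {
    std::vector<Transparent> items;
    items.reserve(static_cast<std::size_t>(count));
    for (int i = 0; i < count; ++i)
        items.push_back(Transparent{depths[i], i});
    std::sort(items.begin(), items.end(), FartherFirst());
    return items;
}
```

Compared with inserting into a tree one node at a time, this does one allocation and one sort per frame over memory that sits in a single block.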

Slide 14





Profile
Run your profiler
Rinse. Repeat.
Prove the improvement.
Don’t rewrite the C runtime or STL just because you can. There are more interesting places to spend your time.

Slide 15





Issue: Tool Knowledge
If you’re a programmer, you use C/C++ every day
C++ is complex
CRT and STL libraries are complex
The complexities matter
Sometimes they really matter

Slide 16





vector::clear
Game reused global vector in frame loop
clear() called every frame to empty the vector
C++ Standard
clear() erases all elements (size() goes to 0)
No mention of what happens to vector capacity
On VS7.1/Dinkumware, frees the memory
Every frame reallocated memory

Slide 17





Zero-Initialization
struct Array { int x[1000]; };
struct Container {
   Array arr;
   Container() : arr() { }
};
Container x; // bottleneck

Costing 3.5 ms/frame
Removing : arr() speeds this by 20X
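
To see that the arr() initializer really writes all 1000 ints, one can construct the object on top of a buffer filled with non-zero bytes. This is an illustrative sketch: the names Zeroed, Untouched and ConstructorClearsArray are invented here, and the timing numbers on the slide are not reproduced.

```cpp
#include <cstring>
#include <new>

struct Array { int x[1000]; };

struct Zeroed {
    Array arr;
    Zeroed() : arr() { }   // value-initializes arr: every int is set to 0
};

struct Untouched {
    Array arr;
    Untouched() { }        // arr is default-initialized: the ints are left as they were
};

// Builds a Zeroed over a buffer of 0xFF bytes and reports whether the
// constructor overwrote every element with zero.
bool ConstructorClearsArray() {
    alignas(Zeroed) static unsigned char buffer[sizeof(Zeroed)];
    std::memset(buffer, 0xFF, sizeof(buffer));
    Zeroed* z = new (buffer) Zeroed;
    for (int i = 0; i < 1000; ++i)
        if (z->arr.x[i] != 0) return false;
    return true;
}
```

ConstructorClearsArray() returns true: the empty-looking arr() costs a 4000-byte clear on every construction, which the Untouched variant skips.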

Slide 18





Know Thine Holy Standard
Use resize(0) to reduce container size without affecting capacity
T() means zero-initialize PODs. Don’t use T() unless you mean it.
Get a copy of the C++ Standard. Really.
www.techstreet.com; search on 14882
Only $18 for the PDF
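
A small check for the resize(0) advice, written as a sketch; EmptyKeepsCapacity is a hypothetical helper. Newer standard libraries generally keep the capacity on clear() as well, but that is worth verifying on your own toolchain rather than assuming.

```cpp
#include <cstddef>
#include <vector>

// Empties the vector for the next frame and reports whether its storage survived.
bool EmptyKeepsCapacity(std::vector<int>& frameData) {
    const std::size_t before = frameData.capacity();
    frameData.resize(0);   // size() becomes 0, the allocation is kept
    return frameData.empty() && frameData.capacity() == before;
}
```

Dropping a check like this into a debug build is a cheap way to confirm that a reused per-frame vector is not reallocating behind your back.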

Slide 19





Issue: C Runtime
void BuildScore( char* s, int n )
{
   if( n > 0 )
      sprintf( s, "%d", n );
   else
      sprintf( s, "" );
}
n was often zero
sprintf was a hotspot

Slide 20





qsort
Sorting is important in games
qsort is not an ideal sorting function
No type safety
Comparison function call overhead
No opportunity for compiler inlining
There are faster options

Slide 21





Clearly, the CRT is Evil

Slide 22





Understand Your Options
itoa() can replace sprintf( s, "%d", n )
*s = '\0' can replace sprintf( s, "" )
std::sort can replace qsort
Type safe
Comparison can be inlined
Other sorting options can be even faster: partial_sort, partition

Slide 23





Issue: Function Calls
50,000-100,000 calls/frame is normal
At 60Hz, Xbox has 12.2M cycles/frame
Function call/return averages 20 cycles
A game calling 61,000 functions/frame spends 10% CPU (1.7 ms/frame) in function call overhead

Slide 24





Extreme Function-ality
120,000 functions/frame
140,000 functions/frame
130,000 calls to a single function/frame (ColumnVec<3,float>::operator[])
And the winner:
340,000 calls per frame!
9 ms/frame of call overhead

Slide 25





Beware Elegance
Elegance → levels of indirection → more functions → perf impact
Use algorithmic solutions first
One pass through the world
Better object rejection
Do AI/physics/networking less often than once/frame

Slide 26





Inline Judiciously
Remember: inline is a suggestion
Try “inline any suitable” compiler option
Frame rate went from 15 to 20 fps
68,000 calls down to 47,000
Try __forceinline or similar keyword
Adding it to 5 functions shaved 1.5 ms/frame
Don’t over-inline
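
One way to apply a force-inline keyword to a few hot accessors while keeping the code portable. GAME_FORCEINLINE, Vec3 and Dot are hypothetical names; __forceinline is the MSVC spelling and __attribute__((always_inline)) is the GCC/Clang counterpart.

```cpp
// Pick the strongest inlining request the compiler offers.
#if defined(_MSC_VER)
#  define GAME_FORCEINLINE __forceinline
#elif defined(__GNUC__) || defined(__clang__)
#  define GAME_FORCEINLINE inline __attribute__((always_inline))
#else
#  define GAME_FORCEINLINE inline   // plain suggestion as a fallback
#endif

// A tiny vector type whose operator[] is called from inner loops.
struct Vec3 {
    float v[3];
    GAME_FORCEINLINE float&       operator[](int i)       { return v[i]; }
    GAME_FORCEINLINE const float& operator[](int i) const { return v[i]; }
};

GAME_FORCEINLINE float Dot(const Vec3& a, const Vec3& b) {
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2];
}
```

Reserve this for the handful of functions the profiler flags; applied broadly it grows the code and can cost more in instruction-cache misses than it saves in calls.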

Slide 27





Issue: for loops
// Example 1: Copy indices to push buffer
for( DWORD i = 0; i < dwIndexCnt; ++i )
   *pPushBuffer++ = arrIndices[ i ];

// Example 2: Initialize vector array
for( DWORD i = 0; i < dwMax; ++i )
   mVectorArr[i] = XGVECTOR4(0,0,0,0);

// Example 3: Process items in world
for( itr i = c.begin(); i < c.end(); ++i )
   Process( *i );

Slide 28





Watch Out For For
Never copy/clear a POD with a for loop
std::algorithms are optimized; use them

memcpy( pPushBuffer, arrIndices,
        dwIndexCnt * sizeof(DWORD) );

memset( mVectorArr, 0, dwMax * sizeof(XGVECTOR4) );

for_each( c.begin(), c.end(), Process );
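
The same two operations can also be written with std::copy and std::fill, which keep type checking and which library implementations typically lower to a block copy or block fill for simple types. A sketch only: Vec4 is a stand-in for XGVECTOR4, and the function names are invented.

```cpp
#include <algorithm>
#include <cstddef>

// Hypothetical stand-in for XGVECTOR4: four plain floats.
struct Vec4 { float x, y, z, w; };

// Example 1 rewritten: bulk copy of indices.
void CopyIndices(const unsigned* src, unsigned* dst, std::size_t count) {
    std::copy(src, src + count, dst);
}

// Example 2 rewritten: bulk clear. Vec4() is the all-zero vector.
void ClearVectors(Vec4* vectors, std::size_t count) {
    std::fill(vectors, vectors + count, Vec4());
}
```

Whether these beat memcpy/memset on a given compiler is something to confirm in the profiler; the point is that neither needs a hand-written element loop.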

Slide 29





Issue: Exception Handling
Most games never throw
Most games never catch
Yet, most games enable EH
EH adds code to do stack unwinding
A little bit of overhead to a lot of code
10% size increase is common
2 ms/frame in worst case

Slide 30





Disable Exception Handling
Don’t throw or catch exceptions
Turn off the C++ EH compiler option
For Dinkumware STL
Define “_HAS_EXCEPTIONS=0”
Write empty _Throw and _Raise_handler; see stdthrow.cpp and raisehan.cpp in crt folder
Add #pragma warning(disable: 4530)

Slide 31





Issue: Strings
Programmers love strings
Love hurts
~7000 calls to stricmp in frame loop
1.5 ms/frame
Binary search of a string table
2 ms/frame

Slide 32





Avoid strings
String comparisons don’t belong in the frame loop
Put strings in a table and compare indices
At least optimize the comparison
Compare pointers only
Prefer strcmp to stricmp
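
A rough sketch of the "table and compare indices" idea. StringTable is a hypothetical class; the intent is that Intern() runs at load time and the frame loop only ever compares the returned ints.

```cpp
#include <cstddef>
#include <map>
#include <string>
#include <vector>

class StringTable {
public:
    // Returns a stable index for `text`, adding it on first use.
    int Intern(const std::string& text) {
        std::map<std::string, int>::const_iterator it = indices_.find(text);
        if (it != indices_.end()) return it->second;
        const int index = static_cast<int>(strings_.size());
        strings_.push_back(text);
        indices_[text] = index;
        return index;
    }

    // Recovers the original text, e.g. for debugging or UI.
    const std::string& Lookup(int index) const {
        return strings_[static_cast<std::size_t>(index)];
    }

private:
    std::map<std::string, int> indices_;
    std::vector<std::string>   strings_;
};
```

After interning, a test such as "is this object a sword?" becomes a single integer comparison instead of a stricmp call.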

Slide 33





Issue: Memory Allocation
Memory overhead
Xbox granularity/overhead is 16/16 bytes
Overhead alone is often 1+ MB
Too many allocations
Games commonly do thousands of allocations per frame
Cost: 1-5 ms/frame

Slide 34





Hidden Allocations
push_back(), insert() and friends typically allocate memory
String constructors allocate
Init-style calls often allocate
Temporary objects, particularly string constants that convert to string objects
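
The last point is easy to miss in source code, so here is an illustrative sketch. Name is a hypothetical string-like class that counts its conversions from a literal; with a real string class each of those conversions would typically also allocate.

```cpp
// Hypothetical string-like type with an implicit converting constructor.
struct Name {
    static int conversions;
    const char* text;
    Name(const char* t) : text(t) { ++conversions; }   // implicit conversion allowed
};
int Name::conversions = 0;

bool IsPlayer(const Name& n) { return n.text[0] == 'p'; }

// Calls IsPlayer with a string constant and reports how many temporaries were built.
int CountHiddenTemporaries(int calls) {
    Name::conversions = 0;
    for (int i = 0; i < calls; ++i)
        IsPlayer("player");   // looks free, but constructs a temporary Name every time
    return Name::conversions;
}
```

CountHiddenTemporaries(100) returns 100: one temporary per call. Declaring the constructor explicit turns the call into a compile error, which forces the conversion to be written out where it can be hoisted out of the loop.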

Slide 35





Minimize Per-Frame Allocations
Use memory-friendly data structures, e.g. arrays, vectors
Reserve memory in advance
Use custom allocators
Pool same-size allocations in a single block of memory to avoid overhead
Use the explicit keyword to avoid hidden temporaries
Avoid strings
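
Pooling same-size allocations might be sketched like this. FixedPool is a made-up, deliberately minimal class (single-threaded, fixed capacity, no growth), not a drop-in allocator.

```cpp
#include <cstddef>
#include <new>
#include <vector>

class FixedPool {
public:
    FixedPool(std::size_t blockSize, std::size_t blockCount)
        : blockSize_(RoundUp(blockSize)),
          storage_(blockSize_ * blockCount),
          freeList_(nullptr) {
        // Thread every block onto the free list, last block first,
        // so the first Allocate() returns the start of the buffer.
        for (std::size_t i = blockCount; i > 0; --i)
            Free(&storage_[(i - 1) * blockSize_]);
    }

    // Pops a block off the free list; returns nullptr when the pool is empty.
    void* Allocate() {
        if (!freeList_) return nullptr;
        Node* node = freeList_;
        freeList_ = node->next;
        return node;
    }

    // Pushes a block (previously returned by Allocate) back onto the free list.
    void Free(void* block) {
        Node* node = new (block) Node;
        node->next = freeList_;
        freeList_ = node;
    }

private:
    struct Node { Node* next; };

    // Blocks must hold a Node and stay aligned for any ordinary type.
    static std::size_t RoundUp(std::size_t n) {
        const std::size_t a = alignof(std::max_align_t);
        if (n < sizeof(Node)) n = sizeof(Node);
        return (n + a - 1) / a * a;
    }

    FixedPool(const FixedPool&);              // not copyable
    FixedPool& operator=(const FixedPool&);

    std::size_t                blockSize_;
    std::vector<unsigned char> storage_;
    Node*                      freeList_;
};
```

All blocks live in one heap allocation, so the per-allocation header and granularity cost is paid once for the whole pool, and both Allocate() and Free() are a couple of pointer moves.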

Slide 36





Other Tidbits
Compiler settings: experiment
dynamic_cast: just say no
Constructors: performance killers
Unused static array space: track this
Loop unrolling: huge wins, sometimes
Suspicious comments: watch out
“Immensely slow matrix multiplication”

Slide 37





Wrap Up
Use the Right Tool for the Job
The STL is Evil, Sometimes
Invent Only What You Need
Profile
Know Thine Holy Standard
Understand Your Options

Slide 38





Call to Action: Evolve!
Pass the rubber chicken
Share your C++ performance mistakes with your team
Mentor junior programmers
So they only make new mistakes
Don’t stop learning
You can never know enough C++

Slide 39





Questions
Fill out your feedback forms
Email: pkisensee@msn.com
This presentation: www.tantalon.com/pete.htm