optimization alternatives

for when you really need to waste time

or, just make the application that much faster

here are a few of the usual optimization paths i've run into in my limited career, both from reading and from personal experience.

i'm also an idiot, and i don't like the selective, degrading, exclusionary language some professional programmers use, so i explain things at the lowest level possible rather than try to confuse you with acronyms and cutesy sayings, where possible. i don't like ivory-tower flowery language. save that for poetry and real feelings. don't try to confuse things.

with that said, i also don't like people who think they can't be challenged. you can always be challenged. and there's always somebody better. i'll challenge anything, anywhere, if i have cause or just out of spite if you can't prove yourself. i don't take any god-like deity figure types seriously, because they're more often than not trying to control someone else for the heck of it, or are just bluffing. you should always be prepared to explain yourself when challenged, and i always will, to the best of my ability.

if you beat me, you should always have a reason why, be able to explain it fully, and be able to show me a way to reach the ability you've proven.

this last point is a real sticking point... because i lost once... in the most bullshit way possible. because of that, i always lose. but i accept that.

regardless, these are my sayings that i'll stick to.

remember, the fundamental rules of optimization are:

the most golden rule: avoid more work, do less work for the same result, in the same domain and range.

the most common techniques:

modularity

isolate code as much as possible. present a limited view of the internal data structure to the callers of your algorithms/data structures. provide only enough for callers to hurt themselves, and never enough to corrupt your structure, unless you absolutely need to.

design pieces to gracefully fail.

design pieces to operate only on basic things, never on specific bits, unless they need to. they usually only need to at the top level of an application.
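
as a rough sketch of the "limited view" idea in C (the names here, stack.h / stack_push and friends, are made up for illustration): the header only declares an opaque type and a few functions, so callers never see the internals, and the push path fails gracefully without corrupting anything.

/* stack.h -- hypothetical public interface: callers only ever see a pointer */
struct stack;                       /* opaque: no members visible out here */

struct stack *stack_new(void);
int  stack_push(struct stack *s, int value);
int  stack_pop(struct stack *s, int *out);
void stack_free(struct stack *s);

/* stack.c -- the real layout lives here, where only this file can touch it */
#include <stdlib.h>

struct stack
{
	int   *items;
	size_t len, cap;
};

int stack_push(struct stack *s, int value)
{
	if (s->len == s->cap) {
		size_t ncap = s->cap ? s->cap * 2 : 8;
		int *tmp = realloc(s->items, ncap * sizeof *tmp);
		if (!tmp)
			return -1;          /* fail gracefully: the stack is still intact */
		s->items = tmp;
		s->cap = ncap;
	}
	s->items[s->len++] = value;
	return 0;
}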

copy-on-write

instead of copying a huge data structure entirely (a "deep" copy), copy only a pointer to the original structure. simple example:

struct cow
{
	void *orig;                 /* shared original; never modified in place */

	struct delta                /* one record per region the copy has written */
	{
		size_t len, offset;     /* how big the write was, and where it landed */
		void *data;             /* the new bytes written over that region */
		struct delta *next;
	} *deltas;
};

then, when the user actually writes to it, allocate memory only for the changed piece (the "delta"), point the delta's data at those new bytes, and link it into the copy's delta list. readers check the deltas first and fall back to the shared original for anything untouched.
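
a minimal sketch of what that write path could look like, assuming the struct above; cow_write is a made-up name, not anything standard:

#include <stdlib.h>
#include <string.h>

/* hypothetical write path: instead of deep-copying orig, record just the
   changed region as a new delta at the head of the list */
int cow_write(struct cow *c, size_t offset, const void *buf, size_t len)
{
	struct delta *d = malloc(sizeof *d);
	if (!d)
		return -1;

	d->data = malloc(len);
	if (!d->data) {
		free(d);
		return -1;
	}

	memcpy(d->data, buf, len);
	d->offset = offset;
	d->len    = len;
	d->next   = c->deltas;      /* newest delta shadows older ones */
	c->deltas = d;
	return 0;
}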

hash tables

mostly used to cache values by key, a hash table can easily devolve into linked-list semantics when it's overloaded, because colliding keys pile up into one long chain. use a tree in those situations.

lists are used almost universally, mostly.
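
to make that degeneration concrete, here's a bare chained-table lookup sketch (all of the names are made up): the hashing itself is constant time, but the while loop is nothing more than a linked-list walk, and that's all you're left with once the buckets fill up.

#include <string.h>

struct entry
{
	const char   *key;
	void         *value;
	struct entry *next;         /* collision chain */
};

struct table
{
	struct entry **buckets;
	size_t         nbuckets;
};

/* hypothetical lookup: hashing is O(1), but the loop below is a plain
   linked-list walk -- with enough collisions this is where the time goes */
void *table_get(struct table *t, const char *key, size_t (*hash)(const char *))
{
	struct entry *e = t->buckets[hash(key) % t->nbuckets];

	while (e) {
		if (strcmp(e->key, key) == 0)
			return e->value;
		e = e->next;
	}
	return NULL;
}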

caching

it is almost always faster to access a static value than recalculate a value multiple times.

the "almost" part comes from a poor choice in cache data structure. most of the time, choosing a hash table as the cache structure is the best option.

try to reduce your calculations so they only do the bare minimum that actually needs to be calculated.

try to pre-calculate as much as possible during periods of high latency/downtime, when you're waiting anyway.

try to calculate incrementally: update the previous result instead of recomputing everything from scratch.
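
a tiny sketch of the static-value idea, with expensive_calc standing in for whatever the real work is (the names are hypothetical): the work is done at most once per index, and every later call is just an array read.

#include <stdint.h>
#include <stddef.h>

#define CACHE_SIZE 1024

static uint64_t cache[CACHE_SIZE];
static int      cached[CACHE_SIZE];         /* 1 once cache[i] holds a real value */

extern uint64_t expensive_calc(size_t i);   /* stands in for the real work */

uint64_t cached_calc(size_t i)
{
	if (i >= CACHE_SIZE)
		return expensive_calc(i);           /* too big to cache; just do the work */

	if (!cached[i]) {                       /* first request: do the work once */
		cache[i]  = expensive_calc(i);
		cached[i] = 1;
	}
	return cache[i];                        /* every later request: just a read */
}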

multi-threading

the commonly accepted trade-off is to increase view-layer complexity to reduce the impact of time-consuming operations done elsewhere in the application (e.g., database lookups, layout).

it's also advantageous to have separate threads handling rendering and input, for low-latency applications.

the problem here is that the application's purpose and interface must gracefully handle periods of use where no data is available, otherwise it will seem stalled anyway.
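
a bare-bones POSIX-threads sketch of that shape, with slow_lookup standing in for the expensive operation (all the names are invented): the main loop keeps presenting something, a "loading" state, while the worker fills in the real result, which is exactly the "no data yet" case the interface has to handle gracefully.

#include <pthread.h>
#include <stdatomic.h>
#include <stdio.h>

extern int slow_lookup(void);          /* stands in for a database call, etc. */

static atomic_int ready;               /* 0 = still working, 1 = result usable */
static int        result;

static void *worker(void *arg)
{
	(void)arg;
	result = slow_lookup();            /* the time-consuming part, off-thread */
	atomic_store(&ready, 1);
	return NULL;
}

int main(void)
{
	pthread_t t;
	pthread_create(&t, NULL, worker, NULL);

	/* the "view layer": keep responding even though there's nothing to show yet */
	while (!atomic_load(&ready))
		printf("loading...\n");        /* a real UI would draw a placeholder frame */

	printf("got %d\n", result);
	pthread_join(t, NULL);
	return 0;
}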

faking it

if it looks right, it is right, to the end-user's eyes. the trick is that you have to make sure it is right when the right time comes. that's all application dependent.

i think the most interesting example is how apple did scrolling on the idevices: they scroll a rasterized copy of the previously rendered screen instead of re-rendering the content for every step of the scroll.

bulk-copies

all processors prefer to copy things in bulk, never one-at-a-time, because the memory buses are simply wider than one byte.

the same thing applies to incremental processing. processors have layered caches (L1, L2 and so on), which hold instructions and data extremely close to the core. a "cache miss" is when that low-level cache doesn't have the required data (or it's stale) and the processor has to go out to RAM for it. this is slow (relatively). avoid it by keeping like elements near each other and processing things sequentially, when possible.

i'll admit, it's hard to do that when it's easier to use a hashtable lookup, but... the difference is astounding, in some applications.
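
a small illustration of that, with a plain 2D array (nothing here is from a real program): both functions add up the same elements, but the first one walks memory in the order it's actually laid out, so every cache line it pulls in gets fully used, while the second jumps a whole row's width every step and can miss on nearly every access.

#define ROWS 1024
#define COLS 1024

static int grid[ROWS][COLS];

/* cache-friendly: consecutive elements of a row are adjacent in memory */
long sum_row_major(void)
{
	long sum = 0;
	for (int r = 0; r < ROWS; r++)
		for (int c = 0; c < COLS; c++)
			sum += grid[r][c];
	return sum;
}

/* cache-hostile: jumps COLS * sizeof(int) bytes every step */
long sum_col_major(void)
{
	long sum = 0;
	for (int c = 0; c < COLS; c++)
		for (int r = 0; r < ROWS; r++)
			sum += grid[r][c];
	return sum;
}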

design

keep like stuff near each other. i don't like the hard and fast rule of "only one class per file" or any of that organizational nonsense. like classes should be in the same file if they share the same overall concept.

reading and writing

don't read more than you'll need. don't read while you're trying to write. don't read while others are trying to write!

this kills the runtime in things like queue-in-a-database-table implementations. sorts and writes are the complete antithesis to a queue structure!

processing data structures

optimize for sequential reads

writes should be done in a separate thread from the reads

don't reorganize the input for any reason!

plan for duplicates and handle them in the reading stage, not in the writing stage (i.e., don't try to avoid writing duplicates; handle them later!). you could probably dedicate a thread to handling dupes by batching them.

use all the information you have available to reduce the input array, if possible, for the next stage. (e.g., in one program i wrote, an option processor, the original code ran through the same set of positions twice: first to validate them, then to process the valid ones and fetch more information on them. i changed it to filter as part of the next step, and the information fetch became a single batch database call.) a rough sketch of that shape follows after this list.

try to only gather and group once.
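
here's that filter-then-batch shape in miniature; every name in it (position, is_valid, fetch_details_batch) is a made-up stand-in, not the original program:

#include <stddef.h>

struct position { int id; };                     /* hypothetical input record */

extern int  is_valid(const struct position *p);
extern void fetch_details_batch(const struct position *valid, size_t n);

/* one pass to shrink the work list (without reordering the input),
   then a single batch call for everything that survived */
void process_positions(const struct position *in, size_t n,
                       struct position *out /* caller provides >= n slots */)
{
	size_t kept = 0;

	for (size_t i = 0; i < n; i++)
		if (is_valid(&in[i]))
			out[kept++] = in[i];

	fetch_details_batch(out, kept);
}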

cool stuff

passing yourself to an "inversion of control" / fancy, structured, introspective malloc / allocator is a cool idea, for the reason below:

passing yourself to a data structure and operating the data structure on yourself is a nice way to run code whose internal workings are unknown, but the results are useful.
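
one way to picture that in C (everything here is a hypothetical sketch): hand the structure a pointer to your own state plus a callback, and let its internals, which you never see, drive your code.

/* hypothetical: a pool/allocator whose internals are a black box to us */
struct pool;

typedef void (*pool_visit_fn)(void *self, void *item);

/* assumed to exist: walks the pool's internal layout, calling fn(self, item) */
extern void pool_for_each(struct pool *p, pool_visit_fn fn, void *self);

static void count_items(void *self, void *item)
{
	(void)item;
	(*(long *)self)++;          /* "self" is our own state being operated on */
}

long report(struct pool *p)
{
	long n = 0;
	pool_for_each(p, count_items, &n);
	return n;                   /* useful result, internal workings still unknown */
}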