<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Martin Wolf&#039;s weblog &#187; hacks</title>
	<atom:link href="http://mwolf.net/archive/category/hacks/feed/" rel="self" type="application/rss+xml" />
	<link>http://mwolf.net</link>
	<description>Software development and assorted geekery</description>
	<lastBuildDate>Sun, 08 Aug 2010 18:15:32 +0000</lastBuildDate>
	<generator>http://wordpress.org/?v=2.8.4</generator>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
			<item>
		<title>About chess and nuclear reactors: the case for exception handling</title>
		<link>http://mwolf.net/archive/exception-handling/</link>
		<comments>http://mwolf.net/archive/exception-handling/#comments</comments>
		<pubDate>Sun, 01 Aug 2010 10:02:48 +0000</pubDate>
		<dc:creator>martin</dc:creator>
				<category><![CDATA[algorithms]]></category>
		<category><![CDATA[hacks]]></category>
		<category><![CDATA[design patterns]]></category>
		<category><![CDATA[error handling]]></category>
		<category><![CDATA[exception handling]]></category>
		<category><![CDATA[exceptions]]></category>
		<category><![CDATA[programming]]></category>

		<guid isPermaLink="false">http://mwolf.net/archive/exception-handling/</guid>
		<description><![CDATA[The world of software development has more than its fair share of topics where people tend to have long religious discussions about the &#8220;correct&#8221; way to do something. I think this is partly because the field for some reason attracts the kind of person who enjoys a nice bout of verbal fisticuffs, and partly because [...]]]></description>
			<content:encoded><![CDATA[<p>The world of software development has more than its fair share of topics where people tend to have long religious discussions about the &#8220;correct&#8221; way to do something. I think this is partly because the field for some reason attracts the kind of person who enjoys a nice bout of verbal fisticuffs, and partly because we spend a lot of time dealing with very abstract topics where the pros and cons of a given choice have more to do with differing philosophies than with objective facts.</p>
<p>One classic topic for this kind of discussion, which came up recently at work, is the use of exceptions for error handling. Every modern programming language offers an exception mechanism for this purpose, and presumably it is there to be used. However, ever since they were first introduced, there has been a large and vocal subset of the community arguing that exceptions do more harm than good and you&#8217;ll be writing better code if you just use good old return values to report whether a method succeeded.</p>
<p><a target="_blank" href="http://www.joelonsoftware.com/items/2003/10/13.html">One representative example</a> comes from Joel Spolsky, one of my favorite authors. Another oft-quoted article making the same arguments is found in the <a target="_blank" href="http://yosefk.com/c++fqa/exceptions.html#fqa-17.1">&#8220;Frequently Questioned Answers&#8221;</a> by Yossi Kreinin. They both make the same basic points: exceptions do not reduce complexity but merely hide it, and when complexity is hidden people tend to forget about it.</p>
<p>These arguments have merit, but I still feel that (when properly used) exception handling delivers enough value to be worth the cost. So I am going to be arguing for the status quo here, for a change. Executive summary: the dangers of exceptions are real, but code readability trumps almost everything.<br /><span id="more-99"></span><br />Let&#8217;s say that we have a method of about 15 lines of code. This method is straightforward and easy to read and to modify. I can understand at a glance what it does; it forms a clear picture in my mind which I can easily reason about. There is just one problem with it: we haven&#8217;t added any error handling yet. For that, we have basically two options: use return values, or use exceptions.</p>
<p>If we use return values, our nice clean 15-line method will suddenly blow up to easily five times as large, with lots of conditional statements and lots of ugly, repetitive boilerplate code. Such code is annoying to write and to read, easy to make mistakes in, and <i>very</i> tempting to get sloppy with. Suddenly, I can no longer hold a mental picture of what the method actually does. I&#8217;d have to go through the code line-by-line and trace all the possible code paths. Even then, I am quite likely to miss some subtlety. And since the method is now too large to fit on a screen, I should refactor it into multiple smaller methods, which requires adding even more error-handling boilerplate.</p>
<p>On the other hand, if I use exceptions instead, most of my methods will become only a little larger, and all of the error-handling code will be nicely contained in one place where I can reason about it separately. The rest of the code stays nice and clean and easy to understand at a single glance.</p>
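<p>To make the contrast concrete, here is a minimal Ruby sketch of one small task written both ways. The method names and the <em>[ok, value]</em> return convention are invented for this illustration and are not taken from any real code base; the point is only how much of each version is plumbing.</p>

```ruby
# Hypothetical task: take a line of text, parse the numbers in it, sum them.

# Style 1: every step reports success through its return value.
def load_rv(text)
  return [false, "no input"] if text.nil?
  [true, text]
end

def parse_rv(text)
  numbers = text.split.map { |word| Integer(word, exception: false) }
  return [false, "not a number"] if numbers.include?(nil)
  [true, numbers]
end

def total_rv(text)
  ok, value = load_rv(text)
  return [false, value] unless ok
  ok, value = parse_rv(value)
  return [false, value] unless ok
  [true, value.sum]
end

# Style 2: the happy path reads straight through; failures are raised
# and handled in one place.
def total_ex(text)
  raise ArgumentError, "no input" if text.nil?
  text.split.map { |word| Integer(word) }.sum
end

def report(text)
  "total: #{total_ex(text)}"
rescue ArgumentError => e
  "failed: #{e.message}"
end

p total_rv("1 2 3")    # [true, 6]
p total_rv("1 x 3")    # [false, "not a number"]
puts report("1 2 3")   # total: 6
puts report(nil)       # failed: no input
```

<p>Even in this toy case, the return-value version spends about half of its lines passing status flags along, and that share grows with every extra step.</p>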
<p>&#8220;Aha!&#8221; say the exception skeptics (exceptics?). &#8220;But that reduced complexity is only illusory! If every line of code in your program might throw an exception, you still have an enormous number of possible code paths to worry about &#8212; they&#8217;re just not explicitly visible in your source code anymore, which is even worse!&#8221;</p>
<p>Well yeah, that&#8217;s true. I still have to worry about those exceptions even if I can&#8217;t see them. But the nice thing is, I <i>can</i> think about them, because we are now back in the scenario where I have just 15 lines of code which I can easily turn into a mental picture which I can reason about. I can look at that code and ask myself: &#8220;What happens if an exception is thrown on this line?&#8221; &#8220;Will the resource I allocate on line 4 be properly cleaned up in all possible scenarios?&#8221; &#8220;If this function fails because of an invalid input file, how will we recover from that?&#8221; And the big difference is that I can reason about these questions <i>at a much higher level of abstraction</i>.</p>
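<p>The cleanup question in particular has a compact answer in most languages with exceptions. Below is a small Ruby sketch using <em>ensure</em>; the <em>FakeResource</em> class and the <em>process</em> method are made up for the example and stand in for whatever file, lock or socket the real method would hold.</p>

```ruby
# A stand-in for any resource that must be released (file, lock, socket).
class FakeResource
  attr_reader :closed

  def initialize
    @closed = false
  end

  def close
    @closed = true
  end
end

# The body may raise on any line; `ensure` runs either way, so the
# cleanup question has a single answer that is easy to check.
def process(resource, input)
  raise ArgumentError, "invalid input" if input.empty?
  input.upcase
ensure
  resource.close
end

ok = FakeResource.new
puts process(ok, "data")   # DATA
puts ok.closed             # true

failed = FakeResource.new
begin
  process(failed, "")
rescue ArgumentError => e
  puts "recovered: #{e.message}"   # recovered: invalid input
end
puts failed.closed         # true
```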
<p>It&#8217;s like the difference between an amateur chess player and an expert. When a novice player thinks about his next move, he works his way through the possible options as a computer would: &#8220;I could do this, but then my opponent will do either that or that. Or I could do this, but then she will do &#8230;&#8221; This is not a good way to play chess, since there are a lot of different combinations and the human brain is really bad at systematically working its way through such a huge decision tree.</p>
<p>An expert chess player, on the other hand, thinks at the level of strategy and tactics, rather than individual moves. &#8220;I am on the offensive, but my opponent has stronger control over the center field. It looks like I could attack over the left flank and win a few pawns, but I should probably do something about that exposed bishop first..&#8221; Only after analyzing the board at this high level, will the expert identify a small number of specific opportunities and threats to be investigated at the level of individual moves and counter-moves.</p>
<p>From a given starting position, the number of moves and counter-moves available to the expert is the same as to the novice. But by thinking about the game at a high level of abstraction first, the expert can dismiss the vast majority of possibilities at a glance, without conscious thought, and focus on the half-dozen interesting cases which need to be investigated in detail. It <i>could</i> happen that the expert overlooks some subtle trick which the plodding novice would have discovered, but the smart money will bet against it.</p>
<p>(As an aside, research has shown that experts are much better than  novices at memorizing the position of the pieces of a chess board, but  only when dealing with a position from a real game. When the pieces are  placed on the board randomly, the experts are not much better at  memorizing them than people with no knowledge of chess at all.)</p>
<p>I am hardly a chess master, but I&#8217;m fairly good at reading code. When I try to read a large block of ugly code where every &#8220;real&#8221; function call is followed by a dozen lines of boring error handling code, I feel like the novice chess player in the story above, missing the forest for the trees. When reading a nice clean piece of code where the error handling is provided through exceptions, I feel like the expert player, spending most of my time thinking at a higher level of abstraction. The complexity of the situation is the same either way, but now I have a mental picture of the situation which allows me to visualize it from different angles, dismiss all of the boring straightforward cases at a glance, and quickly zoom in on the ones which require my detailed attention.</p>
<p>At this point, assuming you&#8217;re still reading, you are probably either nodding along with me or you&#8217;re getting red in the face and starting to sputter in protest. If you&#8217;re nodding along, that likely means you belong to the camp which views programming as an essentially artistic activity, where intuition and the undefinable concept of &#8216;elegance&#8217; play a big role. You believe that there is an unprovable, but very real, connection between the subjective beauty of a piece of code and the chance that it is correct.</p>
<p>On the other hand, if you&#8217;re sputtering and getting angry, you probably belong to the camp which feels that software development is an engineering discipline in which feelings and emotions have no place. Your standpoint is that if a piece of code has 200 different possible execution paths, then somebody should bloody well go over all of those 200 paths and verify that each of them is correct &#8212; anything else is sloppy and amateurish and would never be allowed in your team.</p>
<p>And to be fair, I have a lot of sympathy for that standpoint, as I like to think of myself as a left-brain type of person as well. In fact, if I were in charge of developing the control software for a nuclear reactor, I would change sides: I would disallow exceptions, make sure that all conditionals are written out explicitly, and then use a variety of formal proof techniques to guarantee the correctness of every single path through the code. My team&#8217;s productivity may be less than one line of code per day, but there will be no mushroom clouds on my watch!</p>
<p>At this point, the people in the second group puff up their chest and say, well maybe you think that <i>your</i> project is not important enough to be worth using proper software engineering techniques on, but <i>I</i> take pride in my work and all my code is written to the same level of quality as a nuclear reactor! Well, maybe. Good for you. In the meantime, I have a deadline to meet.</p>
<p>So, dear reader, to determine which of these two camps you believe yourself to be in: when was the last time you used mathematically rigorous formal proof techniques to validate your program&#8217;s correctness? Yeah, me neither. Well, since we have just found out which camp you are <i>not</i> in, you had better make sure that your code is readable and elegant enough to be able to reason about it intuitively. It&#8217;s the only chance you have of delivering reasonably bug-free code at an acceptable level of productivity.</p>
<p>(Oh, and adding some automated testing won&#8217;t hurt either way. As Donald Knuth once famously wrote: &#8220;Beware of bugs in the above code; I have only proved it correct, not tried it.&#8221;)</p>
]]></content:encoded>
			<wfw:commentRss>http://mwolf.net/archive/exception-handling/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Scripting Excel: can it really be this horrible?</title>
		<link>http://mwolf.net/archive/vba-sucks/</link>
		<comments>http://mwolf.net/archive/vba-sucks/#comments</comments>
		<pubDate>Sun, 27 Dec 2009 18:58:13 +0000</pubDate>
		<dc:creator>martin</dc:creator>
				<category><![CDATA[hacks]]></category>
		<category><![CDATA[excel]]></category>
		<category><![CDATA[microsoft]]></category>
		<category><![CDATA[vba]]></category>

		<guid isPermaLink="false">http://mwolf.net/archive/vba-sucks/</guid>
		<description><![CDATA[OK, I have managed to avoid this for a long time, but I guess it was inevitable: here comes Martin&#8217;s cheap, nonconstructively sarcastic I-hate-Microsoft post.
So I was visiting my parents this week-end, and my Dad asked me to help him with a little macro job on an Excel spreadsheet. It sounded simple enough. However, I [...]]]></description>
			<content:encoded><![CDATA[<p>OK, I have managed to avoid this for a long time, but I guess it was inevitable: here comes Martin&#8217;s cheap, nonconstructively sarcastic I-hate-Microsoft post.</p>
<p>So I was visiting my parents this week-end, and my Dad asked me to help him with a little macro job on an Excel spreadsheet. It sounded simple enough. However, I had forgotten just how astonishingly horrible <a href="http://msdn.microsoft.com/en-us/isv/bb190538.aspx">Visual Basic For Applications</a>, the sorry excuse for a programming language built into Excel (and the other Office applications), can be.</p>
<p>As far as I remember, the last time I did anything with VBA was probably somewhere in the late nineties. Even by the standards of back then, VBA is a really shitty programming language. By the standards of 2009, it&#8217;s spectacularly bad. The only explanation I can think of is that somewhere high up in Microsoft Strategic Command, somebody decided to spend a lot of effort on making it as useless and infuriating as possible, while still keeping it just barely functional enough to be able to do the things you want to do with it, if you&#8217;re willing to go through a lot of pain. God only knows <em>why</em> they made that decision, but surely a language as bad as this cannot be created by accident.<br />
<!-- more --><span id="more-83"></span>What surprised me most is that it was still exactly as bad as I remembered. We&#8217;re talking about Office 2007 here, the all-new singing and dancing one with the cool magic fluffy ribbons and shit. Surely they could have made some minor improvements to the scripting language while they were hard at work at hiding the &#8216;File&#8217; menu behind a big round decorative window-corner ornament? In fact, I would have expected them to have simply integrated the .NET framework into Office by now, so that you could write your Excel macros in <a href="http://en.wikipedia.org/wiki/F_Sharp_%28programming_language%29" target="_self">F#</a> if you felt like it?</p>
<p>Apparently, that is not the case. The language is still filled with lots of little inconveniences such as the fact that, where every other programming language on Earth lets you return a value from a function by doing something like</p>
<blockquote>
<div><strong>def</strong> thisFunctionReturnsThree()<br />
&nbsp;&nbsp;&nbsp;&nbsp;<strong>return</strong> 3<br />
<strong>end</strong></div>
</blockquote>
<p>VBA, for reasons which I&#8217;m sure seemed like a good idea to somebody at some point, expects you to do it like this:</p>
<blockquote><p><strong>Function</strong> ThisFunctionReturnsThree()<br />
&nbsp;&nbsp;&nbsp;&nbsp;ThisFunctionReturnsThree = 3<br />
&nbsp;&nbsp;&nbsp;&nbsp;<strong>Exit</strong> <strong>Function</strong><br />
<strong>End</strong> <strong>Function</strong></p></blockquote>
<p>But minor inconveniences like that are just a scratch on a gaping shotgun wound. Where VBA <em>really</em> starts driving the red-hot splinters of aggravation under your fingernails, is when you try to use the Collection object. The VBA Collection is some kind of schizophrenic bastard datatype with a horrible identity crisis, which can never quite decide whether it wants to be a fancy-schmancy associative array (a.k.a. <em>hash</em> or <em>map</em>, for those of my readers who are used to non-braindead programming environments) or just an ordinary variable-sized array. Anyway, back in 1997 or whatever, the Collection type supported a grand total of four methods:</p>
<blockquote><p>Add<br />
Remove<br />
Count<br />
Item</p></blockquote>
<p>And here in 2009, it still supports the same four methods! Note that there is no way to enumerate all the keys in the collection (when using it as a hash), nor is there a way to ask the collection object if it contains a given value (when using it as a resizeable array). Doing <em>myCollection.Item(&quot;foo&quot;)</em> will raise an error if <em>foo</em> is not present in the collection. So, short of grossly abusing the error-trapping mechanism as a flow control technique (which is evil), there is no way to access a value when you are not certain that it exists.</p>
<p>Nor does VBA seem to natively offer any more powerful container types. The standard trick, apparently, is to bring in the <a href="http://msdn.microsoft.com/en-us/library/x4k5wbx4%28VS.85%29.aspx">Scripting.Dictionary</a> from the Microsoft Scripting Runtime, which is not a standard part of either VBA or MS Office, but which can kinda-sorta generally be expected to probably be present on most Windows systems which have been installed and occasionally updated during the past five years. (The ones which have not been regularly updated are, of course, so bogged down with malware by now that there won&#8217;t be any CPU power left to run Excel anyway, so no need to worry about them.) Scripting.Dictionary is hardly the equal of, say, <a href="http://ruby-doc.org/ruby-1.9/classes/Hash.html">the <em>Hash</em> class from Ruby&#8217;s standard library</a>, but at least it has an <em>Exists</em> method and it allows you to enumerate its keys and values. Welcome to the 1990&#8217;s, Microsoft!</p>
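<p>For comparison, this is roughly what those missing operations look like on a Ruby <em>Hash</em>. The data is invented for the example; the methods shown are all part of Ruby&#8217;s core library.</p>

```ruby
ages = { "alice" => 31, "bob" => 27 }

puts ages.key?("alice")     # true
puts ages.key?("carol")     # false
p ages["carol"]             # nil, a missing key is not an error
p ages.fetch("carol", 0)    # 0, an explicit default
p ages.keys                 # ["alice", "bob"]
p ages.values               # [31, 27]
p ages.value?(27)           # true
```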
<p>By the way, have I already mentioned that VBA, again unlike every other programming language invented after the death of Blaise Pascal, does not do short-circuit evaluation of Boolean expressions? So if you want to write something along the lines of, say</p>
<blockquote><p><strong>If</strong> myDict.Exists(&quot;foo&quot;) <strong>And</strong> myDict(&quot;foo&quot;) = &quot;bar&quot; <strong>Then</strong><br />
&nbsp;&nbsp;&nbsp;&nbsp;&#39; Do something cool<br />
<strong>Else</strong><br />
&nbsp;&nbsp;&nbsp;&nbsp;&#39; Do something not-quite-as-cool<br />
<strong>End</strong> <strong>If</strong></p></blockquote>
<p>then your code will not do what you meant because, after having cleverly determined that <em>myDict</em> does not contain an item named &#8220;foo&#8221;, VBA will then shrewdly try to access it anyway. With a Collection, that lookup promptly dies.</p>
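<p>Ruby happens to offer both behaviors, which makes the difference easy to demonstrate. In the sketch below, <em>&amp;&amp;</em> short-circuits, while <em>&amp;</em> on <em>true</em>/<em>false</em> is an ordinary method call that evaluates both operands first, much like VBA&#8217;s <em>And</em>. The <em>settings</em> hash and the <em>strict</em> lambda are made up for the example. (In VBA itself, the usual workaround is to nest two <em>If</em> statements.)</p>

```ruby
settings = {}   # deliberately has no "foo" key

# A lookup that fails loudly on a missing key.
strict = ->(key) { settings.fetch(key) }

# `&&` short-circuits: the right-hand side never runs when the left is false.
puts(settings.key?("foo") && strict.call("foo") == "bar")   # false

# `&` evaluates both operands before combining them, so the lookup
# runs even though the key is known to be missing.
begin
  settings.key?("foo") & (strict.call("foo") == "bar")
rescue KeyError => e
  puts "eager evaluation failed: #{e.message}"
end
```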
<p>Oh, and neither Collection nor Dictionary has a built-in ability to sort its contents. If you search for e.g. &#8220;vba sort container&#8221; you will find a lot of people who have helpfully written their own functions to perform this rather basic task.</p>
<p>We did eventually get my Dad&#8217;s spreadsheet to do what we wanted it to. I&#8217;d estimate that I spent about 10% of that time on actually writing the functionality we were after, and the other 90% of the time swearing at the absurd, arbitrary limitations of VBA and trying to come up with stupid workarounds for things which should be absolutely trivial, and which <em>would</em> be absolutely trivial in any other language, and which anybody who wants to write a non-trivial piece of code is going to need so why isn&#8217;t it present by default in this supposedly mature product? Gah.</p>
<p>Now, it is very possible, even quite likely, that many of the things I ran up against were simply a result of my unfamiliarity with the programming environment. I am more than happy to admit that I am not an Excel guru or a VBA guru. Probably then, there are much better solutions to each of the problems mentioned above. Please feel free to teach me about them and laugh at my ignorance!</p>
<p>However, I do know how to use Google, and what I found were not solutions but just a lot of other people having the same problems and complaining about how incredibly limited the language is. So, Microsoft, could we perhaps ask you to take one-half of a developer off the task of making Office 2010&#8217;s ribbons even sparklier, and put them to work bringing the facilities of VBA up to the level of an average scripting language from ten years ago? Thanks in advance.</p>
]]></content:encoded>
			<wfw:commentRss>http://mwolf.net/archive/vba-sucks/feed/</wfw:commentRss>
		<slash:comments>4</slash:comments>
		</item>
		<item>
		<title>Golfing with prime factors</title>
		<link>http://mwolf.net/archive/golfing-with-prime-factors/</link>
		<comments>http://mwolf.net/archive/golfing-with-prime-factors/#comments</comments>
		<pubDate>Sun, 14 Jun 2009 13:57:53 +0000</pubDate>
		<dc:creator>martin</dc:creator>
				<category><![CDATA[hacks]]></category>
		<category><![CDATA[me]]></category>
		<category><![CDATA[code golf]]></category>
		<category><![CDATA[factoring]]></category>
		<category><![CDATA[hanoi]]></category>
		<category><![CDATA[perl]]></category>
		<category><![CDATA[perl golf]]></category>
		<category><![CDATA[prime factors]]></category>
		<category><![CDATA[prime numbers]]></category>
		<category><![CDATA[ruby]]></category>
		<category><![CDATA[towers of hanoi]]></category>

		<guid isPermaLink="false">http://mwolf.net/?p=55</guid>
		<description><![CDATA[Dirk-Jan reminded me of the Perl Golf and Code Golf contests, both of which have the aim of solving a simple programming task in as few characters of source code as possible. See his post for a stunning example.
One of the open challenges is to work out the prime factors of a given number. To [...]]]></description>
			<content:encoded><![CDATA[<p><a href="http://www.djcbsoftware.nl/ChangeLog/2009/05/perl-golf.html">Dirk-Jan</a> reminded me of the <a href="http://perlgolf.sourceforge.net/">Perl Golf</a> and <a href="http://codegolf.com/">Code Golf</a> contests, both of which have the aim of solving a simple programming task in as few characters of source code as possible. See his post for a stunning example.</p>
<p>One of the <a href="http://codegolf.com/competition/browse">open challenges</a> is to <a href="http://codegolf.com/prime-factors">work out the prime factors</a> of a given number. To make things a little more difficult, the output must be printed in a specific format:</p>
<blockquote><p><span style="font-family: Courier New;"> 7000: 2^3 5^3 7<br />
123456789: 3^2 3607 3803</span></p></blockquote>
<p><span id="more-55"></span>Here is one of my attempts at a Ruby implementation, which is 97 bytes after removing the final trailing newline:</p>
<blockquote><p><span style="font-family: Courier New;"> print&quot;#{n=gets.to_i}: &quot;<br />
(2..n).each{|x|i=0;n,i=n/x,i+1while n%x&lt;1<br />
print x,i&lt;2?&#39; &#39;:&quot;^#{i} &quot;if i&gt;0}</span></p></blockquote>
<p>Then we switched to Perl, and together Dirk-Jan and I managed to come up with a version which is just 82 bytes after removing all the newlines (just pipe it through &#8220;<span style="font-family: Courier New;">perl -pe chomp</span>&#8221;).</p>
<blockquote><p><span style="font-family: Courier New;"> print$n=pop,&#39;:&#39;;for(2..$n){<br />
$i=0;$n/=$_,$i++until$n%$_;<br />
print&quot; $_&quot;.&quot;^$i&quot;x($i&gt;1)if$i}</span></p></blockquote>
<p>(Be aware that the Ruby version takes its input from <span style="font-family: Courier New;">stdin</span>, while the Perl program looks at its first command-line argument.)</p>
<p>That&#8217;s pretty compact, right? Algorithmically, the code is actually fairly straightforward; probably the most evil trick we use is to employ the &#8216;<span style="font-family: Courier New;">x</span>&#8217; operator, which repeats a string a given number of times, in combination with the fact that any Boolean expression can be treated as a numeric value of either 0 or 1, as in C (but not in Ruby).</p>
<p>Unfortunately, both of the versions given above are horrendously slow. They always loop all the way up to <em>n</em>, giving a running time of <a href="http://en.wikipedia.org/wiki/Big_O_notation">O(<em>n</em>)</a>, while any self-respecting factorization algorithm should be at most proportional to the largest prime factor of <em>n</em>. It took my Pentium-4 system almost 45 minutes to calculate the factors of 2,000,000,000, which a reasonably intelligent high school student could probably have done in half a minute or so. Here is a version which can do it in a fraction of a second, but at the cost of being a whole 10 bytes larger:</p>
<blockquote><p><span style="font-family: Courier New;"> print$n=pop,&#39;:&#39;;<br />
for($x=2;$n&gt;1;$x++){<br />
$i=0;$n/=$x,$i++until$n%$x;<br />
print&quot; $x&quot;.&quot;^$i&quot;x($i&gt;1)if$i}</span></p></blockquote>
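<p>For readers who prefer their code ungolfed, here is the same idea spelled out in plain Ruby: keep dividing out each candidate factor, and stop as soon as what remains of <em>n</em> drops to 1. This is a readable sketch of the approach, not a contest entry, and the method names are my own.</p>

```ruby
# Returns the prime factorization of n as [[prime, exponent], ...].
def prime_factors(n)
  factors = []
  x = 2
  while n > 1
    count = 0
    while (n % x).zero?
      n /= x
      count += 1
    end
    factors.push([x, count]) if count.positive?
    x += 1
  end
  factors
end

# Formats the result the way the contest expects.
def format_factors(n)
  terms = prime_factors(n).map { |p, e| e > 1 ? "#{p}^#{e}" : p.to_s }
  "#{n}: #{terms.join(' ')}"
end

puts format_factors(7000)        # 7000: 2^3 5^3 7
puts format_factors(123456789)   # 123456789: 3^2 3607 3803
```

<p>Because the loop ends once the remaining value of <em>n</em> reaches 1, factoring 2,000,000,000 (which is 2^10 times 5^9) never looks at a candidate larger than 5.</p>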
<p>So, how are we doing in the contest? Well, we&#8217;re not even competing. The <a href="http://codegolf.com/leaderboard/competition/prime-factors/">leader</a> is currently at an astonishing 76 bytes! I have no idea how they do that. The winning programs are all in Perl, which tends to do very well in this type of contest despite the fact that all variable names are at least two characters. The best Ruby program is 82 bytes, with Python coming in at 100 and PHP at 122.</p>
<p>Another <a href="http://codegolf.com/tower-of-hanoi">task</a> in Code Golf is to write a program which can solve the famous <a href="http://en.wikipedia.org/wiki/Towers_of_hanoi">Towers of Hanoi</a> puzzle, given a random starting position with up to nine disks. The input consists of three lines of text, each giving the sequence of disks for one of the three pegs. For example:</p>
<blockquote><p><span style="font-family: Courier New;"> 975<br />
864<br />
321</span></p></blockquote>
<p>The goal is to print the series of moves needed to get all disks together on peg C, the third one, following the usual rule that at no time may a larger disk be on top of (to the right of) a smaller one. Here is a very simple example run with only three disks:</p>
<blockquote><p><span style="font-family: Courier New;"> $ cat simple.txt<br />
31<br />
<br />
2<br />
$ ruby hanoi.rb simple.txt<br />
2 to B<br />
1 to B<br />
3 to C<br />
1 to A<br />
2 to C<br />
1 to C</span></p></blockquote>
<p><span style="font-family: sans-serif;">We haven&#8217;t spent as much time yet on this one as on the prime factorization problem. Here is my best effort so far:</span></p>
<blockquote><p><span style="font-family: Courier New;"> T,D=[],[:A,:B,:C]<br />
D.each{|t|gets.chomp.each_byte{|x|T[x-48]=t}}<br />
def m s,t<br />
if s&gt;0<br />
m s-1,(D-[t,T[s]])[0]<br />
puts&quot;#{s} to #{t}&quot;<br />
m s-1,T[s]=t<br />
end<br />
end<br />
m T.size-1,:C</span></p></blockquote>
<p><span style="font-family: sans-serif;">This is an embarrassing 157 bytes. Among the actual <a href="http://codegolf.com/leaderboard/competition/tower-of-hanoi/">contest participants</a>, Ruby is leading the pack this time, with 104 bytes, while the best Perl entry so far is a whole six bytes larger. Go Ruby!</span></p>
<p><span style="font-family: sans-serif;">The Code Golf contest is open to participants using <a href="http://en.wikipedia.org/wiki/Ruby_%28programming_language%29">Ruby</a>, <a href="http://en.wikipedia.org/wiki/Perl">Perl</a>, <a href="http://en.wikipedia.org/wiki/Python_%28programming_language%29">Python</a> and <a href="http://en.wikipedia.org/wiki/PHP">PHP</a>. Nonetheless, <a href="http://blog.leenarts.net/">Jeroen</a>, <a href="http://blog.hendricksen.eu">Jeroen</a> and <a href="http://blogs.infosupport.com/blogs/raimondb/">Raimond</a>, I look forward to you trying to beat the above programs in Java or C#.. <img src='http://mwolf.net/wordpress/wp-includes/images/smilies/icon_razz.gif' alt=':-P' class='wp-smiley' /> </span></p>
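<p>The recursive idea is easier to see without the golfing. Here is a readable Ruby sketch of one way to solve the puzzle from an arbitrary starting position: to get disks 1 to <em>n</em> onto a target peg, first park disks 1 to <em>n</em>-1 on the spare peg, move disk <em>n</em>, then bring the smaller disks over. The names are my own, and unlike the golfed program this version explicitly skips a disk that is already where it needs to be.</p>

```ruby
PEGS = %i[A B C].freeze

# Reads the three peg descriptions (bottom disk first) into a hash
# that maps each disk number to the peg it sits on.
def parse_position(lines)
  position = {}
  PEGS.zip(lines).each do |peg, line|
    line.to_s.strip.each_char { |disk| position[disk.to_i] = peg }
  end
  position
end

# Moves disks 1..size onto target, appending each move to moves.
def solve(position, size, target, moves)
  return moves if size.zero?

  if position[size] == target
    # The largest disk is already in place; only the smaller ones remain.
    return solve(position, size - 1, target, moves)
  end

  spare = (PEGS - [target, position[size]]).first
  solve(position, size - 1, spare, moves)
  moves.push("#{size} to #{target}")
  position[size] = target
  solve(position, size - 1, target, moves)
end

position = parse_position(["31", "", "2"])
puts solve(position, position.keys.max, :C, [])
```

<p>Run on the three-disk example from earlier, this prints the same six moves as shown there.</p>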
]]></content:encoded>
			<wfw:commentRss>http://mwolf.net/archive/golfing-with-prime-factors/feed/</wfw:commentRss>
		<slash:comments>5</slash:comments>
		</item>
		<item>
		<title>Obfuscate your numbers!</title>
		<link>http://mwolf.net/archive/number-obfuscator/</link>
		<comments>http://mwolf.net/archive/number-obfuscator/#comments</comments>
		<pubDate>Sun, 07 Jun 2009 21:40:18 +0000</pubDate>
		<dc:creator>martin</dc:creator>
				<category><![CDATA[cool-tool]]></category>
		<category><![CDATA[hacks]]></category>
		<category><![CDATA[linux]]></category>
		<category><![CDATA[obfuscator]]></category>
		<category><![CDATA[ruby]]></category>

		<guid isPermaLink="false">http://mwolf.net/?p=52</guid>
		<description><![CDATA[Time for some silliness.
In a little over a month, I will be

years old.
What&#8217;s that, you didn&#8217;t get it? Here, I&#8217;ll repeat it for you in terms you may understand more easily:

At work, it has become a bit of a tradition that when people announce their birthday, they do so in an at least somewhat obfuscated [...]]]></description>
			<content:encoded><![CDATA[<p>Time for some silliness.</p>
<p>In a little over a month, I will be</p>
<p><img style="max-width: 800px;" src="http://mwolf.net/images/obfuscated-1.png" alt="" /></p>
<p>years old.</p>
<p>What&#8217;s that, you didn&#8217;t get it? Here, I&#8217;ll repeat it for you in terms you may understand more easily:</p>
<p><img style="max-width: 800px;" src="http://mwolf.net/images/obfuscated-2.png" alt="" /></p>
<p>At work, it has become a bit of a tradition that when people announce their birthday, they do so in an at least somewhat obfuscated format. Hexadecimal, binary and more obscure number formats are always popular, of course, as are silly descriptions of the form &#8220;my age is the ninth distinct <a href="http://en.wikipedia.org/wiki/Biprime">biprime</a>&#8221;. But last year I decided to take it to the next level, and write a little generator in Ruby for expressions such as the ones you see above. As you can probably guess, the expressions are typeset using TeX.</p>
<p>You can <a href="http://mwolf.net/code/obfuscator/obfuscate.html">play with it for yourself, if you want to</a>, and also download the latest version of the code. But please be gentle with my server: as you can probably guess, it&#8217;s a rather heavy application and I&#8217;m running this site on a little home PC..</p>
]]></content:encoded>
			<wfw:commentRss>http://mwolf.net/archive/number-obfuscator/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Thinking in bits</title>
		<link>http://mwolf.net/archive/thinking-in-bits/</link>
		<comments>http://mwolf.net/archive/thinking-in-bits/#comments</comments>
		<pubDate>Sun, 16 Nov 2008 22:21:44 +0000</pubDate>
		<dc:creator>martin</dc:creator>
				<category><![CDATA[hacks]]></category>

		<guid isPermaLink="false">http://mwolf.net/archive/thinking-in-bits/</guid>
		<description><![CDATA[The number 65,536 is an awkward figure to everyone except a hacker, who recognizes it more readily than his own mother&#8217;s date  of birth
&#8211; Snow Crash, Neal Stephenson
A question which I like to use when interviewing C++ programmers: what is the range of a 32-bit integer?
I don&#8217;t use this question very often anymore, or [...]]]></description>
			<content:encoded><![CDATA[<p><em><small>The number 65,536 is an awkward figure to everyone except a hacker, who recognizes it more readily than his own mother&#8217;s date  of birth</small></em><br />
<small>&#8211; <a href="http://en.wikipedia.org/wiki/Snow_crash">Snow Crash</a>, Neal Stephenson</small></p>
<p>A question which I like to use when interviewing C++ programmers: <em>what is the range of a 32-bit integer?</em></p>
<p>I don&#8217;t use this question very often anymore, or at least I don&#8217;t let it influence my decision very much (which is why I don&#8217;t mind spilling the beans here), because the correlation with other technical skills turns out to be not as strong as I thought it would be. Still, there are some interesting patterns in the kind of people who know the answer versus those who do not.</p>
<p>Among the people who do not know the answer, some seem quite affronted that they would even be expected to. What is the point, they will ask, in having memorized some little piece of trivia which they could Google up in a few seconds? Isn&#8217;t &#8220;knowing where to find it&#8221; a much more useful skill? Ask me about architecture! Ask me about design patterns and data structures! All of these objections have some validity, and indeed we will certainly ask about those other things during the interview. Still, I believe that it is perfectly reasonable to expect a good developer to know the answer to the above question by heart.<br />
<span id="more-28"></span><br />
There are two reasons why I believe that. The first is simple: if you don&#8217;t know the range of the data types you&#8217;re working with, how can you write reliable code? &#8220;I will look it up when I need it&#8221; is a common excuse, but that doesn&#8217;t fly with me. It would be a perfectly valid answer if I had asked you to recite from memory the arguments to some obscure library function, but we&#8217;re talking about the <em>integers</em> here! The most commonly used datatype in any programming language. Are you seriously telling me that, every single time you write</p>
<blockquote><p>for (int i = 0; i &lt; n; ++i)</p></blockquote>
<p>you look up in the compiler documentation whether the range of <em>i</em> is large enough that you can safely assume <em>n</em> to fall within it? And then you promptly forget about it so that you need to look it up again the next time? So that, when you arrive in my interviewing room with a CV that says you have five years of C++ experience, you have looked up that particular bit of trivia many thousands of times by now, and you still don&#8217;t remember it? What&#8217;s more likely is that you have never looked it up, which means that you are basically, as <a href="http://www.joelonsoftware.com">Joel Spolsky</a> would call it, <a href="http://www.joelonsoftware.com/articles/CollegeAdvice.html">programming by superstition</a>.</p>
<p>The other reason why you need to know this stuff is that you will be a more effective programmer if you are able to think of what you are doing in terms of the underlying processor instructions &#8212; a skill which is useful even when working in very high-level languages, but which is absolutely necessary when working in C or C++.</p>
<p>For example, even a non-programmer may be aware that the world is rapidly moving to 64-bit processors and operating systems right now because the 32-bit ones are unable to use more than 4GB of memory. But as a professional programmer, I would expect you to know why that is. The answer is simple: it&#8217;s because 32-bit machines use 32-bit memory addresses, and with those you can only access 2<sup>32</sup> = 4,294,967,296 different bytes. If you don&#8217;t know that, you are lacking a fairly basic piece of understanding of how computers work.</p>
<p>Now, of course I don&#8217;t expect people to know the exact number 4,294,967,296 by heart, but &#8220;a little over four billion&#8221; (or two billion for a signed integer) would not be too much to ask. If you cannot tell whether it is more or less than a million, as was the case with several &#8216;senior&#8217; candidates I had the pleasure to interview, you&#8217;d better hit the ball out of the ballpark on every other part of the interview. Which they did not. (Please note that I&#8217;m using the American usage of &#8216;billion&#8217; to mean a thousand million.)</p>
<p>Another point is that C++ has largely fallen out of favour nowadays for general application programming. So when it is still used, it is mostly to support performance-critical algorithms or very high-volume data processing. In that kind of code, having a good mental picture of the data structures you&#8217;re using is really crucial. If you want to have millions of instances of a given object in memory simultaneously, shaving a few bytes off each instance can really matter &#8212; which means that you need to know when you can safely use a <em>char</em> or a <em>short</em> instead of just blithely using <em>int</em>s for everything.</p>
<p>So, here are the facts which I expect every C++ programmer, as well as the better C# and Java ones, to know more readily than their mother&#8217;s birthday:</p>
<ul>
<li>A byte is 8 bits, and can hold 256 different values. Examples: the number of different characters in a non-Unicode text file, the number of different shades of red, green and blue in a true-colour image. The maximum filename length on ext3 and NTFS (255, the largest value a byte can hold) comes from the same limit, as does an unfortunate cap on path length in many programs.</li>
<li>A &#8216;short&#8217;, on most C/C++ compilers, is 16 bits, which means it ranges from -32,768 to 32,767 when signed, or from 0 to 65,535 when unsigned. A decade or two ago, it was quite common to have the default size of <em>int</em> be 16 bits on PC-based C compilers, and you may still encounter this in the embedded world. On many systems today (Windows among them) the <em>wchar_t</em> type is 16 bits, so that it supports all characters in the UCS-2 subset of Unicode, which contains all of the characters you actually care about. Beware of the <a href="http://nl.wikipedia.org/wiki/UTF-16">difference between UCS-2 and UTF-16</a>, though. Before true-colour displays became the norm, there were video cards which supported 16-bit colour; usually this was actually 15 bits, with 5 bits (32 different shades) each for red, green and blue, although sometimes the remaining bit was used to allow 64 different shades for green, which is the colour which the human eye can perceive most accurately. 16 bits is also the default integer size on some database systems, which can give you some nasty surprises when your e-commerce system has been in production for a few months and a customer submits the 65,536th order.</li>
<li>24 bits (three bytes) gives you about 16.8 million (16,777,216) different combinations. Among other things, this is the number of different colours on a true-colour display. In November 2006, <a href="http://www.slashdot.org">Slashdot</a> had to do an emergency update of their database, since they used the MySQL &#8216;mediumint&#8217; type for comment IDs, so the system broke when <a href="http://slashdot.org/articles/06/11/09/1534204.shtml">the 16777216th comment was posted</a>.</li>
<li>32 bits is the default integer size for almost any modern programming language (yes, even on 64-bit platforms, which will break any code which assumes that a pointer can be cast into an integer and back). As stated above, a signed integer ranges from minus to plus 2.1 billion, while the unsigned range goes up to almost 4.3 billion. This is a fairly generous range for most problem domains, but sooner or later it will bite you if you&#8217;re not aware of it. Among many other things, this is why many programs have trouble with data files larger than 2GB: a signed 32-bit file offset cannot go any higher. It is also the absolute upper limit on the number of unique <a href="http://en.wikipedia.org/wiki/IP4">IPv4</a> addresses (the actual limit is much lower because of the way the address space is organized).</li>
<li>A <em>float</em> value uses 24 bits for the mantissa, which translates into 7 decimal digits of precision, which is less than most programmers intuitively expect. A <em>double</em> uses 53 bits, which is plenty for most purposes. Anyway, the most important thing you need to know about working with <a href="http://en.wikipedia.org/wiki/IEEE_754">floating-point numbers</a> is that it is fraught with <a href="http://hal.archives-ouvertes.fr/hal-00128124">gotchas and special cases</a> which <em>will</em> trip you up sooner or later. And <strong>never, ever</strong> use them for monetary values.</li>
</ul>
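<p>The figures in this list can be checked with a few lines of arithmetic. The following is only a sketch, written in Python rather than C++ so that the numbers do not depend on any particular compiler; the helper names <em>int_range</em>, <em>wrap_unsigned</em> and <em>as_float32</em> are made up for this illustration.</p>

```python
import struct


def int_range(bits, signed):
    """Return (lowest, highest) value of a two's-complement integer type."""
    if signed:
        return -(2 ** (bits - 1)), 2 ** (bits - 1) - 1
    return 0, 2 ** bits - 1


def wrap_unsigned(value, bits):
    """Simulate C-style wraparound of an unsigned counter `bits` wide."""
    return value % (2 ** bits)


def as_float32(value):
    """Round a Python float (a 64-bit double) to the nearest 32-bit float."""
    return struct.unpack("f", struct.pack("f", value))[0]


for bits in (8, 16, 24, 32):
    print(bits, int_range(bits, signed=True), int_range(bits, signed=False))

# An unsigned 16-bit counter in C wraps around to 0 after 65,535.
print(wrap_unsigned(65536, 16))

# 24 mantissa bits: 16,777,216 is exact, 16,777,217 is not.
print(as_float32(16777217.0))

# 53 mantissa bits: the same thing happens to a double at 2**53.
print(float(2 ** 53 + 1) == float(2 ** 53))

# Why floats are a bad fit for money: ten cents plus twenty cents.
print(0.1 + 0.2 == 0.3)
```

<p>The last line prints False, which is the short version of the argument against floating-point money. Note that a real database will usually reject or clamp an out-of-range value rather than wrap it around as a C counter does.</p>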
<p>I would also expect a serious programmer to know the powers-of-two table at least up to 2<sup>16</sup> = 65,536. You simply run into those numbers so often that you can&#8217;t help memorizing them after a while, and it&#8217;s surprising how often this knowledge can help you in debugging various kinds of overflow issues.</p>
<p>So what about larger integer sizes? Well, unless you work in cryptography or some other specialized field (in which case I <em>really</em> hope that you didn&#8217;t learn anything new from this article), you can safely assume that 64 bits is enough for anything you will ever want to do. But here is a handy little rule-of-thumb I use to convert between bits and decimal digits: 10 bits is 1,024 and 3 digits is 1,000, so to go from number-of-bits to number-of-digits you just divide by 10 and then multiply by 3. (Or for an even rougher approximation, just divide by 3.)</p>
<p>For instance, an unsigned 64-bit integer represents roughly 3 * 6.4 = 19.2 decimal digits, so it can safely be used to store numbers well over 1,000,000,000,000,000,000. Likewise, if you want to know how many bits you need to store a 12-digit number, you divide by 3 and multiply by 10, which tells you that you need approximately 40 bits.</p>
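<p>To see how well the rule of thumb holds up, here is a small Python sketch which compares it with the exact answer. The function names are invented for this example.</p>

```python
def digits_from_bits(bits):
    """Rule of thumb: 10 bits are worth about 3 decimal digits."""
    return bits / 10 * 3


def bits_from_digits(digits):
    """The same rule in reverse: 3 digits need about 10 bits."""
    return digits / 3 * 10


def exact_digits(bits):
    """Number of decimal digits in the largest unsigned value of this width."""
    return len(str(2 ** bits - 1))


def exact_bits(digits):
    """Bits needed to store the largest number with this many digits."""
    return (10 ** digits - 1).bit_length()


# Estimated versus exact digit counts for the common integer sizes.
for bits in (16, 32, 64):
    print(bits, round(digits_from_bits(bits), 1), exact_digits(bits))

# Estimated versus exact bit count for a 12-digit number.
print(round(bits_from_digits(12)), exact_bits(12))
```

<p>For 64 bits the estimate is 19.2 while the largest value actually has 20 digits, so the rule errs on the safe side: every 19-digit number fits, but not every 20-digit one. For a 12-digit number both the estimate and the exact answer come out at 40 bits.</p>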
]]></content:encoded>
			<wfw:commentRss>http://mwolf.net/archive/thinking-in-bits/feed/</wfw:commentRss>
		<slash:comments>6</slash:comments>
		</item>
		<item>
		<title>Getting VOIP plus IP forwarding to work on the Speedtouch</title>
		<link>http://mwolf.net/archive/voip-on-speedtouch/</link>
		<comments>http://mwolf.net/archive/voip-on-speedtouch/#comments</comments>
		<pubDate>Sun, 22 Jun 2008 16:26:11 +0000</pubDate>
		<dc:creator>admin</dc:creator>
				<category><![CDATA[hacks]]></category>
		<category><![CDATA[linux]]></category>

		<guid isPermaLink="false">http://mwolf.net/archive/getting-voip-plus-ip-forwarding-to-work-on-the-speedtouch-2/</guid>
		<description><![CDATA[With my subscription to XS4ALL, I received a Thomson Speedtouch 716 ADSL/Wifi router, currently running software release 6.1.9.6. Behind that router, I have a Linux server which serves as the webserver for the blog you&#8217;re reading right now, as well as my mailserver and a few other things. The Linux server also acts as a [...]]]></description>
			<content:encoded><![CDATA[<p>With my subscription to XS4ALL, I received a Thomson Speedtouch 716 ADSL/Wifi router, currently running software release 6.1.9.6. Behind that router, I have a Linux server which serves as the webserver for the blog you&#8217;re reading right now, as well as my mailserver and a few other things. The Linux server also acts as a firewall for the rest of my network:</p>
<p><img style="max-width: 800px" src="http://mwolf.net/images/mynetwork.png" /></p>
<p>As you can see, the ADSL router and anything connected to it through the WiFi is considered untrusted: the real access point to my internal network is the Linux machine.</p>
<p>Among other things, the Speedtouch has the ability to support Voice-Over-IP by attaching an analog phone. Unfortunately, this functionality does not work in combination with the &#8220;assign public IP to a machine on the local network&#8221; setting. Which is a pity, because I really want the Linux server behind the router to have my public IP address. Partly because having the server NATted could cause problems with mail: when I send mail to another server, some suspicious spamblocker software may take offense if the address reported in the headers of my outgoing mail does not match my actual IP. But mostly because having a web/mail/FTP/whatever server hidden behind a NAT just feels wrong.</p>
<p><span id="more-26"></span> So, I have two conflicting desires here: I want to use VOIP over a phone connected to the Speedtouch, and I want my server to believe that it is listening directly on my public IP. A little googling confirms that a lot of other people have this same problem, but if anybody has found a solution already, I didn&#8217;t see it.</p>
<p>There does exist a way to expose the server to the outside world without breaking VOIP: use the &#8220;game and application sharing&#8221; feature to forward all TCP and UDP ports <strong>except</strong> port 5060 (which is the SIP port used by the VOIP service) to the server. But then we are using NAT again, which is not what we want. What I want is: port 5060 is handled by the Speedtouch, all other packets are sent straight to my Linux server, which should receive them on the public IP address of my Internet account. Unfortunately, it seems that there isn&#8217;t any way to configure the Speedtouch like that.</p>
<p>How I eventually solved this problem is by adding an <a href="http://www.netfilter.org/">iptables</a> rule on my server, which uses <a href="http://www.netfilter.org/documentation/HOWTO//NAT-HOWTO-3.html">DNAT</a> to translate the destination address of each packet back into my public IP before the server sees it.</p>
<p>Like this:</p>
<ul>
<li>A packet comes in from the outside world on my public IP address (82.95.250.5).</li>
<li>The router sees it, and if the packet is not aimed at the VOIP port (5060), sends it on to my server, which has a local IP of 10.0.0.1 (assigned to it by the router through DHCP).</li>
<li>An iptables rule on my server intercepts the packet and changes the destination address back to 82.95.250.5. This way, all services running on my server can pretend that they are connected directly to the Net, without any special configuration needed.</li>
<li>As my server sends a response to the packet, netfilter&#8217;s connection tracking undoes the translation: the source address of the response becomes 10.0.0.1 again, and the packet is sent out through the router (10.0.0.138).</li>
<li>The router performs a second layer of NAT, translating the source address of the response packet back into 82.95.250.5 again, before sending it along to its final destination.</li>
</ul>
<p>So every packet exchanged between my server and the rest of the world, is NATted twice: once by the router, once by the server. Not a particularly elegant solution, but it will have to do until somebody comes along with a better way to bend the Speedtouch to his will (or until I buy a better ADSL router).</p>
<p>Here&#8217;s the magic iptables statement:</p>
<p><small><font face="Courier New">iptables -A PREROUTING -t nat -i $EXTERNAL -d $FAKEPUBLICIP -j DNAT --to-destination $PUBLICIP</font></small></p>
<p>In my case, $EXTERNAL would be eth1, $FAKEPUBLICIP is 10.0.0.1 and $PUBLICIP is 82.95.250.5.</p>
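<p>The double NAT is easier to follow when you trace one packet through it. The following Python sketch is only a toy model of the address rewriting, not of how the router or netfilter work internally. Packets are plain dictionaries, the function names are invented, and the client address 203.0.113.7 is a made-up example from the documentation range. Because the iptables rule uses <em>--to-destination</em>, the model rewrites the destination address on the way in and the source address on the way out.</p>

```python
PUBLIC_IP = "82.95.250.5"   # public address of the connection (from the article)
SERVER_LAN_IP = "10.0.0.1"  # address the router gave the server via DHCP
SIP_PORT = 5060             # VOIP signalling stays on the router


def router_inbound(packet):
    """Router: keep SIP traffic, forward everything else to the server (NAT 1)."""
    if packet["dport"] == SIP_PORT:
        return None  # handled by the Speedtouch itself
    return dict(packet, dst=SERVER_LAN_IP)


def server_prerouting(packet):
    """Server: the DNAT rule puts the public address back (NAT 2)."""
    if packet["dst"] == SERVER_LAN_IP:
        return dict(packet, dst=PUBLIC_IP)
    return packet


def server_reply(request, client):
    """Reply as it leaves the server: the DNAT has been undone on the source."""
    return {"src": SERVER_LAN_IP, "dst": client, "sport": request["dport"]}


def router_outbound(packet):
    """Router: replace the private source address with the public one."""
    return dict(packet, src=PUBLIC_IP)


client = "203.0.113.7"  # hypothetical outside host
request = {"src": client, "dst": PUBLIC_IP, "dport": 80}

forwarded = router_inbound(request)
seen_by_services = server_prerouting(forwarded)
reply = router_outbound(server_reply(seen_by_services, client))

print(forwarded["dst"], seen_by_services["dst"], reply["src"])
```

<p>The three printed addresses are what the server&#8217;s network card sees, what the services on the server see, and what the outside client sees as the sender of the reply.</p>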
<p>As I said, it&#8217;s not a particularly elegant solution. One annoying consequence is the fact that if you try to browse to mwolf.net from the WiFi network, you&#8217;ll get an error because the router will get confused about where to send packets for 82.95.250.5. This could be solved with a bit of DNS magic.</p>
<p>But maybe somebody has already found a nice, clean way to configure the Thomson router to do what I want, and I just missed it. Any ideas would be appreciated.</p>
]]></content:encoded>
			<wfw:commentRss>http://mwolf.net/archive/voip-on-speedtouch/feed/</wfw:commentRss>
		<slash:comments>8</slash:comments>
		</item>
		<item>
		<title>Firewall improvements from R. Scott Smith</title>
		<link>http://mwolf.net/archive/firewall-script-from-scott/</link>
		<comments>http://mwolf.net/archive/firewall-script-from-scott/#comments</comments>
		<pubDate>Sun, 25 Mar 2007 16:22:29 +0000</pubDate>
		<dc:creator>admin</dc:creator>
				<category><![CDATA[cool-tool]]></category>
		<category><![CDATA[hacks]]></category>
		<category><![CDATA[linux]]></category>

		<guid isPermaLink="false">http://mwolf.net/archive/firewall-script-from-scott/</guid>
		<description><![CDATA[In response to my article about using the recent IPTables module to fight brute-force password attacks, based on an idea from Andrew Pollock, a reader worked out the idea into a complete firewall script, with configurable whitelisting, the ability to block multiple ports, and several other enhancements. Read his post for the details.
You can download [...]]]></description>
			<content:encoded><![CDATA[<p>In response to my <a title="IPTables against SSH brute-force attacks" href="http://mwolf.net/archive/iptables-against-ssh/">article</a> about using the <em>recent</em> IPTables module to fight brute-force password attacks, based on an idea from <a title="Andrew Pollock" href="http://blog.andrew.net.au/2005/02/16/">Andrew Pollock</a>, a reader worked out the idea into a complete firewall script, with configurable whitelisting, the ability to block multiple ports, and several other enhancements. Read <a title="Richard's post" href="http://mwolf.net/archive/iptables-against-ssh/#comment-120">his post</a> for the details.</p>
<p>You can download his firewall script <a title="Firewall script by R. Scott Smith" href="http://mwolf.net/misc-files/rc.firewall">here</a>. You can contact the author at the address <em>meetscott</em> at the domain <em>netscape.net</em>.</p>
]]></content:encoded>
			<wfw:commentRss>http://mwolf.net/archive/firewall-script-from-scott/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>IPTables against SSH dictionary attacks</title>
		<link>http://mwolf.net/archive/iptables-against-ssh/</link>
		<comments>http://mwolf.net/archive/iptables-against-ssh/#comments</comments>
		<pubDate>Sun, 14 Jan 2007 21:26:39 +0000</pubDate>
		<dc:creator>admin</dc:creator>
				<category><![CDATA[hacks]]></category>

		<guid isPermaLink="false">http://mwolf.net/archive/iptables-against-ssh/</guid>
		<description><![CDATA[Like everybody who has a Linux server running an SSH daemon connected to the Internet, I regularly get attacked by people (well, botnets probably) trying to do a brute-force attack against the server. Such attempts can take many hours, during which they simply try many thousands of possible username/password combinations.
As long as you have your [...]]]></description>
			<content:encoded><![CDATA[<p>Like everybody who has a Linux server running an SSH daemon connected to the Internet, I regularly get attacked by people (well, botnets probably) trying to do a brute-force attack against the server. Such attempts can take many hours, during which they simply try many thousands of possible username/password combinations.</p>
<p>As long as you have your SSH server configured properly, the most important thing being to only allow SSH access to accounts which actually need it, this is more an annoyance than a problem. Nonetheless, it <em>is</em> an annoyance, if only because of all the crap in your logfiles.</p>
<p>There are many ways to detect and block such attacks. <a title="sshdfilter" href="http://www.csc.liv.ac.uk/~greg/sshdfilter/"><em>sshdfilter</em></a> works well, and a good detailed overview of the various options can be found <a title="Samhain labs about SSH attacks" href="http://www.la-samhna.de/library/brutessh.html">here</a>. One that particularly appealed to me, however, was a very simple netfilter-based technique consisting of only two lines of iptables code. It uses the <a title="RECENT patch" href="http://netfilter.org/documentation/HOWTO//netfilter-extensions-HOWTO-3.html#ss3.16"><em>recent</em> netfilter extension</a>, and the idea of using it to combat SSH attacks was apparently first conceived by <a title="Using IPTables against SSH attacks" href="http://blog.andrew.net.au/2005/02/16/">Andrew Pollock</a>.</p>
<p><span id="more-12"></span></p>
<p>Here&#8217;s my version:<br />
<font face="monospace"> iptables -A INPUT -p tcp --dport ssh -m state --state NEW \<br />
-m recent --set --name SSH -j ACCEPT<br />
iptables -A INPUT -p tcp --dport ssh -m recent --update \<br />
--seconds 600 --hitcount 6 --rttl --name SSH -j DROP</font></p>
<p>This will blacklist a given IP address as soon as six TCP connections to the SSH port are detected from that address within ten minutes, and then the blacklist will remain in effect until no further connection attempts have been made from that address during the past ten minutes. This is the most basic version: it does not support whitelisting and there&#8217;s no logging, but it works very well at what it does. As I can see from my logs, all crack attempts are indeed broken off after the first six attempts, and they generally don&#8217;t come back.</p>
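<p>The behaviour described above boils down to a sliding window per source address. Here is a toy model of it in Python. It is a simplification under my own assumptions, not a reimplementation of the <em>recent</em> module: it only counts connection attempts, ignores the TTL check, and the class name and the attacker address are invented for the example.</p>

```python
from collections import defaultdict, deque

WINDOW_SECONDS = 600  # --seconds 600
HIT_COUNT = 6         # --hitcount 6


class RecentList:
    """Toy model of a per-address list of recent connection times."""

    def __init__(self):
        self.seen = defaultdict(deque)

    def connection_attempt(self, address, now):
        """Record an attempt at time `now` (seconds); return ACCEPT or DROP."""
        stamps = self.seen[address]
        stamps.append(now)  # every attempt counts, blocked ones included
        # Forget attempts that have fallen out of the ten-minute window.
        while now - stamps[0] >= WINDOW_SECONDS:
            stamps.popleft()
        return "DROP" if len(stamps) >= HIT_COUNT else "ACCEPT"


recent = RecentList()
attacker = "198.51.100.23"  # hypothetical address (documentation range)

# Eight attempts, ten seconds apart.
verdicts = [recent.connection_attempt(attacker, t) for t in range(0, 80, 10)]
print(verdicts)

# After ten quiet minutes the address is welcome again.
print(recent.connection_attempt(attacker, 70 + WINDOW_SECONDS))
```

<p>In this model the first five attempts are accepted and everything from the sixth onwards is dropped. Because blocked attempts are recorded as well, an attacker who keeps hammering away stays blocked, and only ten minutes of silence clears the list.</p>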
<p>This is just one of the many reasons why Netfilter/IPTables is such a great tool. Sure, it&#8217;s a bit intimidating at first, but once you understand the basics, you can fine-tune your server&#8217;s routing and firewalling configuration to an almost limitless degree.</p>
<p>What I&#8217;m still looking for, however, is a way to control the &#8216;detection period&#8217; and the &#8216;blacklist period&#8217; separately. For example, if six SSH connections are made from a given address within four minutes, that address is blacklisted for the next twelve hours. Anybody know of a good way to do that?</p>
]]></content:encoded>
			<wfw:commentRss>http://mwolf.net/archive/iptables-against-ssh/feed/</wfw:commentRss>
		<slash:comments>17</slash:comments>
		</item>
	</channel>
</rss>
