{"id":14,"date":"2012-07-20T23:14:52","date_gmt":"2012-07-21T04:14:52","guid":{"rendered":"http:\/\/notnuke.com\/blog\/?p=14"},"modified":"2012-09-25T11:45:06","modified_gmt":"2012-09-25T16:45:06","slug":"math-trick-vs-coding-trick","status":"publish","type":"post","link":"https:\/\/www.notnuke.com\/blog\/2012\/07\/math-trick-vs-coding-trick\/","title":{"rendered":"Math Trick vs Coding Trick"},"content":{"rendered":"<div>\n<div>\n<p>I have some code where I have a value x=2^n, and I wanted to see what that n was, knowing that there&#8217;s only one 1 in the binary representation, and it&#8217;s not a huge number. \u00a0I know it&#8217;s not hard, but I was wondering if there was a better way, so I looked around. \u00a0I ran into a post from someone (student?) who had to compute the same thing without branching, and that got me thinking.<\/p>\n<p>I&#8217;ve now written this two times and found a few more ways, unrolled one of them by hand to see how much that helped (some, but not a whole lot), but here are the main four contenders: \u00a0the naive way, the log-trick way, inline assembly language, and using DeBruijn sequence:<\/p>\n<p><strong>Code: <\/strong><em>(all were set inside an <\/em>int func(int)<em> wrapper)<\/em><\/p>\n<ol>\n<li>while((x&gt;&gt;=1)&gt;1) y++;<\/li>\n<li>y=log(x)\/log(2);<\/li>\n<li>asm(&#8220;fld1; fld x; fyl2x; fstp y;&#8221;)\u00a0 <em>(wrapped with type conversions from integer to float and back)<\/em><\/li>\n<li>y=myDeBruijnArray[(unsigned int)(x*0x077CB531U) &gt;&gt; 27]; <em>(where &#8220;myDeBruijnArray&#8221; is a lookup table 32 32-bit numbers)<\/em><\/li>\n<\/ol>\n<p><strong>Times:<\/strong>\u00a0 <em>(for 10,000,000 iterations)<\/em><\/p>\n<ol>\n<li>0.602, our base case for 16 bits. \u00a0Counting 8 bits ran almost twice as fast, as one would expect, and I&#8217;d imagine that counting out 32 bits would take even longer.<\/li>\n<li>2.098, which makes sense because we&#8217;re doing two function call to math libraries, but it met the condition of &#8220;no branching&#8221; that the one student had.<\/li>\n<li>0.521 or 0.715 (see below), which gets us out of our conditional, but at the cost of using the FPU which has a <em>suprising<\/em> amount of overhead, and again, the number of bits makes no appreciable difference.<\/li>\n<li>0.283, which has an &#8220;expensive&#8221; imul opcode, but that&#8217;s it.<\/li>\n<\/ol>\n<p>One interesting thing about the assembly language:<\/p>\n<p>I expected the function that was all inline assembly language and some space for variable to result in the smallest code, and it wasn&#8217;t. \u00a0There were 14 more extra assembly language operations to convert between the unsigned int value I had to start with and the floating point values I needed for the assembly language and back again. \u00a0The DeBruijn method used the standard headers and footers for the command that all of the versions used, and then four lines:<\/p>\n<pre>mov  eax,dword ptr [x]\r\nimul eax, eax,77CB531h\r\nshr eax, 1Bh\r\nmov eax, dword ptr myDeBruijnArray [eax*4]<\/pre>\n<p>Because of conversion time, I wanted to try doing the inline assembly code, but instead of calling the routine with the integers I was using elsewhere, I changed from an integer to a floating point before making the call to the routine, leaving the assembly language version strictly in floating point numbers, and it turns out that using assembly language with all floating-point values saved enough time for assembly language to be a hair faster than the naive approach, and of course if I had to use floating points anyways, it&#8217;s a *lot* faster than calling the math libraries. \u00a0It turns out that I could even drop the last &#8220;store my FPU register in Y&#8221; and leave it on the top of the stack for the return value, which is what &#8220;return y&#8221; does anyways, so now that function looks like this in assembly language,<\/p>\n<pre>fld1\r\nfld \u00a0 \u00a0 \u00a0 \u00a0 dword ptr [x]\u00a0\r\nfyl2x<\/pre>\n<p>They take longer to run than the op codes we needed for the DeBruijn code, but that code *only* works for integers, and knowing that there&#8217;s a log2 function in the processor that&#8217;s not in the standard library cuts my time by 75%, which if I had to do a lot of it (data mining? \u00a0Video processing?) could be very significant.\u00a0 Also, use integer variables instead of floating-point variables when you do not need the latter.<\/p>\n<\/div>\n<\/div>\n","protected":false},"excerpt":{"rendered":"<p>I have some code where I have a value x=2^n, and I wanted to see what that n was, knowing that there&#8217;s only one 1 in the binary representation, and it&#8217;s not a huge number. \u00a0I know it&#8217;s not hard, &hellip; <a href=\"https:\/\/www.notnuke.com\/blog\/2012\/07\/math-trick-vs-coding-trick\/\">Continue reading <span class=\"meta-nav\">&rarr;<\/span><\/a><\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[5],"tags":[],"class_list":["post-14","post","type-post","status-publish","format-standard","hentry","category-tech"],"_links":{"self":[{"href":"https:\/\/www.notnuke.com\/blog\/wp-json\/wp\/v2\/posts\/14","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.notnuke.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.notnuke.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.notnuke.com\/blog\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.notnuke.com\/blog\/wp-json\/wp\/v2\/comments?post=14"}],"version-history":[{"count":3,"href":"https:\/\/www.notnuke.com\/blog\/wp-json\/wp\/v2\/posts\/14\/revisions"}],"predecessor-version":[{"id":35,"href":"https:\/\/www.notnuke.com\/blog\/wp-json\/wp\/v2\/posts\/14\/revisions\/35"}],"wp:attachment":[{"href":"https:\/\/www.notnuke.com\/blog\/wp-json\/wp\/v2\/media?parent=14"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.notnuke.com\/blog\/wp-json\/wp\/v2\/categories?post=14"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.notnuke.com\/blog\/wp-json\/wp\/v2\/tags?post=14"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}