Tuesday, January 12, 2016

Does anyone remember Counting-Sort from their Algorithms Classes?

So I was just working through a few practice interview problems on LeetCode.com, and I came across a problem that was exceptionally easy for me that was rated as "Medium" by the site. Here is the problem. It's a very simple counting-sort problem, yet when i was digging through the discussion forums (as I do after I solve any problem), there was a surprisingly small number of optimal solutions. Because you have a well defined and finite number of potential elements in your set, you can easily use an O(n) counting-sort. Here is a C# solution:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
public class Solution
{
    public void SortColors(int[] nums)
    {
        int redCount = 0, whiteCount = 0, blueCount = 0;
        foreach (int num in nums)
        {
            switch (num)
            {
                case 0:
                {
                    ++redCount;
                    break;
                }
                case 1:
                {
                    ++whiteCount;
                    break;
                }
                case 2:
                {
                    ++blueCount;
                    break;
                }
                default:
                {
                    throw new ArgumentException("Illegally formatted input.");
                }
            }
        }
        
        int index;
        for (index = 0; index < redCount; ++index)
        {
            nums[index] = 0;
        }
        
        for (; index < redCount + whiteCount; ++index)
        {
            nums[index] = 1;
        }
        
        for (; index < nums.Length; ++index)
        {
            nums[index] = 2;
        }
    }
}

I'm not even sure what to say about the medium difficulty rating of this problem relative to its actual difficulty. I'm sure I could have written this solution with fewer lines (especially with the triple for-loop at the end), but this is easy enough to read. It's just interesting to me because LeetCode (at least for me) is a site chock-full of exceptionally hard problems. The range of difficulty is bounded typically at the bottom by reasonable interview questions for an entry-level position, and upper-bounded by questions that could take me a full workday to solve. I've never been on an interview that asked a question this easy, so what is it doing in the "Medium" tier?

It could be that counting-sort it used so irregularly that people have forgotten about it as well as other O(n) sorting algorithms (ex. radix sort). Counting-sort performs very well (small constant terms) in relation to radix sort, however it takes more memory (counting sort takes O(n + k) space and radix takes only O(n) space where k is the number of distinct elements in your set). Despite these excellent runtimes, why do I rarely find the opportunity to use these counting sort algorithms?

It's because we need to know some extra information about the data we will be sorting before we're able to use a constant-time algorithm. In the case of counting sort, we need to know the number of distinct elements in the set we're trying to sort. In the case of radix sort, we need to know the max number of digits each number has (or the length of the longest string, etc.) in order to use it. In practice, we rarely find ourselves with this opportunity. For counting sort, the most common use case is exactly what the LeetCode was asking, just with more relevant data than red, white, and blue. An example of where radix sort is useful is in sorting dates chronologically by year, month, day, and time.

So what is the point of this blog post? I think that as it gets longer and longer since I took my algorithms class, I think that I tend to forget some more interesting algorithms and where/when to apply them. I remember all the important ones, especially things like sorting algorithms and hash tables, but I should brush up on different types of trees. It would really only take a few minutes, and if it could make a performance difference in the code that I write, it's definitely worth it. The reason this medium problem on LeetCode was so easy for me was because I've implemented multiple types of O(n) sorting algorithms so it was easy for me to recognize that this problem can be solved using one. If I had never heard of (or maybe not studied enough) O(n) sorting algorithms, maybe it wouldn't have been so easy. It just goes to show that coding and software engineering is a very academic activity, and it's helpful to sit down with a textbook and study our algorithms every now and again to keep them fresh.

No comments:

Post a Comment