Monday, February 22, 2016

How I Think I'm Able to Graduate a Year Early

So I'm currently a third-year student at UMass Amherst studying computer science with a concentration in software systems. My sixth semester is my final semester. About once a week, I get asked how I was able to graduate so early, and I never really had a good answer. I came in with 12 AP credits, so that helped a lot, but it wasn't a game changer; I only placed out of UMass' intro calc course. I even came in as a physics major, officially changing my major the summer after my freshman year.

I had a bit of a realization today, however: I've never had a consistent academic advisor. Every semester, every UMass student is required to sit down with their assigned academic advisor and review the courses they intend to take next semester; you are not allowed to enroll until you do so. I have done so every semester, as required, but I have never met with the same person twice. Six times, my advisor has been changed (actually more, but sometimes I never even met the person assigned to me). As a result, it was never possible for a professor to get to know my academic interests beyond a single 15-minute meeting. They could only go through their list of checkboxes on a progress form and say whether or not I was on the right track. It's also worth saying that I never let them pick the classes I was going to take, as some students do. They might have suggested that taking five computer science classes in one semester was too much, but what else was I supposed to do, just sit around and twiddle my thumbs not being challenged? Or take some bogus class that doesn't apply to what I want to do?

No. I chose to make my own schedule, and because I know how hard I can push myself, I was able to take as many comp sci classes as I could possibly fit. Consistently, I earned close to the maximum number of credits per semester (once or twice even petitioning to go over the limit), which, combined with the 12 AP credits, put me over the graduation requirement of 120.

I don't think this would have happened if I had a consistent advisor. They undoubtedly would have challenged my course selection every semester, and I would have been more likely to lose those fights. I have had to make that fight quite often; however, because I was just meeting that person for the first time, I had some leverage in being able to make a strong impression every time. I could use that to my advantage, putting my most passionate, high-achieving foot forward in order to convince them I was capable. People are always vulnerable when you first meet them, and this was why I was able to get into the classes I wanted.

I don't think that this is a particularly insightful post; I just think it is an interesting thought to consider. Because I wasn't tethered to an academic advisor for more than a few months at a time, I was able to take some personal freedoms in my education. It took some hard work (at one point, I knew all the potential combinations of courses I could take to graduate as soon as possible; I still have them saved in a spreadsheet somewhere), but graduating early is worth it. I'll save a ton on student loans and start earning real money earlier, plus all my friends at school are a year older than me, so next year would have been very lonely. Looking forward to commencement on May 6. Go UMass!

Tuesday, January 12, 2016

Does anyone remember Counting-Sort from their Algorithms Classes?

So I was just working through a few practice interview problems on LeetCode.com, and I came across a problem that was exceptionally easy for me but rated "Medium" by the site. Here is the problem. It's a very simple counting-sort problem, yet when I was digging through the discussion forums (as I do after I solve any problem), there was a surprisingly small number of optimal solutions. Because you have a well-defined and finite number of potential values in your set, you can easily use an O(n) counting sort. Here is a C# solution:

using System;

public class Solution
{
    public void SortColors(int[] nums)
    {
        int redCount = 0, whiteCount = 0, blueCount = 0;
        foreach (int num in nums)
        {
            switch (num)
            {
                case 0:
                {
                    ++redCount;
                    break;
                }
                case 1:
                {
                    ++whiteCount;
                    break;
                }
                case 2:
                {
                    ++blueCount;
                    break;
                }
                default:
                {
                    throw new ArgumentException("Illegally formatted input.");
                }
            }
        }
        
        int index;
        for (index = 0; index < redCount; ++index)
        {
            nums[index] = 0;
        }
        
        for (; index < redCount + whiteCount; ++index)
        {
            nums[index] = 1;
        }
        
        for (; index < nums.Length; ++index)
        {
            nums[index] = 2;
        }
    }
}

I'm not even sure what to say about the medium difficulty rating of this problem relative to its actual difficulty. I'm sure I could have written this solution in fewer lines (especially the three for-loops at the end), but this is easy enough to read. It's just interesting to me because LeetCode (at least for me) is a site chock-full of exceptionally hard problems. The range of difficulty is typically bounded at the bottom by reasonable interview questions for an entry-level position, and at the top by questions that could take me a full workday to solve. I've never been in an interview that asked a question this easy, so what is it doing in the "Medium" tier?

It could be that counting sort is used so rarely that people have forgotten about it, along with other O(n) sorting algorithms (e.g., radix sort). Counting sort performs very well (small constant terms) relative to radix sort; however, it can take more memory: counting sort uses O(n + k) space, where k is the range of possible key values, while radix sort uses O(n + b) space, where b is the base used for the digits. Despite these excellent runtimes, why do I rarely find the opportunity to use these linear-time sorting algorithms?

It's because we need to know some extra information about the data we will be sorting before we're able to use a linear-time sorting algorithm. In the case of counting sort, we need to know the range of values in the set we're trying to sort. In the case of radix sort, we need to know the maximum number of digits each number has (or the length of the longest string, etc.). In practice, we rarely find ourselves with this opportunity. For counting sort, the most common use case is exactly what the LeetCode problem was asking, just with more meaningful data than red, white, and blue. An example of where radix sort is useful is sorting dates chronologically by year, month, day, and time.
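To make the precondition concrete, here is a minimal sketch of a general counting sort (written in C++ here rather than the C# above; the idea is language-agnostic). It assumes exactly the extra information discussed: every key lies in the known range [0, k).

```cpp
#include <vector>

// Counting sort for integers known to lie in [0, k).
// O(n + k) time, O(k) extra space for the counts array.
std::vector<int> counting_sort(const std::vector<int>& input, int k) {
    std::vector<int> counts(k, 0);
    for (int value : input) {
        ++counts[value];  // tally occurrences of each key
    }
    std::vector<int> output;
    output.reserve(input.size());
    for (int value = 0; value < k; ++value) {
        // emit each key as many times as it was seen
        output.insert(output.end(), counts[value], value);
    }
    return output;
}
```

With k = 3 this degenerates into exactly the red/white/blue tally above; with an unknown or unbounded key range there is no way to size the counts array, which is why the algorithm so rarely applies.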

So what is the point of this blog post? As more and more time passes since I took my algorithms class, I tend to forget some of the more interesting algorithms and where/when to apply them. I remember all the important ones, especially things like the standard sorting algorithms and hash tables, but I should brush up on the different types of trees. It would really only take a few minutes, and if it could make a performance difference in the code that I write, it's definitely worth it. The reason this medium problem on LeetCode was so easy for me was that I've implemented multiple types of O(n) sorting algorithms, so it was easy to recognize that this problem could be solved with one. If I had never heard of (or maybe not studied enough) O(n) sorting algorithms, maybe it wouldn't have been so easy. It just goes to show that coding and software engineering are very academic activities, and it's helpful to sit down with a textbook and study our algorithms every now and again to keep them fresh.

Saturday, January 9, 2016

Depth of understanding of Tech Stacks

So before I start on this post, it's worth noting that this is probably about the rate I'll be posting on this blog. After the initial burst of posts to seed the blog, I'll probably post every 2-3 weeks about the most interesting project I've been working on. Since I started this blog, I've mostly been on winter break from school, and I'm taking time off instead of getting another internship (because after graduation, I'll be working full-time, and I wanted to enjoy some time off). So my apologies for the lack of a constant stream of posts, but this is how it's going to go from here on out!

So today's topic actually takes a break from coding for a little bit to talk about a post that I read on one of Steve Yegge's blogs. The post discusses the depth of understanding of a technology stack (like, the WHOLE stack) that the average programmer needs. Is it okay to only know how to program in an application language, or does the average programmer also need to know how to program in a lower-level assembly language? Do we need to know how to implement circuit logic in hardware? What do we need to know about semiconductors? I've been thinking about this specific question for a couple months now, and I honestly don't think there is a single good answer.

What I do think, however, is that there is usually some situational variance. For instance, let's take any algorithm that uses multiplication. Does the hardware you intend to run your program on implement constant-time multiplication (in CPU cycles, not in the number of assembler instructions)? If not, every multiplication could take some extra, non-constant time. Does the hardware you are using have a specific feature that will make a specific algorithm run really quickly, like an ASIC designed for bitcoin mining? The issue is not easily answered with sweeping generalizations, and that is why I think the situation matters most when deciding how deep down a technology stack to learn.

Something that I have noticed about myself and my fellow students (I'm a computer science major, not a computer engineering major) is that we tend to focus at too high a level. I think I'm less guilty of this because my concentration is in operating and distributed systems and I write a lot of C for school and MSIL for work; still, we students tend to think of technology and programming strictly in terms of the text that we write. There is a cognitive disconnect for a lot of students that says there is a magical layer called the JVM or .NET or Mono or whatever that does some hand-waving, and all of a sudden the results are displayed on our screens. Nobody in school is taking the time to learn the internals of a language, never mind what that language is built on! They talk about the MEAN or [L|M|W]AMP stacks as if the stack starts and ends within those four letters, which is absolutely not true! It's just abstracted away so that most of the time it doesn't matter to the application programmer. We are taught in school not to worry about the internals of a system; instead, we should worry about writing fast application code (which is also very, very important) without worrying about how we can leverage the system we are using.

And in a way, it doesn't even matter that we don't have a thorough understanding of system internals! You could write a million lines of Python or Ruby and never need to worry about why your algorithm is slow. Many modern languages are exceptionally fast, and for most simple web applications, the basic tools are good enough. What if you're writing something a little more interesting, though, that requires exceptionally quick results? What I think is happening nowadays is that when we write in very high-level languages, we are at risk of becoming complacent about the speed of the code we write. If you look at Python code for some algorithm, there are few ways to creatively speed up that code. There is a completely different attitude among C and C++ programmers, however. It always seems as if there is something you can do to give your code a little more punch. Take template metaprogramming in C++, for instance. You can define a template that performs a non-constant-time calculation at compile time, so, for example, you can have a factorial available in constant time at run-time. This can be very powerful, and it is completely lost on Python programmers (and programmers of many other high-level languages)!
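As a rough illustration of the C++ trick mentioned above (a sketch, not production code): the classic template-metaprogramming factorial, where the compiler does the multiplication and the run-time code just reads a constant.

```cpp
// Factorial<N>::value is computed entirely at compile time by
// recursive template instantiation; reading it at run-time is O(1).
template <unsigned N>
struct Factorial {
    static const unsigned long long value = N * Factorial<N - 1>::value;
};

// Base case terminates the recursive instantiation.
template <>
struct Factorial<0> {
    static const unsigned long long value = 1;
};
```

For example, `Factorial<10>::value` compiles down to the constant 3628800; no loop or recursion survives into the generated code.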

To be fair, the lack of support for features that enable much faster code lets a programmer write much simpler, easier-to-read code, which can save a lot of money and headaches. It really depends on the needs of the project. I talked to an engineer on the Bing team who said he wrote mostly C++ because his code was extremely performance-sensitive; however, one could certainly write a fast-enough website in Python (Reddit, for example, is written using a lot of Python). Which brings me back to the original point of this blog post: everything is situational in software. There is no one-and-done solution for a whole class of problems, which means that the depth of a technology stack that we need to know varies widely. And it's always difficult to know when a system will make a specific function faster, because you don't know what you don't know. I guess you could look at your application code, find potential bottlenecks, and try Googling ways to fix them, but that's not a very good answer.

I guess we have to live with Yegge's comments: that there are a few major CS tenets that we all need to know, and we rely on situational knowledge the rest of the time. That's kind of okay because it works in most cases, except when it doesn't and some critical performance issue slips through the cracks. I guess the answer is really just to pay careful attention to every line of code you write, and do everything for a reason. Eventually, after months and hundreds of thousands or millions of lines of code, you start to get a good feel for the environment you are working in. In other words, it just takes time to learn it all, because writing code is hard! It's tough to learn, and even tougher to learn to do well! I would say that the only language I've written upwards of a hundred thousand lines in is C#, and I still have a good amount to learn about .NET. The more we write, the better we get, and the better we will learn to leverage the systems we use.

Anyway, sorry for the amount of words in this post! Next time I'll break out some code and make it easier on the eyes. On the docket for this week is prep for the semester and a trip to California for an interview for a job after graduation (I live near Boston). This is my first time heading West, and I'm very excited! Anyways, I hope this post has been interesting for you! Leave a comment below if you have any thoughts!

Friday, December 18, 2015

Intro to Entity Framework

When I was first starting with full-stack C#, my team was giving me some hints on ways to write effective, secure, reliable code, and understandably one of the first things that was mentioned was Entity Framework. I tried some Googling and reading on-hand textbooks, and unfortunately, I was unable to find any satisfactory blog posts or tutorials on the subject of EF. Most of them were outdated or way above a beginner user's head, so here we are: at a blog post that will serve as a very basic tutorial on some of EF's more useful features, skipping over a lot of the more powerful stuff in the name of clarity.

To start, Entity Framework is a Microsoft technology, part of ADO.NET (the data access and control tier of .NET), that lets users abstract a database table into an object-oriented paradigm. Essentially, it takes a SQL Server table and turns it into a C# object that you can operate on to change the table. It allows you to generate database tables from objects or vice versa and is very powerful. In this blog post, I'm going to outline how to make a simple EF object out of an existing SQL Server table.

Let's get into it! Before that, though, please note that I use Visual Studio 2013 Ultimate and SQL Server Management Studio 2014, so some things may look different if you're using different versions, but it's largely the same. Also, please note that I will be using Entity Framework version 6.1.3.

First, let's open up SQL Server Management Studio and create a new database by right-clicking the "Databases" folder in the Object Explorer and picking "New Database".

Name it whatever you'd like and click "Ok". I named mine EFTest. Next, right click the database you just made and click "New Query".

Great! Now let's create a table with some simple SQL code. We can create a table to model some basic user data, such as their user ID, their name, and email.


CREATE TABLE UsersTable
(
UserID int NOT NULL PRIMARY KEY,
FirstName varchar(255),
LastName varchar(255),
Email varchar(255)
);


Pretty easy! Press F5 to execute this statement, and the UsersTable will be created in the database we just made. You can see the contents of it using:


SELECT * FROM UsersTable;



Management Studio may take a second or two to recognize that the table was created, but if you execute the select statement it should work anyway, or you can right-click the Tables folder and hit refresh and it will show up.

Next, open up Visual Studio and create a new console application. It should generate a new main method template. 

Now we need to add EF to our project. We can do that by right-clicking our project file in the Solution Explorer window and clicking "Manage NuGet Packages". EF is usually the first one on the list, but if it isn't, you can search for "Entity Framework" in the search bar and it should come up. Hit install, let NuGet do its thing, and we should be good to go! When you're done, it should look something like this:

Once you've done that, close NuGet and right-click your project file in the Solution Explorer. Select "Add" > "New Item". In the list of templates that comes up, find "ADO.NET Entity Data Model", click it, name the file something like "UsersDataModel.edmx" and click Add.

Click "Generate from database" and hit "Next".

In the next dialogue box, click the "New Connection" button towards the top-right, then in the new window pick "Microsoft SQL Server", and uncheck the box if you think you might want to change this setting in the future. Press Continue.

In the next window, enter your database's server name, which can be found in the Object Explorer window of SQL Server Management Studio. Select or enter your database name below, and test your connection. It should pass, or something has gone terribly wrong. Make sure you're using the correct authentication method (Windows Auth or SQL Server Auth, depends on what you want. I'll use Windows Auth). You can fiddle around with the Advanced Settings if you'd like, but you don't need to.

Click "OK", then click "Next". You should be at the window that says "Choose Your Database Objects and Settings". Expand the Tables option and make sure that the UsersTable is checked. You can rename the model's namespace if you'd like, but I'll leave it default. Click "Finish".

That's all you need to do to get some basic EF generated code going! Before we start using the object, let's add an entry to the users table so we can actually see the results. Go back to SQL Server Management Studio, and write a query on the database we made that is:


INSERT INTO UsersTable VALUES (1, 'Jim', 'Calabro', 'jamesrcalabro@gmail.com');



This will make a user with an ID of 1, first name "Jim", last name "Calabro", and email "jamesrcalabro@gmail.com". Great! You can check that it worked by running the select statement again. It should show you the contents of the UsersTable in the bottom-center box.


SELECT * FROM UsersTable;


Let's open up the main method and start using Entity Framework. It's best practice to wrap any EF entities object in a using statement: the class we generated extends DbContext, which implements IDisposable, so instead of needing to call the Dispose() method later in the code, our object will automatically be disposed at the end of the block. We can check that everything worked by doing a Find call. Here is the entire program:



namespace EntityFrameworkIntro
{
    using System;

    public class EntityFrameworkIntro
    {
        public static void Main(string[] args)
        {
            using (var dbContext = new EFTestEntities())
            {
                var user = dbContext.UsersTables.Find(1);
                Console.WriteLine(
                    String.Format("User {0} {1} has email address: {2}",
                    user.FirstName, user.LastName, user.Email));
            }

            Console.ReadLine();
        }
    }
}


Which should print "User Jim Calabro has email address: jamesrcalabro@gmail.com" in the console.

Awesome! To do a quick review of what we did: we made a table in SQL Server Management Studio, made an ADO.NET entity model (EF object) for that table, and used that object to print some information from a row containing user information. It's really powerful stuff, and we're just barely scratching the surface here. Try adding a stored procedure on your own, for instance, and try calling it using the dbContext object. I hope it has become clear that Entity Framework allows for safe, maintainable data access, because if you ever want to change something about the way the database is structured, you just need to update the generated object. As long as you don't pull the carpet out from underneath what you've already written, you're golden. I hope this has served as a readable intro to EF! Let me know if you have any questions or comments, and as usual, thanks for reading!

Monday, December 14, 2015

Inverted Binary Tree and Homebrew

What could an inverted binary tree and the OSX package manager possibly have in common? Well, this famous tweet by Max Howell, the creator of Homebrew, hints at a narrative I've been thinking about quite often lately, currently being in the interview process for my first full-time position myself: how relevant are coding interviews in determining the merits of a candidate? I'm of the opinion that they are a relatively fair representation of how hard the candidate is willing to study their subject matter, but first let's go over a quick solution to the inverted binary tree problem. To invert a binary tree, you swap the left and right children of every node. A simple solution in C# is as follows:



// Definition for a binary tree node.
public class TreeNode {
    public int val;
    public TreeNode left;
    public TreeNode right;
    
    public TreeNode(int x)
    {
        val = x;
    }
}


public class Solution
{
    public TreeNode InvertTree(TreeNode root)
    {
        RecursiveInvertTree(root);
        return root;
    }
    
    private void RecursiveInvertTree(TreeNode node)
    {
        if (node != null)
        {
            TreeNode tempLeft = node.left;
            node.left = node.right;
            node.right = tempLeft;
            RecursiveInvertTree(node.left);
            RecursiveInvertTree(node.right);
        }
    }
}


That wasn't so hard! We separate the recursive part into its own function, then just return the root of the tree, because inverting the tree doesn't change which node is the root. The recursive function checks that the node isn't null; if it isn't, it swaps the children and recursively calls itself on the left and right children. Easy enough!
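The same swap can also be done without recursion, which interviewers sometimes ask about as a follow-up. Here is a sketch in C++ (rather than the C# above) using an explicit stack; the node structure mirrors the one in the C# snippet.

```cpp
#include <stack>
#include <utility>

struct TreeNode {
    int val;
    TreeNode* left = nullptr;
    TreeNode* right = nullptr;
    explicit TreeNode(int x) : val(x) {}
};

// Iterative inversion: visit every node once, swapping its children.
TreeNode* invert_tree(TreeNode* root) {
    std::stack<TreeNode*> pending;
    if (root != nullptr) {
        pending.push(root);
    }
    while (!pending.empty()) {
        TreeNode* node = pending.top();
        pending.pop();
        std::swap(node->left, node->right);  // the actual inversion step
        if (node->left != nullptr) pending.push(node->left);
        if (node->right != nullptr) pending.push(node->right);
    }
    return root;  // the root never changes places
}
```

Either version visits each node exactly once, so both are O(n); the explicit stack just trades call-stack depth for heap space.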

What makes this challenge even remotely hard during an interview is the interview itself. Writing code on a whiteboard is challenging! There aren't any practicing software engineers who regularly write actual source code on a whiteboard, yet we continue to do it during technical interviews all the time. I personally miss Visual Studio's IntelliSense and tab completion, as well as the speed I have using Vi. It takes me ages to write code on a board, and during the dead time while I'm writing, I occasionally lose my train of thought and have to re-adjust my bearings. It's a really tough game, and there are a lot of people who say it isn't even one worth playing.

Take Max Howell for example. He's written some genius software in his day (Homebrew is a godsend) and yet when he was asked to interview at Google, they didn't offer him a position because he didn't do well enough in his technical interview(s). It really confuses me that interviewers don't allow their candidates to code on a computer. I'm sure that if Howell was given a computer and a single Google search that he would have found the answer on StackOverflow and had it coded in about two minutes (I did not use Google or SO to come up with my solution by the way, but I'm also still in school and solving these problems regularly). Which begs the question, are interviews really extracting the information that interviewers need to make an informed decision about whether or not this person can have a positive impact on their company?

I am going to fence-sit for only a little while longer and say yes and no. It is important that software engineers know basic algorithms (honestly, inverting a binary tree is one of the easiest questions you could be asked) as well as how to explain them to the interviewer. You shouldn't need a fancy debugger and SO to solve basic problems like this. There are much, much harder questions that I've been asked, however, and I think that just three minutes reading about the problem on Google before starting would have let me solve some problems I otherwise struggled on. For instance, I was asked to write a program to recursively search through a directory and all subdirectories to find files that had a phone number in them. I knew to use grep as well as regular expressions, but because I don't have all of grep's flags memorized, and because I usually just Google "grep regex" anytime I want to use it, I was pretty much useless. Maybe all the interviewer wanted to see was that I knew what tools to use, but I wanted a better sense of closure for the problem.

I do, however, think that whiteboard coding gives the interviewer a sense of the candidate's technical communication skills. It's absolutely critical to talk through every line of code you put on the board to make sure that the interviewer is on the exact same page as you. Sure, it's important to get the question right, but it is also very important to make sure that the interviewer is following your train of thought. Because there is so much dead time spent writing, use it to explain what you're doing and why. If anything, that is what I've taken away from whiteboard interviews: you must be able to explain your rationale behind every line of code you write.

So what about Homebrew? Howell unfortunately didn't get an offer from Google even though he wrote one of the most excellent and important OSX applications ever. Sure, he's exaggerating that 90% of Google employees use it (I doubt that 90% of Google uses Mac, full stop, though I would guess that 99% of Google employees who use Mac use Homebrew), but should they discount a man who has already proven himself to be a very high quality engineer just because he couldn't answer a pedantic question that doesn't have much to do with anything on a whiteboard? I don't think I'm in a position to make that call. 

I do, however, think that if you're in the process of looking for a job, you should crack your data structures/algorithms textbook and get studying. Howell should've known ahead of time that he was going to be asked questions like this, and as I mentioned before, this question is a freebie. It's about how hard you're willing to study their game to ace the interview and get the job. Go on LeetCode and do 3-4 questions a day, gradually ramping up the difficulty, and when you get to the interview, you will be fine. I understand that people are busy and don't have time for that between work, family life, school, etc., but you've really got to find the time to just do the work. It's one of those things in life where you've got to put your head down and study hard. Which is what I'm going to do right after writing this!

Is the interview process strange and alienating from what the job is actually like? Yes. Is it completely useless for determining the merit of an engineer, especially a new one? Definitely not! It's vital for showing that someone knows the basics of coding, and from whiteboard interviews, you can learn a lot about technical communication habits. Who is to say whether Google was right or wrong about not hiring Howell? It is their own prerogative. I just definitely do not want to be the guy who gets into an interview and can't solve a binary tree question, so on that note, I'm going to hit the books!

Sunday, December 13, 2015

About This Blog

Hi there! My name is Jim Calabro, and I am starting a new technical coding blog here at septacat.com! I am excited to bring my perspective on software engineering online, and I hope to post very frequently about difficult problems I've been hacking at, my thoughts on new and exciting technologies, and the state of the industry including my favorite apps, websites, companies, cultures, and more!

I'm currently a student at UMass Amherst who will be graduating in May of 2016 with a CS degree. My concentration is in software systems, and I have taken, or am currently taking, advanced courses on operating systems, statistics, computer networks, software engineering, computer architecture, discrete math, databases, and information science. I have been a member of the Minuteman Marching Band in the trombone section during my time here, played the drum set in the basketball and hockey band, and was a member of the colonial honor guard.

I've finished two internships at EMC, during which I worked on a PaaS website that delivers a scalable hard drive testing infrastructure to our manufacturing pipeline and is expected to halve TCO. During my time there I also completed several smaller projects, including a personnel management website, a web service mock environment, a drive conversion lookup database, and various internal system debugging tools. I learned a ton about the software engineering process (we used Agile/Scrum) as well as what it's like to work on different-sized teams.

For the last year and a half I’ve been working as a multi-purpose tech consultant for the Brain, Cognition, and Development lab here at UMass. I’ve been writing code for our EEG and eye-tracking experiments, doing data and statistical analytics, suggesting, setting up, and fixing hardware configurations, and managing any other technical and computational details of the lab. My most notable contributions include preparing an implementation for a study funded by the Army, coding a dissertation for a PhD candidate, and performing statistics and data analytics on large data sets.

Some of my favorite things in tech at the moment are:
  • C#/.NET
  • Node.js
  • PostgreSQL
  • CUDA
  • Heroku
  • Azure
  • Julia

In addition to Blogger, my other online presences include:

So that's a little bit about me! If you're reading and you find something I write about interesting, or you find yourself in disagreement, feel free to drop me a line or comment on a post; I will be sure to respond! Thanks for reading!

-Jim

Dynamically Creating Classes that Implement Interfaces at Runtime in C#

While working at EMC a while back, some of my co-workers were finding it more and more difficult to debug their code because of the complexity of our system. I was assigned the task of mocking a WCF web service for one of our projects by manually implementing each interface, having every method just return some dummy value or type. It was not unlike unit test mocking, but for web services. There were a reasonable number of interfaces to implement, and the job would probably have taken just a couple of hours, but instead I asked why we couldn't do it generically, dynamically generating any WCF service with dummy Service and Data Contracts. Nobody could give me a good answer as to why not, so a co-worker and I started hacking away at it. It took some real thinking, but eventually we got it working.

By far, the most interesting part of the project was how we dynamically generated a class to implement an interface. In the name of brevity for this post, I'll focus on the most basic case: implementing an interface with only methods that return primitives (equivalently in WCF, Service Contracts with only Operation Contracts). C# finds surprising strength in the usability and power of its reflection APIs. To briefly explain the system from a bird's-eye view: you pass in the type of the interface you wish to implement, as well as the assembly it is defined in, and the system implements each method by defining a MethodBuilder that carries information about the calling parameters and the return type. It then uses IL opcodes to push a dummy return value onto the stack and return it. Once it has finished with all methods, the Type of the newly generated class is returned to the caller, which can instantiate a new instance of the class and handle it like any other reference type.

To begin, let's define an interface that we wish to implement dynamically: say, an ICalculator with a string method thrown in as well. It doesn't really matter what goes here, as long as we have only methods that return certain primitives, including strings (I will get into that later). Let's say the interface definition is as follows:



using System;

public interface ICalculator
{
    double Add(double arg1, double arg2);

    double Subtract(double arg1, double arg2);

    double Multiply(double arg1, double arg2);

    string SomeOtherMethod(int arg1);
}


Easy enough! Notice that we're never going to make a class that implements this interface.

Next, we can define a ClassBuilder object that has only a constructor and a single public method, CreateClass. This method will take two parameters: a Type object that is the type of interface we want to implement, and an AssemblyName, which is the name of the assembly that contains the interface definition. The second parameter is there mostly as proof that the interface doesn't need to be in the same assembly as this functionality; we will just pass it the executing assembly and ignore it for all intents and purposes. We will need to include System.Reflection and System.Reflection.Emit. So far, we've got:



using System;
using System.Reflection;
using System.Reflection.Emit;

public class ClassBuilder
{
    public ClassBuilder()
    {
    }

    public Type CreateClass(Type interfaceType, AssemblyName assemblyName)
    {
    }
}


Now that we've got everything we're going to need set up, we can get into the meat and bones of this project by implementing the CreateClass method. We're first going to want to define an AssemblyBuilder object for the assembly passed to this method. We can then define a ModuleBuilder to build a dynamic module. Then we define a TypeBuilder object that we will use to create a type that will implement the passed interface. You can play around with the parameters of this object, but everything we need for the TypeBuilder is listed below. We will also add an interface implementation to the TypeBuilder object to indicate that we intend to implement the interface. So now we have:



public Type CreateClass(Type interfaceType, AssemblyName assemblyName)
{
    AssemblyBuilder assemblyBuilder = AppDomain.CurrentDomain.
            DefineDynamicAssembly(assemblyName, AssemblyBuilderAccess.Run);
    ModuleBuilder moduleBuilder = assemblyBuilder.
            DefineDynamicModule("dynamicModule");

    TypeBuilder typeBuilder = moduleBuilder.DefineType(
                String.Format("Autogenerated.{0}", interfaceType.ToString()),
                TypeAttributes.Public
            | TypeAttributes.Class
            | TypeAttributes.AutoClass
            | TypeAttributes.AnsiClass,
                typeof(System.Object));

    typeBuilder.AddInterfaceImplementation(interfaceType);

    return null;
}


I'm returning null at the end just for now so Visual Studio doesn't throw any compiler errors. Additionally, I apologize for the line wrapping; Blogger has a tough time with code formatting.

We've now got a TypeBuilder defined that targets the correct interface and assembly. Now we need to iterate through each method defined in the interface and generate a dynamic implementation of it. We will do this by skipping method bodies entirely and just returning a dummy value (that's why this process is very similar to unit test mocking). For each method the interface defines, we will create a MethodBuilder object with information about the method, including its name, its protection level, whether it's virtual, the return type, and the parameters the method takes. To get the parameters, we will need to define a private method in our ClassBuilder class as follows:



private Type[] GetMethodArguments(MethodInfo methodInfo)
{
    // Cache the parameter list so we don't call GetParameters() every iteration.
    ParameterInfo[] parameters = methodInfo.GetParameters();
    Type[] argumentArray = new Type[parameters.Length];
    for (int argumentIndex = 0; argumentIndex < parameters.Length;
        ++argumentIndex)
    {
        argumentArray[argumentIndex] = parameters[argumentIndex].ParameterType;
    }

    return argumentArray;
}
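
As an aside, the same helper collapses into a one-liner if you're comfortable with LINQ (this assumes a using System.Linq directive is in scope; it's just an alternative spelling, not something the project requires):

```csharp
using System;
using System.Linq;
using System.Reflection;

// Equivalent LINQ version: project each ParameterInfo to its ParameterType.
private Type[] GetMethodArguments(MethodInfo methodInfo)
{
    return methodInfo.GetParameters()
        .Select(parameter => parameter.ParameterType)
        .ToArray();
}
```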


It simply iterates through each parameter and sticks its type in an array, which is then handed back for the method builder. After making the MethodBuilder object, we will get the MethodBuilder's IL generator so we can use IL to create the return values of the methods. We will also define a method override for the method we're currently generating. So now we're at:



using System;
using System.Reflection;
using System.Reflection.Emit;

public class ClassBuilder
{
    public ClassBuilder()
    {
    }

    public Type CreateClass(Type interfaceType, AssemblyName assemblyName)
    {
        AssemblyBuilder assemblyBuilder = AppDomain.CurrentDomain.
            DefineDynamicAssembly(assemblyName, AssemblyBuilderAccess.Run);
        ModuleBuilder moduleBuilder = assemblyBuilder.
            DefineDynamicModule("dynamicModule");

        TypeBuilder typeBuilder = moduleBuilder.DefineType(
                String.Format("Autogenerated.{0}", interfaceType.ToString()),
                TypeAttributes.Public
            | TypeAttributes.Class
            | TypeAttributes.AutoClass
            | TypeAttributes.AnsiClass,
                typeof(System.Object));

        typeBuilder.AddInterfaceImplementation(interfaceType);
        foreach (MethodInfo methodInfo in interfaceType.GetMethods())
        {
            MethodBuilder methodBuilder = typeBuilder.DefineMethod(
                methodInfo.Name,
                MethodAttributes.Public | MethodAttributes.Virtual,
                methodInfo.ReturnType,
                this.GetMethodArguments(methodInfo));

            ILGenerator il = methodBuilder.GetILGenerator();
            typeBuilder.DefineMethodOverride(methodBuilder, methodInfo);
        }

        return null;
    }

    private Type[] GetMethodArguments(MethodInfo methodInfo)
    {
        // Cache the parameter list so we don't call GetParameters() every iteration.
        ParameterInfo[] parameters = methodInfo.GetParameters();
        Type[] argumentArray = new Type[parameters.Length];
        for (int argumentIndex = 0; argumentIndex < parameters.Length;
            ++argumentIndex)
        {
            argumentArray[argumentIndex] = parameters[argumentIndex].ParameterType;
        }

        return argumentArray;
    }
}


Great! The next step is to actually create the return value for the method. We will encapsulate this in a new private method for readability's sake:



private void CreateReturnValue(ILGenerator il, Type parameterType)
{
    if (parameterType == typeof(void))
    {
        return;
    }
    else if (parameterType.IsPrimitive || parameterType == typeof(string))
    {
        this.EmitPrimitive(il, parameterType);
    }
    else
    {
        throw new ArgumentException("Parameter was reference value or null");
    }
}


So this method will throw if it's given anything other than void, a primitive, or a string. It is absolutely possible to generate a dummy reference type, but it's slightly outside the scope of this post. If the method returns void, nothing needs to be done, so we just return. The EmitPrimitive method is interesting, however: it basically acts as a switch on the type it is passed, and when it finds a match, it pushes a dummy instance of that primitive onto the stack to be used as the return value. For the sake of this example, I put in some ridiculous dummy values to show that it actually works.



private void EmitPrimitive(ILGenerator il, Type parameterType)
{
    if (parameterType == typeof(bool))
    {
        // pushes a 1 onto the stack, represents true
        il.Emit(OpCodes.Ldc_I4_1);
    }
    else if (parameterType == typeof(char))
    {
        il.Emit(OpCodes.Ldc_I4, 'a');
        il.Emit(OpCodes.Conv_I2);
    }
    else if (parameterType == typeof(int))
    {
        il.Emit(OpCodes.Ldc_I4, Convert.ToInt32(42));
    }
    else if (parameterType == typeof(long))
    {
        il.Emit(OpCodes.Ldc_I8, Convert.ToInt64(12));
    }
    else if (parameterType == typeof(double))
    {
        il.Emit(OpCodes.Ldc_R8, Convert.ToDouble(301));
    }
    else if (parameterType == typeof(string))
    {
        il.Emit(OpCodes.Ldstr, "Custom class generation worked!");
    }
    // you can add additional primitive types ad nauseum,
    // unsigned, bytes, shorts, floats, decimals, etc.
    else
    {
        throw new ArgumentException(String.Format(
            "Unsupported parameter type : {0}", parameterType));
    }
}
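
For the curious, here is one way the reference-type case could be handled. This is a hypothetical sketch, not something we needed for the service mocks: any reference-typed return value can legally be null, so CreateReturnValue could emit a null reference instead of throwing:

```csharp
// Hypothetical extension of CreateReturnValue: instead of throwing on
// non-primitive types, push a null reference for any reference type.
private void CreateReturnValue(ILGenerator il, Type parameterType)
{
    if (parameterType == typeof(void))
    {
        return;
    }
    else if (parameterType.IsPrimitive || parameterType == typeof(string))
    {
        this.EmitPrimitive(il, parameterType);
    }
    else if (!parameterType.IsValueType)
    {
        // Ldnull pushes a null object reference onto the stack,
        // which is a valid return value for any reference type.
        il.Emit(OpCodes.Ldnull);
    }
    else
    {
        // Non-primitive value types (structs) would still need real handling.
        throw new ArgumentException(String.Format(
            "Unsupported parameter type : {0}", parameterType));
    }
}
```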


Almost there! Now let's add a call to CreateReturnValue in our method loop, emit a ret instruction after it, and return the completed type from CreateClass. This will finish our implementation of the ClassBuilder class!



using System;
using System.Reflection;
using System.Reflection.Emit;

public class ClassBuilder
{
    public ClassBuilder()
    {
    }

    public Type CreateClass(Type interfaceType, AssemblyName assemblyName)
    {
        AssemblyBuilder assemblyBuilder = AppDomain.CurrentDomain.
            DefineDynamicAssembly(assemblyName, AssemblyBuilderAccess.Run);
        ModuleBuilder moduleBuilder = assemblyBuilder.
            DefineDynamicModule("dynamicModule");

        TypeBuilder typeBuilder = moduleBuilder.DefineType(
                String.Format("Autogenerated.{0}", interfaceType.ToString()),
                TypeAttributes.Public
            | TypeAttributes.Class
            | TypeAttributes.AutoClass
            | TypeAttributes.AnsiClass,
                typeof(System.Object));

        typeBuilder.AddInterfaceImplementation(interfaceType);
        foreach (MethodInfo methodInfo in interfaceType.GetMethods())
        {
            MethodBuilder methodBuilder = typeBuilder.DefineMethod(
                methodInfo.Name,
                MethodAttributes.Public | MethodAttributes.Virtual,
                methodInfo.ReturnType,
                this.GetMethodArguments(methodInfo));

            ILGenerator il = methodBuilder.GetILGenerator();
            typeBuilder.DefineMethodOverride(methodBuilder, methodInfo);
            this.CreateReturnValue(il, methodInfo.ReturnType);
            il.Emit(OpCodes.Ret);
        }

        return typeBuilder.CreateType();
    }

    private Type[] GetMethodArguments(MethodInfo methodInfo)
    {
        // Cache the parameter list so we don't call GetParameters() every iteration.
        ParameterInfo[] parameters = methodInfo.GetParameters();
        Type[] argumentArray = new Type[parameters.Length];
        for (int argumentIndex = 0; argumentIndex < parameters.Length;
            ++argumentIndex)
        {
            argumentArray[argumentIndex] = parameters[argumentIndex].ParameterType;
        }

        return argumentArray;
    }

    private void CreateReturnValue(ILGenerator il, Type parameterType)
    {
        if (parameterType == typeof(void))
        {
            return;
        }
        else if (parameterType.IsPrimitive || parameterType == typeof(string))
        {
            this.EmitPrimitive(il, parameterType);
        }
        else
        {
            throw new ArgumentException("Parameter was reference value or null");
        }
    }

    private void EmitPrimitive(ILGenerator il, Type parameterType)
    {
        if (parameterType == typeof(bool))
        {
            // pushes a 1 onto the stack, represents true
            il.Emit(OpCodes.Ldc_I4_1);
        }
        else if (parameterType == typeof(char))
        {
            il.Emit(OpCodes.Ldc_I4, 'a');
            il.Emit(OpCodes.Conv_I2);
        }
        else if (parameterType == typeof(int))
        {
            il.Emit(OpCodes.Ldc_I4, Convert.ToInt32(42));
        }
        else if (parameterType == typeof(long))
        {
            il.Emit(OpCodes.Ldc_I8, Convert.ToInt64(12));
        }
        else if (parameterType == typeof(double))
        {
            il.Emit(OpCodes.Ldc_R8, Convert.ToDouble(301));
        }
        else if (parameterType == typeof(string))
        {
            il.Emit(OpCodes.Ldstr, "Custom class generation worked!");
        }
        // you can add additional primitive types ad nauseum,
        // unsigned, bytes, shorts, floats, decimals, etc.
        else
        {
            throw new ArgumentException(String.Format(
                "Unsupported parameter type : {0}", parameterType));
        }
    }
}


Now let's call it from Main to prove it works:



using System;
using System.Reflection;

public class MainClass
{
    public static void Main(string[] args)
    {
        ClassBuilder classBuilder = new ClassBuilder();
        Type dynamicType = classBuilder.CreateClass(typeof(ICalculator),
            Assembly.GetEntryAssembly().GetName());
        var dynamicObject = Activator.CreateInstance(dynamicType) as ICalculator;
        Console.WriteLine(dynamicObject.Add(1.0, 99.0));
        Console.WriteLine(dynamicObject.Subtract(1.0, 99.0));
        Console.WriteLine(dynamicObject.Multiply(1.0, 99.0));
        Console.WriteLine(dynamicObject.SomeOtherMethod(1001));

        Console.WriteLine("\nDone");
        Console.ReadKey();
    }
}


Which will produce:



301
301
301
Custom class generation worked!

Done


And that's it! We've now got a way to generate a dynamic implementation of any method-only interface that returns only primitives! It's not super useful yet, but with a little more work to make it more versatile, it actually ends up being a really interesting project in reflection and a great programming exercise. A common use case: if you're waiting on an implementation of an interface from a co-worker, you could use their interface in a mocked environment without actually needing their implementation. I think that's pretty interesting, as it breaks long-held programming workflows, especially in projects where someone's implementation gets pushed back for extended periods of time. You could define your usage of their interface and simply plug their implementation in afterwards. It is particularly useful within the context of WCF, since you don't need an implementation to launch a web service; you can go ahead and use these custom-generated classes in its place. Most of this project's applications are edge-case scenarios, but it provides a very interesting and versatile solution.
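
To make that co-worker scenario concrete, here's a quick sketch. IPricingService is a made-up interface for illustration; the point is that code written against it can run before the real implementation exists:

```csharp
using System;
using System.Reflection;

// A made-up interface that a teammate hasn't implemented yet.
public interface IPricingService
{
    double GetPrice(int productId);
}

public static class MockDemo
{
    public static void Main()
    {
        // Generate a throwaway implementation using the ClassBuilder above.
        ClassBuilder classBuilder = new ClassBuilder();
        Type mockType = classBuilder.CreateClass(typeof(IPricingService),
            Assembly.GetEntryAssembly().GetName());

        // The generated class satisfies the interface, so any code written
        // against IPricingService works; swap in the real class later.
        var pricing = (IPricingService)Activator.CreateInstance(mockType);
        Console.WriteLine(pricing.GetPrice(42)); // prints the dummy double (301)
    }
}
```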