Sunday, July 26, 2009

Pet Programmer Peeve

I'm currently working on a new blog, and this one will be deprecated at some point in the future. I'll link to the new one when the time comes. However, I felt I had to stop in to point out one of my greatest C/C++ pet peeves.

When a programmer writing a C++ program wants to declare a reference to an integer, he will often write:

int& myInt = i;

I am here today to tell you that this practice is wrong and shows that the programmer does not fully understand the C/C++ type system.

Let's start with a simpler situation, declaring a pointer to an integer. This same programmer would no doubt write:

int* myInt = &i;

This is no less wrong. The principle behind the syntax of C declarations is that “the syntax is an attempt to make the declaration and the use agree” (K&R, p 122), or more popularly, “declaration mimics use”. This means that, after the type on the left, the declaration expression will look like the variable in use.

To dereference a pointer (get the value to which it points), the programmer writes *myInt, as in:

int j = *myInt;

The * is the dereference operator. It is conceptually the same operator that appears in the declaration of a pointer. It operates on the variable name, not the type. Therefore, the proper declaration is:

int *myInt = &i;

With the * attached to the variable. To make the point, it is equally valid to write:

int (*myInt) = &i;

But it is illegal to write:

(int*) myInt = &i; // Syntax error!

While C++ deviates from the “declaration mimics use” ideology, the & operator for declaring references operates on variable names, just like the * operator. The declaration for a reference is:

int &myInt = i;

Now, you may say that this is just aesthetics, that this isn't a practical thing to worry about. However, the practice of putting the operator with the type is confusing for new programmers. The programmer from earlier may believe that, to create two references, he can write:

int& myInt = i, myOtherInt = j;

This is not a syntax error. Unfortunately, it does not behave as he expects; it declares one reference and one integer because there is no & operator before the second variable. It is parsed as:

int (&myInt) = i, myOtherInt = j;

What is worse is that this is a silent error. Because integers and references can be used interchangeably in many places, the novice programmer may not know what is wrong without extensive debugging.

Labels: , , , ,

Tuesday, January 29, 2008

Currying in JavaScript: Fun for the Whole Family!

Files referenced: curry.js 2kb

As I've said before, I'm quite partial to JavaScript. Lately, I've been reading about currying functions, so let's see if currying can be done in JavaScript.

The begs the question, What is currying? Currying is a technique to transform a function that takes some number of arguments to a function that takes fewer. The process was named after the logician Haskell Curry. The goal is to get something that works like this:

function add(a, b)
{
return a + b;
}

alert(add(3, 4)); // 7

var add_3 = add(3);

alert(typeof add_3); // function
alert(add_3(4)); // 7

When we pass fewer than the required number of arguments, it returns a function that takes the remaining arguments. Once we pass the correct number of parameters, it evaluates the function.

Before possibly reinventing the wheel, I looked online to see if anyone had done this. While there are plenty of JavaScript functions that claim to curry, they all get it wrong. They do something like this (using the same add from above):

var add_3 = curry(add, 3);

alert(add_3(4)); // 7
alert(typeof add(3)); // number

They do get the add_3 function in the end, but all they do is create a new function with fewer arguments. Any calls to add will still behave like any other JavaScript function with too few arguments; the remaining arguments will be undefined. The add_3 function (both times) is called a partial function.

So how do we do proper currying in JavaScript? Here's what I did. It takes three steps.

First, we need a way to change a function's arity. Arity is the number of parameters a function accepts. This may not seem directly relevant, but trust me:

Function.prototype.toArity = function (n)
{
var func = this;
var parmString = '';
var funcString = '';
var i;

if (n == func.length)
{
return func;
}

if (n == 0)
{
return function ()
{
return func.apply(this, arguments);
};
}

for (i = 0; i < n; ++i)
{
parmString += 'a' + i;

if (i < n - 1)
{
parmString += ',';
}
}

funcString = '(function (' + parmString + ') { return func.apply(this, arguments); })';
return eval(funcString);
};

This function looks complicated, but it's not really. The basic idea is to wrap the function call in another function of the desired arity. They only way to do this in JavaScript is to create the function at runtime with eval. I know most people (myself included) advocate against eval, but in this case it is truly the only way. So this function will convert something like this:

function iTake3Args(a, b, c)
{
return a + b; // Ignore c
}

via this call:

iTake3Args.toArity(2);

to something like this:

function anonymous(a0, a1)
{
return func.apply(this, arguments);
}

It may look ugly, but it works. And no-one should be printing the internals of the function anyway (they'll be calling it).

So now that we can control the arity of functions, we can generate partial functions (what all the libraries do) correctly (with the property arity). The function goes like this:

Function.prototype.partial = function ()
{
var partialArgs = Array.prototype.slice.call(arguments, 0);
var func = this;

var retFunc = function ()
{
var args = Array.prototype.slice.call(arguments, 0);
return func.apply(this, partialArgs.concat(args));
};

return retFunc.toArity(Math.max(0, func.length - partialArgs.length));
};

There's quite a bit going on here. Why is the arguments object being passed to slice? The arguments object is just that, an object. We need to convert it to a proper array so we can concatenate the two argument arrays in the partial function. Then the return function (with arity 0) is converted to the proper arity. The function works similarly to the other curry functions:

var add_3 = add.partial(3);

alert(add_3(4)); // 7

Now we can finally get to currying properly. Here's the function:

Function.prototype.toCurriable = function ()
{
var func = this;

var retFunc = function ()
{
if (arguments.length < func.length)
{
return func.partial.apply(func, arguments).toCurriable();
}

return func.apply(this, arguments);
};

return retFunc.toArity(func.length);
};

If you've understood everything so far, this shouldn't be too out there. If there aren't enough arguments, a partial function will be returned. Otherwise, the function is evaluated. The return function (with arity 0) is converted to the proper arity. So finally, we can recreate our initial example (with a couple changes):

var add = function (a, b)
{
return a + b;
}.toCurriable();

alert(add(3, 4)); // 7

var add_3 = add(3);

alert(typeof add_3); // function
alert(add_3(4)); // 7

Note the change to the definition of add. It has to be a function expression rather than a function definition because we want the result of toCurriable, not the original function.

This is by no means a perfect solution. While it preserves arity, it does not preserve scope. Things like:

someObj.func(3)(4);

won't work as expected, because the second function call is evaluated in global scope rather than on someObj. I don't see any way to fix this with my current functions.

Also, this solution relies on closures and eval, which means it will be memory intensive and will be slow in some browsers. I have not tested this in any browsers, only in Mozilla's Rhino, a command-line JavaScript environment. I imagine it will work in any Mozilla product. I make no claims about Internet Explorer.

I do, however, feel that this is an elegant and unobtrusive solution that has not been done before. This was a purely theoretical endeavor. True currying is definitely possible in JavaScript.

I hope someone found this useful. You can download all of the above functions in one JavaScript file at the top of this entry.

Labels: ,

Friday, May 04, 2007

A-Maze-Ing

Today was my last computer science lab of the semester, so I thought I'd share this problem we discussed...

The Problem:


Given a grid of empty and filled spaces, find the number of shortest paths from start to finish.

Example 1:


S 0 0 0 0 0 0
0 0 0 0 0 0 0
0 0 0 0 0 0 0
0 0 0 0 0 0 F


S is start, F is finish, and 0 is an empty space. Programmatically, we can determine that the answer is 84 paths. Now we just need a mathematical solution.

The length of a shortest path is 9 steps, for instance, 4 steps down and then 5 steps right. So, given 9 steps, at each step we have the option of moving vertically or horizontally, but we have to move 4 horizontal steps total. That sounds like a Combination!

Paths(S, F) = 9C3 = 84

And there we get our answer.

Example 2:


S 0 0 0 0 0 0
0 0 0 X 0 0 0
0 0 0 0 0 0 0
0 0 0 0 0 0 F

S is start, F is finish, 0 is an empty space, and X is a filled space. Programmatically, the answer is 54. Let's try combinations again.

Paths(S, F) = 9C3 = 84
Paths(S, X) = 3C1 = 3

Hmm... that doesn't help much. Or does it? 84 – 54 = 30. But where do we get a 10? On a guess, I tried:

Paths(X, F) = 5C2 = 10

Wow! Now we have the theory:

Number of Paths = Paths(S, F) – (Paths(S, X) · Paths(X, F))
Number of Paths = 84 – (3 · 10)
Number of Paths = 54

Does this work in other cases? Let's try it out.

Example 3


S 0 0 0 0 0 0
0 0 0 0 0 0 0
0 0 Y 0 0 0 0
0 0 0 0 0 0 F

Again, S is start, F is finish, 0 is an empty space, and now Y the a filled space. The answer is still 54.

Paths(S, F) = 9C3 = 84
Paths(S, Y) = 4C2 = 6
Paths(S, Y) = 5C1 = 5

Number of Paths = Paths(S, F) – (Paths(S, Y) · Paths(Y, F))
Number of Paths = 84 – (6 · 5)
Number of Paths = 54

Looks like we have a winner!

Example 4:


S 0 0 0 0 0 0
0 0 0 X 0 0 0
0 0 Y 0 0 0 0
0 0 0 0 0 0 F

In our final example, S is start, F is finish, 0 is an empty space, and this time both X and Y are filled spaces. The answer is 24.

This example combines both of the previous examples, so perhaps we can apply our theory twice:

Paths(S, F) = 9C3 = 84

Paths(S, X) = 3C1 = 3
Paths(X, F) = 5C2 = 10

Paths(S, Y) = 4C2 = 6
Paths(S, Y) = 5C1 = 5

Number of Paths = Paths(S, F) – (Paths(S, X) · Paths(X, F)) – (Paths(S, Y) · Paths(Y, F))
Number of Paths = 84 – (3 · 10) – (6 · 5)
Number of Paths = 24

It appears that this solution works, but I haven't tested it in any other situations yet. For now, this is my answer, and I'm sticking to it. Chris out!

Edit 2007-05-05


After thinking about it, I figured out why we multiply the two path counts. For every possible path from start to the filled square, there are a certain number of paths from the filled square to finish. All of those whole paths are no good. For example...

S 0 0 0 0 0 0 S 0 0 0 0 0 0 S 0 0 0 0 0 0
1 2 X 3 4 5 0 1 2 X 3 0 0 0 1 2 X 0 0 0 0
0 0 0 0 0 6 0 0 0 0 4 0 0 0 0 0 3 4 5 6 0
0 0 0 0 0 7 F 0 0 0 5 6 7 F 0 0 0 0 0 7 F

S 1 2 0 0 0 0 S 1 2 0 0 0 0 S 1 2 0 0 0 0
0 0 X 3 4 5 0 0 0 X 3 0 0 0 0 0 X 0 0 0 0
0 0 0 0 0 6 0 0 0 0 4 0 0 0 0 0 3 4 5 6 0
0 0 0 0 0 7 F 0 0 0 5 6 7 F 0 0 0 0 0 7 F

In the above grids, grids in the same row have the same path from start to the filled square, and different paths from the filled square to finish, and the opposite for columns. This is not all possible paths, just some examples. The number of starting paths multiplied by the number of ending paths is the total number of paths through the filled square, all of which are invalid paths. Therefore, this number is subtracted from the total number of paths to get the answer.

Edit 2007-05-07


A bit more experimentation, and I've come up with a general solution for cases where the filled spaces are only increasing; that is, filled spaces are only below and/or to the right of the previous filled space. For those with MathML enabled browsers (I think most are), you can view that solution here. For those who can't view the MathML, why not try Mozilla Firefox?

Labels: ,

Monday, April 23, 2007

JavaScript is powerful, people. Believe me.

There are many people who still believe that JavaScript is somehow an inferior programming language, and that Java or C++ is somehow better. Today I shall attempt to put some of these beliefs to rest.

First a little history. JavaScript started in the Netscape browser as a new feature to combat the competing Internet Explorer (we can see how well that worked out). It was originally called Mocha, but Netscape decided to change the name to JavaScript after a deal with Sun to include Java technology in the browser. Of course, Microsoft had to have their own JavaScript, and called their JScript to avoid potential trademark concerns. As of this blog, the latest JavaScript is 1.7.

JavaScript and Java are, for the most part, only related by name. Their similar syntax is because both are derived from the C programming language (about which I have previously commented). While Java is a statically typed language (each variable has a specific type), JavaScript is dynamically typed (variables' types are determined by context). It is this superficial similarity that leads to our first misconception.

1. Doesn't JavaScript lack support for Object Orientation?


Many programmers who are not practiced in JavaScript will say that JavaScript does not support object oriented programming because it does not support classes. This belief is, of course, wrong. It is true that JavaScript does not use the traditional Class-based paradigm used by C++ and Java, in which objects are instances of classes, and new classes can be built by inheriting from other classes. Instead, it follows the Prototype-based paradigm, in which new objects can be built by cloning an existing object or ex nihilo (“from scratch”).

In JavaScript, a new object can be created in multiple ways. The first is through an object literal, such as:

var myObj = { property: "someValue" };

The second is through a constructor function, like the following:

function MyType(a)
{
    this.property = a;
}

var myObj = new MyType("someValue");


Both create a new object with one property, whose value is the string "someValue". The second is more like the C++ and Java style of object creation and simulate a class.

JavaScript (and all Prototype-based language) also provide a mechanism for inheritance. This is where the prototype comes in to play. Just what is the prototype? It is an object upon which other objects are built. For example:

function BaseType()
{
    this.funcA = function()
    {
        alert("Base::funcA");
    };

    this.funcB = function()
    {
        alert("Base::funcB");
    };
}

function DerivedType()
{
    this.funcB = function()
    {
        alert("Derived::funcB");
    };
}

DerivedType.prototype = new BaseType();

var myDeriv = new DerivedType();
myDeriv.funcA();
myDeriv.funcB();


The above will display the following:

Base::funcA
Derived::funcB


Every DerivedType starts out with all the functions of a BaseType, but the constructor then overrides the property funcB with its own version. This is inheritance.

2. Isn't JavaScript only used in web pages?


This one is pretty easy to refute. JavaScript is used in the core of the Mozilla Firefox browser, and when writing extensions one uses JavaScript. There is even a command line version of the JavaScript engine, so that JavaScript shell scripts can be made.

3. Doesn't JavaScript not always work? Isn't it unstandardized?


The official standard for JavaScript is called ECMAScript. If there is a standard, why does JavaScript work in some browsers and not in others? The question answers itself: not all browsers implement the same JavaScript. They all implement ECMAScript, but this is the barest form of JavaScript. The parts that usually vary are the DOM and the event model.

The DOM (Document Object Model) is a way of expressing XML documents (usually the HTML page) as a collection of objects in parent-child-sibling relationships. The W3C has recommended a standard DOM, defining objects and their functions. Most browsers follow this recommendation in their implementation of ECMAScript, but some (read: MS Internet Explorer) do not. This makes writing cross-browser scripts that use the DOM difficult, and has led many to think that JavaScript is unreliable.

var xyz = document.getElementByID('xyz');

The same code in JScript:

var xyz = document.all['xyz'];

Another issue is the event model. Again, there is a recommended standard from the W3C for how events should work. And again, some browsers (guess who!) have decided not to use the standard.

JavaScript:
function listener(e)
{
    alert(e.srcElement);
}
xyz.addEventListener('onclick', listener, false);


The above in JScript:

function listener()
{
    alert(window.event.srcElement);
}
xyz.attachEvent('onclick', listener);


There are a few significant differences. First, notice that in the W3C standard, the listener function takes the event object as a parameter, but the JScript version has a global event object. Secondly, notice that the implementations use different functions with different numbers of parameters to add an event to an object.

Act III: Epilogue


I hope we've learned something from this rant. JavaScript is not to be taken lightly; it can be used to do anything another programming language can do, because it is what is a Turing-complete language. JavaScript is sometimes difficult to use in web pages because different browsers use different implementations of the ECMAScript standard, but most browsers use compatible language features, but Internet Explorer does not.

Labels: , ,

Sunday, April 15, 2007

Why do I dislike C++?

Let me count the ways...

1. Overloading Operators


Let's suppose we have the following:

std::vector<int> stack;
vector<int>::const_pointer iter = stack.begin();


This instantiates a vector of integers and an iterator for that vector. So far so good.

To access the element at which iter is currently looking, we type:

int b = *iter;

In other words, it looks as though we dereference iter. So it would follow that we can code:

int * ptr = iter;

WRONG! iter is not actually a pointer to anything, but rather a class that has cleverly overloaded the * operator to return the object at which it is currently looking. So how do we get a pointer to the current integer? As far as I can tell, like this:

int * ptr = &(*iter);

That's right, we use the & and * operators, usually inverses of each other, to get our pointer. Thank you, Standard Template Library.

2. Constants


C originally lacked constants; they were not added until shortly before the creation of C++ and therefore needed to be built around the existing language. Although constants in C++ generally are not a problem, there is one case that bothers me.

const int * a;
int * const b;


These are not equivalent statements. a is now a mutable pointer to a constant integer, while b is a constant pointer to a mutable integer. It gets better.

const int * const c;

This declares c to be a constant pointer to a constant integer. Surely there could have been a better way to do this — it seems very clunky to me to use the same keyword twice in the same statement.

3. C-Strings


This one is really a problem with C, but it was inherited by C++.

Back in the good (bad?) old days of the '60s, the next generation of programming languages was emerging. FORTRAN, COBOL, and BASIC all have a string type, which holds a series of characters and can have functions called on it to determine its length, the position of substrings, and so on.

But not C.

The makers of C decided that a string type would be superfluous. How did they decide to implement the string concept? A simple array of one-letter chars. It would end with a null character (the character whose ASCII value is 0, represented as '\0' in C). In C, arrays are extremely simple. They are really only pointers, and as primitive types they cannot store extra information such as their length.

How will we know how long our strings are, the developers must have asked themselves. We will count them, they answered themselves, or keep track of it in another variable. How will we call functions on strings, they further wondered. We will pass the function a pointer to the string and an integer for the length. How will we return strings? We won't; we'll modify their values as side effects of the functions. Won't that make otherwise simple operations like concatenation or comparison difficult and ugly? Who cares, this is C.

Years later C++ was created, and with it came a string class via the Standard Template Library. However, the remnants of c-strings still linger. String literals are first interpreted as c-strings, then converted to strings if needed. The getline member function of an input stream expects a c-string; there needs to be a special non-member function to read in to a string object.

4. Switch


The switch statement in C++ reeks of the days of assembly; it is essentially a glorified GOTO.

switch (someVar)
{
    case VALUE_1:
        // Do something

    case VALUE_2:
        // Do something else
        break;
}


What is wrong with this picture? Any programmer can tell me that I forgot the break in the first case. If someVar is indeed equal to VALUE_1, the program will do something, then it will continue right on and do something else. It seems odd for C++ to rely on a statement to tell it where the code in a block ends; C++ normally uses curly braces for that. We must examine this more closely.

100 IF someVar = VALUE_1 THEN GOTO 500
200 IF someVar = VALUE_2 THEN GOTO 700

500 REM Do something
700 REM Do something else
800 GOTO 1000

1000 END


The above is our switch program written in BASIC. We can see the behavior much more plainly here. The switch is like a series of IF statements that then GOTO the proper place in the program. The cases are labels. The break is also translated into a GOTO. If the break in C++ or the GOTO in BASIC is forgotten, the program just keeps on going. There are no actual blocks, only a series of labels.

For the record, BASIC does have a switch construct too, called SELECT. It looks and behaves just like its C++ cousin.

Almost half a decade later, we have decided that GOTO is probably not the best idea anymore. Very few, if any, modern languages make use of it (Java keeps it as a reserved word but has never used it).

It gets better though. Because there are no blocks in a switch statement, there is only one scope.

switch (a)
{
    case 1:
        int i = 3;
        break;

    case 2:
        int i = 4;
        break;
}


This code would produce an error, because i is declared twice in the scope of this switch. One solution might be:

switch (a)
{
    case 1:
        {
            int i = 3;
        }
        break;

    case 2:
        {
            int i = 4;
        }
        break;
}


Here we force each i to have its own scope. At this point, though, we have triple indentation. It might have been easier to just use a series of if statements, as they would each have their own scope.

These are some of the gripes that I've had with C++ so far as an undergraduate computer scientist. It was quite good to vent, and now I'll be able to refer people to this blog when I say that I dislike C++. I'll probably wind up returning to this entry later, once I've had time to discover more of C++'s more, er, fascinating idiosyncrasies.

Labels: , ,