jonchu's posterous

Pearls of wisdom from ... me (haha)

Why Are Generics so Broken?

I've recently started as a full time engineer at Palantir
Technologies, which primarily uses Java 1.5 as their language of
choice.

In my first few weeks coming back to Java since the 1.4 days, I
noticed a few things about Java's new feature: generics.

At first glance, generics seem like a wonderful idea. Gone are the
days of manually casting your objects. Type-safety is now an utmost
priority and errors can be caught at compile time. Java is learning
from academia! What more can you ask?

Well, I hate to break it to you, but generics are very broken.

For one thing, if I try to create a generic array, I CAN'T! I still
need to cast. At a deeper level, this makes perfect sense. If I try to
do something like:
     T[] arr = new T[DEFAULT_SIZE];

the type parameter T gets erased at compile time, so at runtime nobody
has any idea what type T really represents. Thus, Java can't
instantiate an array of them in memory.

Syntactically, however, this makes absolutely no sense. Everything
else in generics is instantiated this way except arrays. Any sane
person would expect arrays to work the same way.
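
For what it's worth, here's a rough sketch of the cast you end up writing when the element type is a type parameter T. Bag, DEFAULT_SIZE, and newArray are names I made up for illustration; the only library call is java.lang.reflect.Array.newInstance. Treat it as one way around the restriction, not the official one.

```java
import java.lang.reflect.Array;

// Hypothetical container, used only to illustrate the workaround.
class Bag<T> {
    private static final int DEFAULT_SIZE = 16;
    private final T[] items;

    // Option 1: allocate an Object[] and cast it.
    // "new T[DEFAULT_SIZE]" is rejected by the compiler (generic array creation).
    @SuppressWarnings("unchecked")
    Bag() {
        items = (T[]) new Object[DEFAULT_SIZE];
    }

    // Option 2: have the caller pass a Class token so the real
    // element type is known at runtime.
    @SuppressWarnings("unchecked")
    static <E> E[] newArray(Class<E> type, int size) {
        return (E[]) Array.newInstance(type, size);
    }

    void set(int i, T value) { items[i] = value; }

    T get(int i) { return items[i]; }

    int capacity() { return items.length; }

    public static void main(String[] args) {
        Bag<String> bag = new Bag<String>();
        bag.set(0, "hello");
        System.out.println(bag.get(0));

        String[] names = Bag.newArray(String.class, 4);
        System.out.println(names.length);
    }
}
```

Option 1 is only safe as long as the array never escapes the class typed as T[]. Option 2 gives you a real array of the right type, but the caller has to hand over a Class object.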

Even worse, generics are meant to ensure type safety. But they don't
always do so! If a developer naively assumes that generics guarantee
them type safety, they can easily get burned by runtime bugs that they
assumed would be caught at compile time!

Take this piece of code for example:
     Map<String, Point> map = new HashMap<String, Point>();
     map.put("5", new Point(x, y));
     Point p = map.get(new Long(5));

What do you expect to happen? Well! We have wonderful generics right?
You're looking up a Long in a map that defined its keys as Strings.
This should give us a compile time error.

Wrong! This actually compiles just fine and quietly hands you back
null. Why? put is declared as put(K, V), so that one does get checked,
but the method signature of get actually takes Object (and so do
remove and containsKey). And imagine if you had called something like
map.get(factory.getKey()), whereas you meant
map.get(factory.getStringKey()). That would be hard to find.
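
Here's a small sketch of where java.util.Map does and doesn't check key types. MapLookupDemo and sampleMap are hypothetical names I picked for the example. The behavior comes from the declared signatures put(K, V), get(Object), and containsKey(Object).

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical demo class: a Map<String, String> looked up with the wrong key type.
class MapLookupDemo {
    static Map<String, String> sampleMap() {
        Map<String, String> map = new HashMap<String, String>();
        map.put("5", "five");
        // map.put(Long.valueOf(5), "five");  // rejected at compile time: put takes K
        return map;
    }

    public static void main(String[] args) {
        Map<String, String> map = sampleMap();

        // Right key type: the entry is found.
        System.out.println(map.get("5"));                       // prints "five"

        // Wrong key type: still compiles because get takes Object.
        // A Long never equals a String, so the lookup misses.
        System.out.println(map.get(Long.valueOf(5)));           // prints "null"
        System.out.println(map.containsKey(Long.valueOf(5)));   // prints "false"
    }
}
```

Nothing blows up here, which is exactly the problem: the wrong-typed lookup just returns null and the bug shows up somewhere else later.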

Who the hell decided this was a great idea? Yes yes, I know, backwards
compatibility issues blah blah.

If you're going to promise type safety, don't half ass it. Either
fully implement type safety, or don't do it at all. We all know that
the best user interfaces are intuitive and do what's expected. The
same goes for languages! If a language feature doesn't do what it
obviously should, it's a pretty broken feature.

Yes, generics is a feature that we've gotten used to and live with.
But it is good to take a step back and realize that this is an
extremely hacked feature. It's broken and it sucks. Once you come to
terms with this realization, you'll start looking for those hard to
find bugs you shouldn't have to...because Java didn't catch it at
compile time.

Posted June 21, 2011

Really Google? Grow up and Quit Acting Like a 3 Year Old

Google, I think I've lost more respect for you in the last year than I
have for any other company. You used to not be evil. You used to win
battles by making better products.

In a single year, much has changed. Now, you're just fighting your
competitors by throwing temper tantrums.

Namely, this whole fiasco with Bing "copying" your search results is
utter bullshit. We both know this. Maybe it's because I've worked in
search before that this is painfully obvious to me. Let's take a look
at the facts again.

1. Search is an extremely difficult problem
2. Most algorithms that deal with search are machine learning related
3. Machine learning requires previous data to be remotely useful
4. Click data is a VALID form of bias that you can introduce into a
machine learning algorithm to make it more accurate

Now, let's say you're Bing. You're pouring your heart and soul into
making the best search engine possible. Where to start? Well, seeing
as we're a part of Microsoft, we have millions of users using IE8.
Here's an INNOVATIVE thing that we can try that no one else has the
ability to try. Instead of using PageRank (or an updated version of
it), let's develop an algorithm that takes, as input, user click data,
how long the users spend on the site, what sites they click on, and
what they click on when they're on the site, and see if we can find some
relation between what users do on sites and how relevant that is to a
search term. Realize, the rankings are COMPLETELY ALGORITHMIC. And I'm
100% positive that Google is fully aware of this (if not we should all
stop using Google Search).

Not to mention, the "sting" operation performed by Google proves
absolutely nothing. So you made up some random words using IE8 and
Microsoft "copied" your search results. Wait, why are you using IE8?
Oh wait! So you mean you figured out how Microsoft's algorithm works
and you just gamed their system so you could frame them for cheating!
Well, in that case, if I recall correctly, your algorithm uses
keywords. I recently made a website with a random new keyword in it.
You indexed it. HEY!!! You just copied my results. My little search
engine returns my site too when you type in that word.

Additionally, didn't this "sting" only work for 8% of the words?
That's a ridiculously low number. If they really were purposefully
copying your results, they would have around a 90+% copy rate.
Microsoft's got some intelligent guys. If they were trying, they'd
probably score higher than an 8% on your exam. I mean, that's more
than just failing. Last time I checked, an F was 60% or less. Or maybe
I'm just naive and Google heavily curves everything. Prospective
Google employees, don't worry guys, if you get 8% of your interview
correct, you still get an A.

Finally, doesn't this sting actually prove that Bing is working better
than ever? I mean, a day before your sting, the words you made up
didn't exist at all. Now, because you made them link to these pages,
these terms are now infinitely more relevant than they were 1 day ago.
Thus, the "right" thing to do for a search engine would have been to
link to these pages anyway. They're the most relevant pages for the
term.

Google, basically, all I've gotten out of your blog post is that
you're a very immature company that, for the first time in its
existence, is facing some real competition. So instead of competing
head on, you're going to go cry about it. Go cry then. I mean, don't get
me wrong. I hate Microsoft's products. Google is infinitely better
than Bing. I use OpenOffice over Microsoft Office. I avoid .NET like
the plague, and you'll never catch me running Windows (Linux ftw!).
But this is just ridiculous.

Google, grow up. You've cried me a river, now build a bridge to get over it.

Occam's Forgotten Razor

Occam's razor is a well known heuristic that is a core foundation of
the fields of cognitive psychology, artificial intelligence, and
machine learning. It simply states that "the simplest explanation is
usually the correct one." You're probably reading this and thinking to
yourself that Occam just stated the obvious and got lucky by having it
named after him. However, if Occam's razor is so obvious, why then do
people constantly ignore it?

How likely is it that a conspiracy theory is true?
Why are there so few women in CS?
How come investors won't invest in my startup?
Why does Facebook keep ignoring my privacy concerns?
Why does my code segfault?

I've seen the most ridiculous answers to each of these questions. And
quite frankly, I'm hard pressed to believe most of them.

I once TAed for a Turing award winner who explained randomized
algorithms in the following manner. There's a chance that you'll hit
your worst case in a randomized algorithm, but it's so unlikely that we
just don't care. Similarly, there's a chance that all the air in the
room will move into the upper right hand corner and everyone will suffocate
and die. Do we live under the constant fear of spontaneous suffocation?

In the same way, ridiculous theories do have a probability of being
correct. Is the probability high enough for them to be worth
considering?

But of course, the most likely explanation for my segfault is that
there's a race condition in pthread.h. There's no way it's my code
that's incorrect. I can't see a mistake in my code, therefore it does
not exist. The threading library that's been heavily tested by
thousands of users must be at fault...

Analyzing Diaspora

There's a new social network in town: Diaspora. And it may
revolutionize social networking by giving each individual complete
control. The founders seem excited about their idea and are very
genuine when explaining why they believe their product is necessary.
But the project is bound to fail and is currently running on hype.
The team does not seem to understand the immense challenges that come
with starting a project of this scale. They also do not realize that
having attended a university doesn't automatically give them the
skills necessary to execute this project, and they refuse to face the
fundamental flaws in their idea.

Taking an in depth look at Diaspora's video pitches leads me to
question their understanding of the task they're about to undertake.
Their pitch video is endearing because the four founders of Diaspora
are candid and believe in their project. Although it's extremely
obvious that parts of their pitch are scripted, it adds to the
believability of their pitch because they are obviously not actors.
But sometimes it becomes painfully obvious that they've gone off
script, and it doesn't help their cause.

Raphael states that we should all trust the Diaspora team because
they're willing to devote three months of their time to the project.
I'm not sure how naive Raphael is, but it will take much longer than
three months of their time to mold Diaspora into a true product. Ilya
seems to have no idea how much work this project will take because he
happily chirps that everyone can be assured they'll succeed. After
all, they're willing to put in up to twelve hours a day. Last time I
checked, twelve hours a day wasn't nearly enough to succeed in a tier
I University, much less topple the most influential startup since
Google. The only person on this team who seems to have any idea what
undertaking this project entails is Dan, who quietly mumbles "I think
it'll be a little more than 12 hours."

There's also the issue that these guys don't have the skills to make
this project happen. In their most recent blog post, they mentioned
that they've been working with Pivotal Labs. From browsing Pivotal's
website, I've found that Pivotal Labs is a company that takes ideas
and implements them. As in every company, Pivotal doesn't work for
free. Thus, the most likely explanation for Diaspora's involvement
with Pivotal is that the Diaspora team has taken the 200k they raised
from Kickstarter and outsourced their coding to Pivotal. This will
definitely help get their product off the ground, but Pivotal won't be
around forever. Does the Diaspora team have the ability to maintain
the code that Pivotal creates? I'm completely unconvinced.

This leads to one of the most dangerous scenarios a startup can have:
four founders who may not be the greatest hackers and are completely
ignorant about where their ability lies coupled with a codebase that
they didn't write. And which dominant company are they trying to
topple? Facebook. A CEO that's proven that he understands technology.
A team where every single engineer is brilliant. A company where every
single person including sales, design, and business employees, can
code. What about Diaspora? I'm not even convinced their founders can
code. How then, can they convince me to trust their code (and I use
the term "their" loosely)?

Finally, the concept behind Diaspora has some fundamental flaws that
doom it to failure. The plan is to release software that will allow
you to create your own node on a server that can connect to other
nodes and grab profile pages from those nodes. Wait though? Doesn't
this sound familiar? A cluster of computers that serve pages and other
computers retrieving pages... This sounds exactly like the Internet!
And just like the Internet, to be a part of Diaspora's social network,
you need to have some knowledge in system administration and own your
own server. If you can't set up a server and administer it yourself,
you can't join in on the fun, and how many people in the general
population really know how or even want to maintain their own server?
But wait! Diaspora's business model handles this. For a fee Diaspora
will lease and administer a server and install a Diaspora node for
you. But how many users are really willing to pay Diaspora to be a
part of their social network, especially when almost all the
alternatives are free? Yes, you are paying to keep your data safe, so
you are paying for a service. But even that thread of logic quickly
unravels. If you host your information on Diaspora's servers, Diaspora
would then have all YOUR data on THEIR servers. How then is this any
better than just using Facebook? With such inherent flaws in the
project's main goal, it's surprising that the project even got any hype
at all.

In short, the Diaspora project has been successful to date only due
to the privacy issues surrounding Facebook, which generated hype. The
majority of its founders have not shown much promise, and the team
itself still hasn't given us anything beyond empty words and an
outsourced project. Sure, they may be able to get by on hype alone.
Sure, they may garner some success. Regardless, in the end, this
project will likely fail. The founders appear to be extremely naive,
the team fails to show that they have the competence to write
trustworthy code, and the idea is fatally flawed. All in all, the
seeds of Diaspora will never spread beyond the bubble of its current
fans who refuse to see logic because they have been blinded by hope.

Posted July 2, 2010

Facebook: The Reverse Paypal Mafia

With the announcement today that Facebook has hired Matthew Papakipos,
Facebook has begun to assemble a killer team. Currently on staff, they
have Paul Buchheit, Blake Ross, Joe Hewitt, and Bret Taylor. Of today's
hot products, they have grabbed the creators of Gmail, AdSense,
FriendFeed, and Firefox.

Unlike PayPal, whose former employees consistently create
successful new companies, Facebook has been siphoning talent from the
technology world like a black hole. It seems, however, that Facebook's
ability to attract top talent is due more to its "hip factor" than
anything else. Facebook has remained a hot item for a long time, and
thus many would consider it an ideal place to work. It remains to be
seen, however, if Facebook has the ability to leverage their newfound
talent to continue expanding. The company needs to prove that it can
not only be cool like Google, but execute and profit like Microsoft.

As it stands now, the company is the master of social networking. No
other company even comes close to its dominance, and Facebook has an
extremely solid product. However, they've begun to saturate the market
in terms of social networking. There are only so many users in the
world, and when you have a certain percentage of market share, it
becomes nearly impossible to expand.

How then can the company continue growing? If we compare the company
to Google when it was still in the throes of its younger years, we see
a striking resemblance. Google overtook the search industry in a very
similar fashion by which Facebook overtook the social networking
industry: by storm. But the search industry is only part of what made
Google the giant power they are today. Currently, Google continues to
experiment, pushing new products, trying new ideas, and even coming up
with new technology in their very secretive and relatively unknown
research division. If Facebook wants to become the next Google, they
must continue expanding and creating new products. This doesn't mean
they should stop iterating on their core product: social networking.
This means that they should try new things. And with rumors swirling
about that Facebook is rolling out a new email service as well as a
new search service, it seems like the company is doing exactly what
they should be--finding new markets.

These few rumors may or may not be true. However, we can all be sure
of one thing. With so much talent, Facebook has no excuse for not
being able to execute. They've proven they can do one thing extremely
well. Look to the horizon to see what they dominate next.

Posted June 28, 2010

The Corporate Mind

Tech startups always start hungry. They begin with an ambitious few
coders. These founders work day and night and are hungry for success.
The startup moves quickly and the founders make fast decisions,
quickly iterating through new features and releases within a matter of
days. They dedicate their entire beings to making their baby
successful and yearn for the success of their startup as a parent
yearns for the success of its child.

However, there is a stark contrast between how a startup operates and
how a large corporation operates. Large corporations move extremely
slowly. They take weeks or even years to make major decisions and work
slowly but carefully to ensure that their product will not anger the
already existing customers. The cogs of the corporation must sift
through bureaucracy, politics, and meetings at every turn and the
overall company becomes a giant sluggish machine.

We realize, though, that every large corporation was once a startup.
When it was young, its limber limbs and young heart drove it to
perform, but in its success and old age, these corporations appear to
have atrophied and move relatively slowly. Is this the fate of every
startup? Why does this happen?

The slowness of large corporations is an inherent problem with
economies of scale. The first person in a company adds the most
benefit to that company. With a second person, there are some
communication issues. The code is now no longer written by one person
and there is a lower understanding of the code as well as a possible
clash of style. The second person cannot work at 100% output because
the team must deal with these added issues. The third person makes
matters even worse and this pattern continues on.

With more people also comes office bureaucracy. As people no longer
know everyone within the company on an intimate level, colleagues no
longer feel a sense of camaraderie with other colleagues. They place
their own wellbeing as well as that of their friends on a much higher
pedestal than that of the company and are even willing to sacrifice
the success of the company for their own ends.

Thus, logic would seem to imply that no startup can avoid the
impending fate of the giant corporation. With growth comes inefficiency.

It is important to realize that not all hope is lost. There are still some
large corporations that manage to move at lightning quick speeds
with respect to their size. Microsoft manages to continuously
iterate and add value to their existing products. Apple manages to
innovate and create new ones. And Google manages to continuously
create new technology. If a founder can find the key that links these
successful corporations together, he may have found the key to moving
a startup into a successful corporation. I do not know what that
secret is. But I can say that the three companies I mentioned do have
one thing in common: their founders are all still involved with their
companies. And the CEOs of every single one of those corporations have
been with the company from the very start.

Posted June 27, 2010

DVCS: Why I chose Mercurial over Git

From the very start of my OS class, the TAs have been singing praises
to the awesome glory of distributed revision control systems and,
namely, Git. Thus, when it came time to start what many at CMU
consider to be the hardest project based assignment this university
has to offer, my partner and I both decided we would try out Git. So
off we went, walking down the path to the DVCS world.

After finally finishing my kernel for class, I've decided to move
away from Git and try Mercurial for the next project. Why? During the
course of development, we quickly found many of Git's strengths and
weaknesses. And the weaknesses aren't things I'm willing to live with.
At least to me, Git's naming conventions for its commands never seemed
to be intuitive or descriptive for the average user. How do you move
between revisions? Not really sure. Git checkout with a bunch of flags
that I will never remember should do the trick. Need to make a new
branch? Try git branch--er... I mean 4 seemingly random Git commands
that don't include the word branch. What about reverting to an old stable
version of the codebase? Side story: before the kernel was due, my partner
and I ran through the codebase documenting anything and everything.
Somehow, through documentation, we broke the build (epic fail).
When we attempted to move back to a stable revision while keeping the
broken branch so we could copy and paste documentation, things went to hell.

What, you say? I must be mad. Google can solve all those problems. This
leads me to my second major issue with Git. Its online documentation
is extremely lacking. Not only is documentation for some important
features sparse, the documentation that does exist is rarely ever of
the best quality. Sure, a Git guru could probably use it to do
anything he'd ever want with an RCS. But I'm not a Git guru, and I
don't think I should need to be one in order to use Git effectively.
I'm not looking to become an expert on Git. I'm looking to find a new
tool that can quickly help me make better programs.

After my quick foray into Mercurial, I must say I'm extremely
impressed. Every command within Mercurial is intuitive. hg update
moves your working copy to the revision you ask for, hg branch does
exactly what you think it does, and hg revert and rollback work like
they should. Every command in Mercurial seems to do what you'd expect. An even
greater plus, Mercurial's online documentation is excellent and a brief
glance at one of the many high quality tutorials can quickly get you
up to speed.

I pay my tribute to Git for being the wonderful tool that introduced
me to the DVCS world. In the future, I almost undoubtedly will run
into a project that uses Git and may even master the ins and outs of
the system. For now, however, I'm sticking with what'll help me create
better code now. And that means switching to hg.

Computer Science vs. Software Development

There appears to be an extremely common misconception among the
general public today that computer science and software development
are synonymous and equivalent. Even more worrisome is the fact that there
are a number of professional developers in industry that can't
distinguish between the two.

Case in point: Recently I came across a post on Hacker News heavily
implying that an engineer that could solve P vs. NP must be an ungodly
coder.

The reality is, P vs. NP has nothing to do with coding ability. The
problem itself belongs more to the field of mathematics. It is
completely unrelated to software. I once met a celebrated theorist at
IBM who was able to prove a cryptosystem secure. He didn't know how to
program.

Why then do so many view CS to be equivalent to software development?

One likely explanation is that a large majority of professional
developers have CS degrees. It also tends to be the case that
graduates of a CS program get jobs as developers. Thus, the human mind
tends to associate the two and view them as synonymous.

The theoretical ideas of computer science are also frequently applied
throughout software development. Yes, computer science is related to
software development, but this doesn't mean that the two are
equivalent or even that skill at one implies skill at the other.
Fundamental knowledge in CS will definitely allow for more efficient
solutions to problems and more elegant solutions that will be
transcribed into code. But to be an amazing coder, all you really need
to know are the fundamentals of computer science. The basics. No truly
in depth knowledge or understanding is required.

Greater theoretical prowess also does not translate into coding
prowess. Coding requires a sense of organization and style. Theory can
lead to simple and elegant solutions to problems, but if an unskilled
practitioner implements the algorithm, it can become hideous and
unmaintainable. There are many different ways to implement an
algorithm. The majority of them are suboptimal.

Computer science is a theoretical field where research is done and
discoveries are made. It is purely academic. Software development is
one of many applications of computer science. It requires a solid
understanding of the foundations of computer science to be a great
developer. In this case, however, the implication only goes one way.

First Posterous

So, today I've finally gotten around to creating a blog mostly due to
peer pressure. I have many comments and ramblings about everyday life
that most people probably wouldn't care about, but possibly the
internet may find my thoughts interesting.

So today is a happy day. I was selected for an onsite with
YCombinator, but one of the people at YC, Harj, doesn't seem to like
our idea. He had an interesting comment when I talked to him on Skype.
My cofounder and I are both extremely young. We've got absolutely
nothing to lose and thus, we shouldn't be aiming at starting a 10
million or even 50 million dollar company. We should aim big. Be
ambitious. Take over the world. Thus, for the past day I've been
trying to take over the world (haha) with an awesome buttkicking idea.

Now, I'm back to either writing my Operating System for 15-410 (OS) or
thinking up more ideas on how to take over the world (it's elusive).
Hmmm... Maybe I'll call my OS Chunix... Sounds good...