How to inherit from node stream

I recently had a project where I wanted my node javascript class to support piping to and from it (in other words, to be a duplex stream). I couldn't find sufficient examples of implementing streams (probably because the node API has changed so many times). The current node stream docs are pretty clear, but miss a couple of points.

After solving the problem, I created this Node Stream Inheritance sample project with code to share. The project contains sample code showing how to subclass node's Readable, Writable, Transform, and Duplex streams as they exist in node v0.12.2.

The node streaming API has a long history. For the purposes of this project, I wanted to make my class support streaming in the simplest way possible without any external dependencies.

Specific node stream inheritance questions

I implemented this project to demonstrate the following:

  1. What event should I wait for to know the stream is complete? This depends on the stream you are piping to (the sink). In the case of a file stream, handle the 'close' event.
  2. What events should I implement (emit) to indicate my writable stream has received everything it can handle? In this case I opted to implement my own 'full' event – indicating that my custom stream had no room for further input.
  3. How do I handle errors? _write and _transform provide callbacks. If you pass an error to the callback, the underlying implementation will automatically emit an 'error' event. In the case of _read, you need to use this.emit('error', err); (see the sketch after this list).
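
Here is a minimal sketch of how those points fit together for a custom writable in the v0.12 style (the class name, the 1 KB limit, and the file names are made up for illustration):

var fs = require('fs');
var stream = require('stream');
var util = require('util');

// Hypothetical writable that emits a custom 'full' event after 1 KB.
function LimitedWritable(options) {
  stream.Writable.call(this, options);
  this.received = 0;
}
util.inherits(LimitedWritable, stream.Writable);

LimitedWritable.prototype._write = function (chunk, encoding, callback) {
  this.received += chunk.length;
  if (this.received > 1024) {
    this.emit('full');                              // custom event: no room for more input
    return callback(new Error('stream is full'));   // passing an error emits 'error' for us
  }
  callback();                                       // success: ready for the next chunk
};

// Knowing when a pipe to a file is complete: listen on the sink.
fs.createReadStream('input.txt')
  .pipe(fs.createWriteStream('output.txt'))
  .on('close', function () {
    console.log('file written');
  });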

Sample code

Here is some sample code from the project.

The project includes a Duplex stream subclass example and a corresponding test.
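
A minimal sketch of the duplex case, in the v0.12 style, looks roughly like this (the uppercasing behavior is just an illustration, not the project's actual code):

var stream = require('stream');
var util = require('util');

function UpperCaseDuplex(options) {
  stream.Duplex.call(this, options);
}
util.inherits(UpperCaseDuplex, stream.Duplex);

// Writable side: push the transformed data straight to the readable side.
UpperCaseDuplex.prototype._write = function (chunk, encoding, callback) {
  this.push(chunk.toString().toUpperCase());
  callback();   // pass an Error here and the stream emits 'error' for you
};

// Readable side: data arrives via _write, so there is nothing to pull here.
// If this side hit an error it would need: this.emit('error', err);
UpperCaseDuplex.prototype._read = function (size) {};

// Usage: pipe through it like any other duplex stream.
process.stdin.pipe(new UpperCaseDuplex()).pipe(process.stdout);

A real implementation would also respect the return value of push() to handle backpressure; for a simple read/modify/write case, subclassing Transform avoids most of that bookkeeping.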

Alternatives

If you do want to use a library for a simpler API (that is more consistent across node versions), look into through2 and event-stream. Thanks @collinwat for the code review and recommendations!

Related: Top 6 problems with node.js

Modeling the brain. What does the fox eat?

Recently, Numenta Founder Jeff Hawkins made a claim that we’d have the technology necessary for “intelligent machines” in just five years. Numenta is working to model the neocortex in the hopes of combining machine learning with self directed action. Wow. I’d love that. But I think most normal people are terrified.

For background, please see these two videos.

In his talks, Hawkins cites an impressive example of generalization. During a recent hackathon, Subutai Ahmad ran a program training a Nupic Word Client (I'm just starting to grasp the terms; this may include a Cortical Learning Algorithm (CLA) and/or the Online Prediction Framework (OPF)) on a short list of words. The sets of words were, for example, "cow,eats,grain." With each input, the algorithm predicts the output, then adjusts based on what you teach it (load into it).

The impressive part was that the algorithm predicted “fox eats rodent” – without having seen the word “fox” before in the input.

The code actually does sort of "know" what a fox is, though. It queries Cortical.io (formerly CEPT) for a Sparse Distributed Representation (SDR) of each word it sees. The SDR is a numerical representation of the word, derived from "reading" Wikipedia. Still, this is an impressive generalization – the system effectively understands that a fox is like other animals, and it knows that those other animals eat rodents, so it appears to guess that foxes must eat rodents.

Comparing two SDRs: the SDR for “fox” (left) with the SDR for “fish” (right)


There is a ton of interesting stuff going on here, including real-time learning that I won’t even attempt to explain. In the videos above, Hawkins explains how this is different from a traditional neural network.

But, in some ways this demo is misleading. It is not showing how the neocortex works (or how the brain reads, interprets, and generalizes between words), it is only showing how the building blocks we’ve got so far can be hacked to do interesting things.

The experiment only shows how an upper layer of the brain might work. This demo (unless I'm misunderstanding) is showing how one layer of CLA/OPF magic behaves when fed a sequence of richly derived word meanings (which in a multi-layer model would be learned and stored by a different level of the brain).

What I wanted to test was how robust this prediction is. Did Ahmad just get lucky with his 38 lines of input?

After a couple of hours twiddling with my laptop to get things to run, I did reproduce the result with the same input file. Aside: it is wonderful that Numenta and the hackers have open sourced their code and examples so the rest of us can play with it!

However, I also gave it a bunch more input, and got different results – sometimes more logical, sometimes less. With this data set, I get:

Input 1    Input 2    Prediction
fox        eat        grain

I also got “leaves” or “mice” with other variations of the input (I didn’t change much related to animals). It seemed kind of random.

But, I also get these great results (starting with grandma the first time it sees any of these terms in the input file)…

grandma likes music
grandma eats water
cousin likes sun
nephew likes music
niece likes water
brother likes iphone
raccoon likes release
horse likes sun
musician likes artists
runner likes beach

“Release” and “artists” don’t exist anywhere in the input. WTF? To be sure, I’m not training it on the best data set, and it is coming up with reasonable predictions. Here’s the full input and output.

I tried a bunch of much more abstract terms to see if we could use this at Haiku Deck, where we've got some interesting ways of guessing the best background image for a slide in your presentation. While the algorithm is clearly learning, it takes some mental gymnastics to decide whether its predictions are correct.

I have no idea how Numenta is regarded by other AI or neuroscience researchers. But Numenta’s advances in modeling the brain have definitely re-awakened my dormant interest in Artificial Intelligence. Next, I want to try simpler input like images or even X/Y data (bitcoin prices, anyone?).


Adopting Node.js

Scott Nonnenberg wrote a great post about how Node.js is not magical.

I totally agree that Javascript is "a bit tricky." A lot of times, people I work with think they know javascript better than they actually do. However, I'm also noticing that it is becoming a lingua franca – more people know it.

Another good point that Scott makes: Taking the time to do tests right is hard. In my opinion writing automated tests is necessary for shipping good node code. As a developer, I try to include it in my estimate whenever possible. It’s been hard figuring out how to test all the asynchronous code, but worth it to make sure code works fully before shipping it.
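
For example, a minimal mocha-style test of an asynchronous function might look like this (fetchUser and its callback signature are made-up examples):

var assert = require('assert');

describe('fetchUser', function () {
  it('calls back with a user object', function (done) {
    fetchUser(42, function (err, user) {
      if (err) { return done(err); }
      assert.equal(user.id, 42);
      done();   // signal mocha that the async test is finished
    });
  });
});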

If you’re just getting started with Node, see Scott’s post for some helpful links.

For me, Scott’s #3 (async by default) is the big gotcha. I just don’t think 80% of server code needs to be asynchronous. It’s not worth the complexity it adds to writing and testing your code. I always think code should be simple, not clever, and written for non-geniuses to understand. Computers are cheap, and brain power is expensive (at least for now) – so if you have a server-side function that retrieves data from a database, does a few other things, and then returns it to the client – it doesn’t need to be asynchronous to have sub 100ms response times.

Many web data services do enough I/O to benefit from the concurrency.  But, I don’t think node is the right choice for human-facing websites. I still prefer Ruby on Rails (unless your team only knows javascript).


Top 6 Gotchas with Node.JS

Node.js has many advantages. Many developers know Javascript at least a little bit. Node offers great performance for applications that wait on I/O (like web services). There’s a vibrant open source community creating reusable node modules for almost every task (and a handy package manager to bundle them together). Node is cross platform – the same code runs on Windows, OSX and Linux. And of course, there’s the promise of re-using Javascript patterns on both the client side and the server side.

I love node, but I’ve run into several gotchas with node in production the last few years. Here are the top 6.

1. The website (server process) is fragile. With node, the whole process dies on every unhandled exception and has to be restarted. This is different from Ruby on Rails, where you have a separate server process like Unicorn, or PHP, where your code runs as an Apache module.

With one client that was transitioning from PHP to node, our first deployments felt like we were “plugging the dike” – there were so many server crashes. Until we stabilized things, this sort of gave node a bad name (the perception was that PHP was more stable – but in fact the exceptions were just hidden).

Handling all potential errors, as well as unplanned exceptions, is impossible. Most deployments have another process that restarts the primary node server process when it bombs. If you're using a PAAS provider, this will be provided for you, but if you want fine-grained control of your server (most web application companies will), hosting and scaling (keeping processes up and keeping enough processes around) is a challenge.

Common tools for restarting a crashed node server are Upstart, Forever, and God.
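
A minimal sketch of the usual defensive setup (the exit code and messages are arbitrary): log the unhandled exception, exit, and let a supervisor restart the process.

// Last-resort handler: log the stack, then exit so a supervisor
// (forever, Upstart, etc.) can restart the process in a clean state.
process.on('uncaughtException', function (err) {
  console.error('Unhandled exception, shutting down:', err.stack);
  process.exit(1);
});

With a supervisor like forever, the restart loop is then as simple as running something like "forever start server.js" (where server.js is whatever your entry point is).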

2. Writing and maintaining asynchronous code. Callbacks are a simple mechanism starting out, but as your server code grows in complexity, they become a hairball. Crazy callback chains (AKA “callback hell”) make code unreadable. Even simple code that could’ve been procedural is forced to use callbacks due to the libraries you’re using.

Mixing synchronous and asynchronous code makes it difficult to handle every error. Furthermore, you have to understand the event loop to know when execution actually switches contexts. When you run your code, if you forget to call a callback, you'll be left waiting to find out what happened. And asynchronicity makes even your tests more complicated.

The Async library exists to handle common asynchronous patterns. Personally, I prefer the "promise" (or "deferred") pattern. Passing a function in as a parameter feels like a hack; the promise pattern instead returns an object representing the state of the asynchronous call. jQuery has a promise implementation for the client side. When is the promise library we selected for a recent node-based web site (it has a comprehensive API and is more standards-based).
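
To make the difference concrete, here is a rough sketch (getUser, getOrders, renderPage, and done are made-up functions, and the promise version assumes promise-returning equivalents of them):

// Callback style: each step nests inside the previous one,
// and every level needs its own error check.
getUser(id, function (err, user) {
  if (err) { return done(err); }
  getOrders(user, function (err, orders) {
    if (err) { return done(err); }
    renderPage(orders, done);
  });
});

// Promise style: the same flow as a flat chain with one error handler.
getUserAsync(id)
  .then(function (user) {
    return getOrdersAsync(user);
  })
  .then(function (orders) {
    return renderPageAsync(orders);
  })
  .then(function (html) { done(null, html); })
  .catch(done);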

3. Too much choice. The flip side of all those great NPM packages is that there is no clear winning web framework (think Ruby on Rails) for node – in fact, there are several competing server-side frameworks (express, hapi, sails, kraken), each with its own conventions for:

a) File structure
b) Deploying code
c) Building assets
d) Automated testing
e) ORM

And of course, the best choices for each of these change monthly. This, combined with #2, is why Node is not yet a great choice for simple human-facing web sites.

Of course, it's not necessary to use node on the server. On one team we used it to great success for building thousands of lines of Javascript and templates hosted inside a Python Django website. The problem of too much choice exists for client-side frameworks (and build tools) just as it does on the server side (you've heard of Backbone, Angular, Ember). Node (with the help of tools like grunt or gulp) is essential to building a modern front-end, so just be prepared for the evaluation time and conflicting opinions on frameworks.

Visualization of NPM Packages


4. Javascript the language. Many developers think they "know" javascript from having used it in client-side code. In fact, they haven't learned about things like prototypal inheritance, function binding (what is "this" pointing to?), or variable scoping, which are unique to Javascript. Even once you've chosen Javascript, there are still a lot of decisions to be hammered out amongst your team. Code conventions: four spaces or two? camelCase or snake_case? JSHint is a huge help with this. But be ready for…
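
A couple of the classics, as a quick sketch:

function Counter() {
  this.count = 0;
}
Counter.prototype.increment = function () {
  this.count += 1;
};

var counter = new Counter();
var increment = counter.increment;   // detached from its object
increment();                         // 'this' is not counter here, so counter.count stays 0 (or it throws in strict mode)
var bound = counter.increment.bind(counter);
bound();                             // works: counter.count is now 1

// var is function-scoped, not block-scoped.
for (var i = 0; i < 3; i++) { /* ... */ }
console.log(i);                      // 3 -- i is still visible after the loop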

5. Functional versus object-oriented programming. There are two very distinct paths any Javascript project can take. Functional style favors dependency injection for testability and clear interfaces (which are theoretically more re-usable). Object-oriented style favors composition and inheritance for more readable code. The best choice is probably a mixture, as both are silly at the extremes. My current team is finding compromises, but please let me know if you've got any guidelines for when to be functional and when to be OO.
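
As a rough sketch of the two styles (the mailer and transport names are invented):

// Functional style: the dependency is injected, so a test can pass in a fake transport.
function createMailer(transport) {
  return {
    send: function (to, body) {
      return transport.deliver(to, body);
    }
  };
}

// Object-oriented style: a constructor owns its collaborator.
function Mailer() {
  this.transport = new SmtpTransport();   // SmtpTransport is illustrative
}
Mailer.prototype.send = function (to, body) {
  return this.transport.deliver(to, body);
};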

6. What about Coffeescript? I hate semicolons. Compared to other languages that give white space meaning (Python and Ruby), Javascript feels heavy and hard to read. I love Coffeescript because it lets you write clear (information-dense) code while avoiding common Javascript pitfalls (see #4 above). Node can run Coffeescript via the coffee command or the compiler's require hook, but unfortunately, transpiling from Coffeescript means you lose a debugger (and sometimes line numbers).

You also lose the benefit of going with a "lowest common denominator" language (everyone knows Javascript, C, or Java). And you still need to know javascript deeply (#4). Worse, it can create division on your team. If you do switch to Coffeescript, make sure you switch all the way.

Thanks to Scott Nonnenberg, author of thehelp-cluster (a solution to #1) for feedback on this post.

Please leave a comment and tell me: what problems are you having with Node.js?


javascript compare strings

For reference, this is all you need to know.

“0” < “A” < “a” < “aa”

In other words: numbers come first, then capitalized words, then lowercase words.

Here's how to compare strings in Javascript.

If you are sorting an array of strings, simply use the Array.prototype.sort() function; by default it compares strings exactly this way.
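
For example (the array contents are arbitrary):

var words = ['banana', 'aardvark', 'Apple', '0 degrees'];

words.sort();
// -> ['0 degrees', 'Apple', 'aardvark', 'banana']
// the default sort compares character codes: digits, then uppercase, then lowercase

// For a locale-aware, human-friendly ordering, pass a comparator:
words.sort(function (a, b) {
  return a.localeCompare(b);
});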

If you want to write your own function to compare strings, use something like …

function compareStrings(a, b) {
  if (a < b) {
    return -1;
  }
  if (a > b) {
    return 1;
  }
  // a must be equal to b
  return 0;
}


How to look like an open source dev superstar

As a developer, your Github profile is quickly becoming your new resume. Hiring companies look at your Github activity as a measure of your skill and experience.

Github contributions

 How did I become a contributor to the Ember.js website even though I’ve never used the framework, and without writing a line of code? Here are the 3 easy steps you can follow to become an open source superstar.

  1. Find a prominent open source library. Or, just pay attention the next time you’re following one of those “How to install this” README.md guides.
  2. Read the readme and look for typos, spelling errors, or confusing grammar. Or, follow the instructions for installing something. Find confusing parts of the documentation where you can add more detail.
  3. Submit a pull request! Github makes this extremely easy for text files (like README.MD). Just click “edit” and Github will fork the repo and guide you through creating a pull request. Fix up the documentation, commit it, and submit your pull request. Here’s one I did to make the Ember.js documentation more clear.

Bonus

The steps above are good for open source in general, and also make you look good. If you want to make yourself look good in a more superficial way, how about keeping a public repo and contributing something daily. Or, write a robot to do it!


Programming tips

I do a lot of technical blogging using Github gists. Github gists (and small public repos) are a great way to share code snippets. Here are a few I've made recently that you may find useful.

Please comment on these Gists so I can make them better.


Startup Developer Team Agreement

Five times now I’ve worked for a startup as it went through a growth phase (major round of funding). When it worked well, each new team member made the team stronger. When it didn’t work, the company was a revolving door. For development teams, it’s a tricky time.

Early developers enjoy high individual productivity as they work with the technology they know (or have pioneered). Adding more developers does not mean an immediate increase in productivity. More team members require more interaction, more planning, and more code interfaces.

Developers are a quirky bunch. There are geniuses that come to work when they want (or not at all). There are verbally challenged code generators that turn out code faster than the team can agree on what to build. Lone wolves that go off on all-nighters and don't come back with shippable code. Work-from-homers that need to be "skyped in." And the loyal guys that do what the boss wants today without finishing what he wanted yesterday. Not to mention the "leader" that rarely takes his headphones off.

For the people I currently work with, don’t worry – I’m not thinking about you ;-).

In this forming, storming, and norming process, the team needs to set guidelines on how to work together. I've written before about 10x developers – a similar concept applies to productive teams. I've never been trained as a manager, but there are a couple of things that keep coming up. It is critical to establish a team agreement centered around communication. What kills startups are the things that are left unsaid, so nail down a few specifics with a "team agreement" document.

Example agile team agreement

  • Goals

    • Work on the right things for the business

    • Increase leverage by improving our skills and using the right tools

    • Ship code that works

    • Have unit tests and be able to ship often with confidence

  • Work day

    • Stand-up meeting at 10AM M-F: 1 minute to report on what you did the previous day, what you are doing, and what you are blocked on

    • If you can’t attend, send in your status update via email

    • Be available on Skype when you’re working

  • Sprint planning and process

    • A weekly sprint to complete top priority items from the backlog

    • Tasks recorded in Trello (or sticky notes, or whatever works)

    • Task complexity discussed prior to or during planning

    • Stick to your assigned tasks during sprint

    • Any time something gets brought into the sprint, notify the team and create a task to track it

There are many other things to go into with team-building, but a couple of other tangible things keep coming up.

1. Coding standards, including: why spaces are better than tabs. Make a decision as a team, convert your code once, and stick to it. This project samples Github code to determine what the standards should be.

2. Git collaboration workflow. Nailing this early will help. By all means take the time to make sure everyone understands Git as well as deploying code.

What makes your team uber-productive? Leave a comment and let me know.

Converting PNG to SVG (Vector) for Retina Displays

I’ve been working on a web site that is often viewed from iPads. iPads have a retina display (higher pixel density), which makes low-resolution images look grainy. Here’s an example, zoomed in to exaggerate the effect.


The problem is the images are raster (bitmap or pixel) based, not vector (line art) based. Now that I recognize the problem, I see it everywhere. Even in the Google search magnifying-glass icon.


Anti-aliased pixel-based PNGs are the standard. For web developers, the problem is that once they flatten an image to a pixelated PNG, the browser can't cleanly enlarge it again. This is a problem with the widening array of mobile display resolutions.

Unfortunately, a common fix is to swap in a 2x resolution image. This is a dumb approach for icons, logos, and buttons – anything that is not a photograph. All you are doing is downloading a second set of larger files to handle another fixed resolution.

All major browsers support SVG (Scalable Vector Graphic) files now, but before leaping on board, there are a couple performance considerations.

One consideration is file size. Especially when developing for mobile web pages, a larger-dimensioned PNG of a complex icon may be smaller in file size than a vector representation. For most icons, logos, and buttons, though, SVG files will be smaller than the PNG. PNGs are actually not that small, since PNG is a lossless format.

The second consideration is CPU time to render and composite the image (often on top of something else). I haven't done any performance tests. For maximum performance, scaling and compositing bitmap-based icons will be best. However, my guess is that unless your vector is crazy complicated, or your phone is a few years old, using vectors will be fine.

One great trick I wish I had known before converting a batch of PNG icons last month: algorithmic conversion (aka image tracing) from pixel to vector is pretty common. There are several services like http://vectormagic.com/home that will automate your conversion. Vector creation is even built into Adobe Illustrator with the "Live Trace" function.

Beware that for more intricate clip art, algorithmic conversion will not be good enough. But, luckily designers can be easily found on Fiverr to help.

Are you converting PNGs to SVGs? What tools have you found that will help, and what are the gotchas of working with SVG? Please leave a comment below.