Tag Cloud Extension for Google Chrome

Last updated on March the 11th, 2013: Version 2.0.4

(Scroll to the end of the page!)

To be honest, I have been delaying this for a while. It's been at least a couple of months since I had this idea to work on a tag cloud extension for chrome, but I was overwhelmed by my other projects, and so I kept neglecting it.

To be even more honest, I was craving for some time to work on it, when finally, yesterday, taking advantage of a (very cold) Sunday, I found a loophole that allowed me to do it, I started exulting in a very moderate way... basically like this. The key was AngularJS since I have been experimenting with it for a while, lately, and so it looked like the perfect chance to put what I had learned in practice.

If by now you think I'm wandering off a bit... you are probably right! But I'll get to the grain in a minute; first, however, I'd like to share my impressions on Angular, because it amazed me, really. It allows you to have a clean, MVC-compliant application, with all the power of declarative programming, templates and dependency injection. Creating a complex, structured application is fairly easy and quite enjoyable, so I'd definitely recommend you to give it a try.

What's Tag Cloud Search + Translate?

Now let's go back to the extension.

It mainly consists of two complementary parts:

The first part adds an entry to your right-click context menu inside Chrome; the new entry is shown only when you have selected some text inside a page, and allows you to search the selected text on the most common websites you might want to search on (Google, Twitter, Facebook, Quora, Wikipedia), or to translate the selected text using Google Translate (by default, it sets translation from an automatically determined language to English). All of these options open the search in a new tab/window.
The second part, instead, offers a different, powerful alternative. While direct selection search (with literally 2 mouse clicks) is a very convenient plugin, this extension allows you to have a deeper knowledge of the page before starting your search.

By showing you a Tag Cloud with the 100 words that are most often used in the any page displayed in your Chrome browser, it provides a quick synthesis of every page you browse, in less than a second, and with only one click.

Together with the Tag Cloud, actually right above it, a search bar is displayed: you can click on any Tag in the Cloud and add it to the search bar, then you can edit the search bar text, and finally search it with one of the search engines mentioned above.

App Structure

So, for this project I have been using AngularJS. I know, I know, I have already said it too many times, but... I can't help it! No, seriously, it is the core of the projet, so I couldn't avoid mentioning it here.

There are actually a few more amazing (really amazing!) libraries I have been using to pull it all together:

A thank you to the teams and people behind them it's the least we can do, cause they allow us to build very cool app with far smaller effort.

As for TagCloudChromeEx itself, I tried to abide to the MVC pattern, following Angular guidelines, so I created a model (content.js), a controller (PageController, inside popup.js) and a view (tagCloudRender service, also inside popup.js).

I decided to use services (another cool feature allowed by angular) to have an even greater decoupling: there are 2 of them, one, as mentioned above, is used to render the view, the other one, tagCloudService, is used to connect to the model: by changing it, it should be possible to have the extension working for other browsers as well, with virtually no changes to the rest of the code.

The dynamics inside these objects are quite simple, but it is worth mentioning a couple of choices and a few helper classes used in content.js:

To enforce decoupling and keep all the scope related actions inside the controller and hidden from tagCloudService, a callback is passed to tagCloudService to be executed with the result provided by the model to the service, after being called.
To avoid repetition or missbehave on the service side, the callback uses the call-once pattern, so that it won't be callable more than once.
FixedSizeMaxHeap is a data structure in between an array and an heap. It is actually both: since the goal is to preserve the k largest elements inserted into the data structure, and to do it efficiently, the best solution is to have the array holding the elements structured as a minHeap: although counterintuitive, it supports our need for a quick way to access the smallest element of the array (in constant time) so that, when the array ha already reached its capacity, the new elements can be quickly compared to the minimum in the queue, and if smaller, directly discarted; moreover, as an heap, it support insertion in time at most logarithmic in the capacity of the heap: this means that even if we take a look at a million elements, or more, being the capacity constant, the asymptotic time required for a single insertion will be the logarithm of a constant (and hence constant itself), so the total time to find the k largest elements from a list of n elements will be O(n). And that becomes a very important save as n grows over 2^32.
makeTag. As widely known, JavaScript doesn't support interfaces (although you can work something out with prototypes) nor types and therefore type constraints (not to mention type Variance and Covariance - but that's for another post about Scala awesomness!) So, since we need to have a way to compare and order elements added to FixedSizeMaxHeap, we just require (or in this case, pretend to, to avoid an excess of defensive programming) that elements added to this heap structure adheres to a "comparable" interface, that's not explicitely defined, but nonetheless is illustrated by an example, the propotype for Tag objects. The makeTag method is simply a constructor for Tag objects; we designed it more as a factory than as a constructor, in order to avoid actual JavaScript constructors (and therefore new keyword, which is notoriously error-prone).
A black list containing a good deal of the 100 most common English words is used to avoid having words like "the" or "and" filling the cloud. For the same reason, words with less than 3 characters are discarded as well.

Please feel free to clone my repository on github: you are free to use this code for any non-commercial project, and I'd really appreciate if you would like to contribute to it by sending pull requests.

VERSION 1.3.1

What's new:

Clear button for the search bar;
Search starts by pressing Enter while the search bar is on focus;
Added an option page to set a default target language for translations and the default search engine to use when search is started by pressing enter.

VERSION 1.4.1

Page language auto-detection
Black list system for tags revised and extended to "foreign" languages (Italian, French, Spanish, Portuguese and German)

VERSION 1.5.0

New features added:

You can now shift+click on a tag and highlight all the occurrences of that word in the original page; to remove highlighting for the tag, just alt+click on it;
Highlighting for the page lasts for the whole "session": you can close the popup window, browse other tabs, and highlighting and selections will still be kept, at least until you close or reload the current one.

VERSION 2.0.2

Page scraping and Tag generation algorithm improved;
Help page available from the extension itself.

VERSION 2.0.4

The color used for highlighting can be chosen from the 'Options' page available in the chrome extensions tab.
Help page completed.