Friday, December 7, 2012

Google+ Communities

This was inevitable. And I think it is a bold step forward into something awesome. Google+ communities. Could this be the next big circle? In many ways, yes. In some strong ways, no.

I strongly suspect the reason communities exist within Google+ comes down to how people use Google+ right now. Since most people don’t really have their full gamut of real-life friends there yet, a lot of the activity happens around community-based interests. For example, at least once a week I see someone sharing a circle of ‘scientists’, or ‘googlers’, or ‘entrepreneurs’, or ‘developers’. You get the idea. Most of what I’ve posted is tech stuff, and now I mostly post to my ‘developers’ and ‘geeks and techies’ circles. More importantly, when you add someone to a circle, you see all their posts, not just the posts on the particular interest you followed them for. Not so with communities. People sharing to a community share to it because they want to talk about that interest. This is why I pondered the question: could communities be the next big circle? After all, Google lets you share directly to the community only.

But what about the strong ways it isn’t the next big circle? Here’s the problem. When it comes to the circles of people we’ve found around an interest, we know who we’ve got. We know we’ve got passionate people and not fanboys. Even when we take a shared circle, if there’s noise from certain people we can remove them, because it’s our circle. And that’s the key point: our circles are for us. A community is, well, a community. Like it or not, the Dota 2 community is going to pick up LoL fanboys, and they’ll come along to troll about how a hero adapted from Dota 1 is just a copy of their precious LoL hero (when in fact their hero was adapted from Dota 1 too). And when that happens, there’s nothing you can do about it. Sure, the community manager can kick them out (I hope), but in the end trolls come in greater numbers than community managers. Private groups aren’t an answer to this either. The content may be really high quality, but you sacrifice the openness of having someone unknown come into your world, discover great stuff, and in turn share more awesomeness.

Thus, depending on how the communities feature continues to evolve, it’ll be interesting to see whether it becomes a place of continued awesomeness or a long thread of YouTube-like comments.

Saturday, September 29, 2012

The Past Weeks. Building a Multi Threaded Scraper

I’ve been silent for the past bunch of weeks. No, this isn’t one of those ‘I’m back to posting’ blog posts. This is just to tell you where I’ve been and what to expect very soon, if I’m allowed to actually post about it, which I really hope I am, since the communities out there helped me so much in solving the problems. When I say communities I mean other people’s blogs and such. Which is really awesome when you consider that the information out there was so vast that I didn’t have to go to a single #python IRC channel for more. That, and the Python documentation is brilliant.

So, what is it that I’ve been so busy with over the last couple of weeks? Well, we’ve gotten a partnership of sorts going with a business to help increase the number of items we can supply our fans and loyal customers with. Why? Because we believe that so much more can be done in this business. (Sorry, but I can’t divulge the details just yet. That’s another reason why I’ve been so silent about the work that I’ve been doing.) But how many items, you ask? That’s a good question. The answer is anywhere between 2 million and 9 million items. The only problem was, due to the size and nature of the partnership, we would have to do the heavy lifting to get these products to us. Since they run an online site, the best way to do this, and to keep our list up to date with theirs, would be to run a scraper across it. Not a crawler. A scraper.

Building this wasn’t easy. The tools I used were Python and BeautifulSoup. With my first naïve implementation of the scraping, each item took about 2-3 seconds to scan. Running the scraper across 2 million items at 2.5 seconds per scan would take nearly 58 days. Not good. The next thing to do was to build the scraper in a multi-threaded fashion. At around 100 working threads I found the sweet spot between thread count and speed and left it at that, bringing the scrape time down to 6 days. From that point onwards it was a matter of running a few trial scrapes. What followed was me discovering just how memory intensive this was going to get. In under 10 minutes I was eating into 3+ GB of RAM and the usage wasn’t slowing. Several hard crashes later, I realized I would have to severely cut down the number of objects held in memory. I limited the scraper to running an insertion into the database every 100 records, and thus brought the memory usage down to a maximum of about 70 MB of RAM. This was, I think, one of the biggest personal highlights. From there, I had to further optimize how the connections were taking place. The problem was that I was waiting for 100 Queue objects to be cleared before dumping the next 100 in. Why? Because I had made a bad design decision not to cap the number of threads used inside the scraper. If I didn’t do a q.join() every 100 items, I would have been in serious trouble, spawning 2 million threads in a matter of seconds. This was mostly because I was using both threads and Queues for the first time, so I wasn’t sure about the logical decisions I should make till much later.
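The batching idea is simple enough to sketch. Here’s a minimal, hypothetical version in Python using an in-memory SQLite database; the batch size of 100 is from the description above, but the table name, columns, and record contents are purely illustrative:

```python
import sqlite3

BATCH_SIZE = 100  # flush to the database every 100 records, as described above

def flush(conn, buffer):
    # one executemany per batch instead of one INSERT per record;
    # clearing the buffer keeps scraped rows from piling up in memory
    conn.executemany("INSERT INTO items (name, price) VALUES (?, ?)", buffer)
    conn.commit()
    buffer.clear()

# in-memory DB purely for illustration; the real scraper wrote to disk
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (name TEXT, price REAL)")

buffer = []
for i in range(250):  # stand-in for 250 scraped records
    buffer.append(("item-%d" % i, float(i)))
    if len(buffer) >= BATCH_SIZE:
        flush(conn, buffer)
if buffer:  # don't forget the final partial batch
    flush(conn, buffer)

count = conn.execute("SELECT COUNT(*) FROM items").fetchone()[0]
print(count)  # 250
```

The key property is that at most BATCH_SIZE scraped rows ever sit in memory at once, regardless of how many millions of items the full run covers.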

Thus, the next decision was to limit the number of threads and keep a maxsize on the Queue so that it didn’t grow to two million objects all at once either. This way, as soon as a page was scraped, the next page was slotted in, instead of waiting for a batch of 100 to finish to get its chance. The result? Scrape time dropped from 6 days to 3.
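A stripped-down sketch of that final design, a fixed pool of worker threads pulling from a bounded Queue, might look like this in Python. The thread count, queue size, URL list, and scrape function here are all placeholders, not the real scraper’s values:

```python
import queue
import threading

NUM_WORKERS = 8              # the post settled on ~100; smaller here for the sketch
q = queue.Queue(maxsize=50)  # bounded: the producer blocks rather than queueing millions

results = []
results_lock = threading.Lock()

def scrape(url):
    # placeholder for the real fetch + BeautifulSoup parse
    return len(url)

def worker():
    while True:
        url = q.get()
        if url is None:  # sentinel: shut this worker down
            q.task_done()
            return
        item = scrape(url)
        with results_lock:
            results.append(item)
        q.task_done()

threads = [threading.Thread(target=worker) for _ in range(NUM_WORKERS)]
for t in threads:
    t.start()

# as soon as a worker frees a queue slot, the next URL is slotted in;
# no waiting for a batch of 100 to finish
for i in range(1000):
    q.put("http://example.com/item/%d" % i)

for _ in threads:
    q.put(None)  # one sentinel per worker
for t in threads:
    t.join()

print(len(results))  # 1000
```

Because the queue is bounded, the producer loop naturally throttles itself to the pace of the workers, which is exactly what removes the stop-and-go behaviour of joining every 100 items.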

And that was what I was doing these past few weeks. I haven’t spoken about all the other little fail-safes I had to put in; there were a number of failures from the site being scraped, and from the internet connection at the office, that I had to factor in. The scraper was launched yesterday evening. I have no idea what’s happening with it right now. But if it has broken for some reason due to bad coding, I’ll need to build in another fail-safe for restarting it and getting it to pick up where it left off. That’s one thing I haven’t had time to build, given the constraint that the scraped items need to make their way to our site pretty soon. Pray I shall then, I guess.

Monday, September 3, 2012

Quick Post: Passing Parameters for an SQL Statement with IN clause using SQLCommand (C#)

There are probably a variety of ways to solve this little problem, as this Stack Overflow thread will show, but I thought showing how I solved it would still be worth it. All I do, essentially, is build a string: for each parameter to insert into the SQL statement I append a ‘?’ followed by a comma, and once I exit the loop I append one more ‘?’. Note that the number of times I run through the loop is one less than the actual length of the parameter array. Once I’m done building the string, I iterate through the parameter array and add each parameter in. This code is tested and working.

StringBuilder selectstring = new StringBuilder();
selectstring.Append("SELECT ROWID, * FROM tbl1 WHERE ROWID IN (");
Int64[] arr = { 1, 2, 3, 3, 4 };

// one '?' per parameter: arr.Length - 1 of them with trailing commas, then the last one
for (int i = 0; i < arr.Length - 1; i++)
{
    selectstring.Append("?,");
}
selectstring.Append("?)");

// sa is an SQLiteDataAdapter and conn an already-open SQLiteConnection
sa.SelectCommand = new SQLiteCommand(selectstring.ToString(), conn);
foreach (Int64 a in arr)
{
    sa.SelectCommand.Parameters.Add(new SQLiteParameter(DbType.Int64, a));
}

System.Data.DataSet ds = new System.Data.DataSet();
sa.Fill(ds);
conn.Close();

Saturday, September 1, 2012

Doubles and Decimals in C#

While I say C#, from what I’ve researched on floating point types so far, this concept could apply to most languages.

If you’ve ever programmed using double values, I can safely say at this point that you are probably doing it wrong. You may never notice it, but working with doubles is at some point going to introduce an error you won’t see until the day some critical function is supposed to fire based on a value, and you find it isn’t getting triggered. Murphy’s law. Believe in it.

Where does this spawn from? While building a system for very, very basic data analysis, I needed an even more basic addition of numbers to ensure that the total of the values in a column was equal to 100 before allowing the user to progress to the next step. I was testing the program out and everything seemed fine, and just as I was about to roll the system out, I had a few more values to change in the database. Instead of doing it manually, I thought I’d test the system again and do it through there. (Why is this important? Because it’s incredible how some things can go unnoticed through several hours of complete testing.) While all this time I had been adding whole numbers, in this case I needed to add decimal numbers. Somehow, even though a basic on-paper calculation gave me 100, the program was saying the values didn’t add up to 100. In debug mode, I discovered that at one point, 22.4 + 23.7 was giving me 46.099999999999994. What. The. Debug??

Reading through the documentation and several threads on the internet enlightened me: double values aren’t meant to be precise. They are meant for speed. They can be affected by the strangest things, such as DirectX fiddling with the floating point precision settings. It’s scary, but it is what it is. The solution? Use decimal types. Almost always, you are going to want the slightly heavier decimal type, which represents base-10 values exactly. DO NOT use binary floating point types (float, double) for values that must be exact.
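The post is about C#, but the same behaviour is easy to demonstrate in Python, whose float is the same IEEE 754 double and whose decimal module plays the role of C#’s decimal type:

```python
from decimal import Decimal

# binary doubles cannot represent 22.4 or 23.7 exactly, so their sum drifts
total = 22.4 + 23.7
print(total)          # 46.099999999999994 -- not 46.1
print(total == 46.1)  # False

# decimal arithmetic keeps the base-10 values exact
exact = Decimal("22.4") + Decimal("23.7")
print(exact)                     # 46.1
print(exact == Decimal("46.1"))  # True
```

Note that the Decimal values are constructed from strings; Decimal(22.4) would inherit the double’s error, which is exactly the trap being avoided.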

But that begs the question: what are double types useful for? Sure, they’re faster, but where would you use them? Essentially, in anything that needs speed while tolerating a very small percentage of error. Redrawing sprites based on their screen positions can use doubles to store the vector coordinates. Believe it or not, mining extremely large datasets for a general trend is another likely application. But for those of us who write applications for everyday use, we need precise values, so keep it in mind: decimal types.

Monday, August 27, 2012

Accessing MySQL Remotely With MySQL Workbench

Command line be darned. Visual tools are there for a reason, and if you honestly find a GUI easier for visualizing complex queries, then you should use it. It also saves time when you want to scan a database to see what’s in it. But that’s not really the point here. A few days ago I began developing some major systems for internal use at the workplace, and it was finally time to let the SQLite databases go. Not so much because of data storage needs, but mostly because I needed SQL-side validation of foreign key constraints, and to ensure I didn’t have to do a lot of extra work to keep the data integrity intact. But I digress. The problem was that I needed to connect both the desktop application and MySQL Workbench to the MySQL database on a server, and there’s really nothing on the internet that addresses this problem directly. The usual ‘alternative’ is to use a web service to send the data back and forth. Since I’m not going online for now, there’s no real need for me to be transferring data like that.

To do this I got my own virtual server set up running Ubuntu Server (ooo, he’s taking the easy way out. No I’m not. Wikipedia uses Ubuntu Server. I’m using the best tool for the job), and my way into it is through SSH. For the record, I have no idea how I got SSH into PowerShell; I suspect it happened at some point while I was installing libraries for Cygwin. Anyway, after SSHing into my server I checked around and discovered that the network admin had already installed LAMP and phpMyAdmin, so my MySQL instance was up and ready. This would end up causing more problems than I had anticipated. At this point I’m not willing to reverse all the steps I took to find out which ones worked completely right, but I know which ones are absolutely necessary, and I can give options if the steps don’t work properly.

So the first thing you want to go do is actually read the manuals on how to create and manage user privileges. I’m in a bit of a rush here so I’ll add the code later but the main steps are as follows.

First up, create a user apart from root who has all privileges. Later, when you learn the full privilege list, you can revoke what you don’t really need, but for now I’m not entirely sure what I need and don’t, so I granted all privileges. The code went something like this (no, I’m not being at all precise over here):

CREATE USER newusername IDENTIFIED BY 'type your pass here with single quotes';

GRANT ALL PRIVILEGES ON *.* TO newusername IDENTIFIED BY 'your password';


If you read the manual you’ll find this creates a user that can basically be accessed from any host. The reason for me wanting to do this is because I’m going to be needing a user that can be accessed from any machine inside the company as I’ll be making a desktop application that needs to access the db.

Exit the MySQL server. The next thing you need to do is stop MySQL from accepting only local requests. For this, open up the my.cnf file found under /etc/mysql/ using sudo vim my.cnf. What you want to do here is comment out the lines bind-address = and skip-networking. Easy way to do this?
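Just prefix each line with a #. Once commented out, the two lines in my.cnf should look something like this (the bind address value on your install may differ; this is just a sketch):

```
# bind-address = 127.0.0.1
# skip-networking
```

Then restart MySQL (sudo service mysql restart or your distro’s equivalent) so the change takes effect.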



And that’s it. I think we’re ready. I did go to the extreme of opening up the 3306 port using iptables. This is the only thing that is really server specific, so you’ll want to refer to the manuals of your particular distro. I don’t think this step is necessary, so skip it for now, but in case the actual step of accessing the db through the Workbench or app doesn’t work, you’ll want to come back and do this (or the equivalent, if you aren’t using Ubuntu Server):

sudo iptables -A INPUT -p tcp --dport 3306 -j ACCEPT

sudo iptables -A FORWARD -p tcp --dport 3306 -j ACCEPT

sudo iptables-save (note: this prints the current rules to stdout; redirect the output to a file if you want them to survive a reboot)

Hopefully this step works without needing the iptables step above.

It’s time to connect MySQL Workbench to the db. Here’s where I made the biggest mistake. I assumed that since I connected to the server through SSH, I should use that method to connect to the db in Workbench. It turned out that after all of this, a standard TCP connection worked fine. Give the server name as the IP address of the server you are connecting to, XXX.XXX.XXX.XXX, that kind of thing. The port should ideally be 3306. (By the way, if you don’t think your MySQL instance is running on port 3306, highly unlikely as that may seem, just type mysqladmin version into the command line of your SSH session and check the results. There’s a line that says port. That’s your port. If your port is different, change everything to match it. Doh!)

After you put in the IP, enter the username and password you created and test your connection.

You’re welcome.

And that’s how you connect a desktop-based application or MySQL Workbench to a MySQL database that’s on a server.

Sunday, August 26, 2012

Liveblogging tools: Begging for Pricing Disruption

I know I said I would post on the conversation I had with the compere of the Etisalat event, but there’s something I need to get out of my head after an experience I had today. There was a time when I would go to tech events, live stream them, and update my blog through a live blog plugin. When I started out there was an EXCELLENT, albeit ad-supported, tool for live blogging called CoveritLive. Unfortunately, they discovered that free wouldn’t cut it and went paid, leaving behind a free tier with some strange restrictions on how many user actions can be performed on the live blog. That strikes me as strange because it might mean my live blog is not permanent. Once I go above the threshold for a particular event, it’s shut down, and I have to pay to ensure it stays visible to future visitors of my blog.

So then I decided to look for some free alternatives out there. The main ones I came across were the WordPress plugin for live blogging, a site called Blyve, and Wordfaire. There are many alternative sites, though I believe ScribbleLIVE and CoveritLive are the only two really worth considering.

What’s wrong with the other ones? The WordPress plugin is not really a liveblogging tool, in the sense that stuff doesn’t get pushed out to the viewers; it gets polled, which isn’t the best solution if you are hosting it on your own server. The second option there is to host your own Meteor server, which handles the pushing to the viewers, but again, live blogging isn’t just for techies, and therefore the solution should not be tech intensive either.

Wordfaire is nice, but it’s in beta, it isn’t all that feature-rich, and the worst part is that the embedding features are pretty bad. Not only do you have to customize your embedded live blog yourself, but as the event goes by you won’t find all the messages in it. It shows only a certain number of messages, after which, if you want to see the rest, you have to visit the Wordfaire site itself for the full list. I imagine this is for advertising purposes, but then that’s why I don’t like the idea of completely free either.

The best alternative I have found is Blyve. It isn’t quite as comprehensive as CoveritLive, but it comes really, really close. The free tier gives you 500 uniques per month. For a blog that sees only about double that activity across the entire blog for the whole month, that seems like a pretty good deal. But the problem again is: what happens on the day my visitors become substantial, but the number of live posts I do isn’t enough to justify paying a not-insignificant monthly amount for a live blogging service that still limits the number of actions/uniques per month it can serve?

The Per Instance Pricing Method

From everything I’ve said, I’m willing to bet that between those who pay monthly and those who use the free tier sits a set of customers who are willing to pay some amount but use the free tier simply because they can’t justify a full monthly cost. What if any of the above live blogging companies (I’d vote for CoveritLive and Blyve) came up with a model where people could purchase an instance of the liveblog for a particular post, paying a base amount according to the traffic they expect to receive? If they receive substantially more traffic than that, they would get a warning to pay for the next tier for that instance. This should be rare, because unless your post is a really special event with global interest that gets voted to the top of reddit and Hacker News, the traffic you’d get is fairly easy to estimate. So, step by step, here’s how the payment would work:

  1. I need to host a liveblog for this month’s Refresh Colombo. I visit the liveblog site and pay $4 for an instance of the liveblog which can host up to 500 unique visits for the duration. $2 for every additional 200 uniques I expect.
  2. The live blog is available and life goes on.
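To make the proposed pricing concrete, here’s a tiny Python sketch of step 1. The $4 base for 500 uniques and the $2-per-200 step come straight from the list above; the function itself, including how partial blocks are rounded up, is purely my hypothetical reading of it:

```python
import math

def instance_price(expected_uniques, base_price=4, base_uniques=500,
                   step_price=2, step_uniques=200):
    """Price one liveblog instance under the proposed per-instance model."""
    if expected_uniques <= base_uniques:
        return base_price
    # each (possibly partial) block of 200 uniques above the base adds $2
    extra_blocks = math.ceil((expected_uniques - base_uniques) / step_uniques)
    return base_price + step_price * extra_blocks

print(instance_price(400))  # 4 -- fits within the base 500 uniques
print(instance_price(900))  # 8 -- 400 extra uniques = two $2 steps
```

So a small meetup liveblog stays at the $4 floor, and a post expecting a few thousand uniques still prices out at a one-off cost rather than a monthly subscription.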

But of course, what happens once the event is over? If it’s a one-time payment, then the liveblog host bears a cost to keep it viewable in their system, right? Here’s the cool part. Once the liveblog is complete, offer a snippet of HTML containing all the content from the liveblog so it gets hosted on my site. That means all I have to do is copy that HTML and replace the iframe embed code on my site once the event is complete. This isn’t too tech intensive to be a problem, and it would solve almost all the problems for both parties. What problem does it not solve? The hosting of the pictures. If I want to host my pictures on CoveritLive or Blyve, then they should charge me on a monthly basis, OR better, move them across to Picasa or Flickr for me and give me new HTML code that links to those pictures automagically. Boom.

This serves two main purposes. One is that I would have pricing that fits my needs and, I’m sure, the needs of many people out there. And on a second, equally important note, I would have some form of ownership of my data. Maybe the service doesn’t have to be Flickr or Picasa. Maybe they could offer to let me download the pictures so I can upload them to my own FTP, if I’m at that level of tech savviness. And if they’ve named them right (for example, according to the time each picture was uploaded relative to the liveblog timeline), then I could simply do a find-and-replace to swap their URL for the base URL of my FTP.

This probably seems a little too complicated, but at its most basic level: I pay for an estimated number of users, I get a new bit of HTML code to embed, and I get to keep my photos for free in services I already use, or pay a small fee to let the live blogging company host them for me.

C&C is welcome.

Friday, August 24, 2012

Quick Post: Solution for YouTube Videos Not Loading While Paused

Play a game while you wait for your video to load sir

This is probably not new for most people, but it’s been something that’s bugged me for a while. I’m not on a fast net connection at home, and when I watch YouTube videos I usually pause them and leave them to load. Recently I’ve noticed some videos not loading while paused. Which really sucks. It’s not a big problem, in the sense that I can work around it by letting the video play muted while I do something else, but it’s a problem nevertheless. I don’t know what’s causing it, but I do seem to have found a decent solution.

After searching on Google I found two Google Groups posts that led me in the right direction. The first was a confirmation that I wasn't alone. And the second one had a solution from a Googler. The solution? Change the quality of the video. Now, you obviously don't want to do this while it is loading, so ideally you do it right at the start, which is what I did, and I can confirm it works. What I did was switch from 360p to 240p at the start, wait for the video to start playing, and immediately switch back to 360p. Maybe it's my imagination, but the loading seemed much smoother after that as well. Hope this helps.

A Talk With an Etisalat Rep and Some DC HSPA+ Perspective

I’m not entirely sure I should call Abdul a rep, so let’s just say right off the bat that ‘rep’ is purely a term I have given him. And like I mentioned during the live stream, I would relate most of the stuff I spoke to him about. It’s not a lot, but it was insightful, although there’s still an empty spot I need to fill by giving the Etisalat hotline a call. Shame on me for not doing my research. First things first,

A quick recap of DC HSPA+

At the time of writing this post I’ve had the chance to sit through two presentations by Etisalat on the topic and test the new connection in more than one scenario across two locations, and I think that makes it fair to give a small commentary and summary on what this is all about. Essentially, by allowing two simultaneous connections to originate from the same source, the speed one can achieve gets doubled. Both the practical and the theoretical speeds. But no one cares about theoretical speeds, right? A caveat though: there are three requirements that need to be fulfilled to achieve the new speeds. First up is a dual carrier compatible device. Second is an ISP with the infrastructure to provide the speeds without choking the network. And finally, the servers you are contacting (e.g. YouTube’s) need to be capable of serving you at the max speed the device can handle.

The Rationale

When speaking with Abdul, I was curious as to how they would market this package. Let’s face it: broadband is good enough to stream videos without a problem. YouTube videos at 720p and above can give issues, but up to 480p is fine, and honestly, that’s usually good enough for most fail + cat videos. Even for the Olympics, 360p was absolutely fine on a 21-inch screen. So why would most people need double the network speed at quite possibly more than double the price?

The first answer that came through was that this was being targeted as a family package kind of thing. This was in fact reinforced during the presentation at Refresh Colombo when the presenter mentioned that families would be able to share this connection without experiencing a drop in quality of their individual experiences. This also made sense with the fact that in the slides and the promotions the Etisalat groups were carrying around DC HSPA+ compatible MiFi units.

But that gave room to the question of corporate packages. Corporates don’t seem to be among the main target groups for this kind of thing, based on what I understood, since they are served more by fixed line connections. There is, in my opinion, another avenue for this tech in the corporate world: the small (fewer than 10 people) businesses starting up these days. Connections like this would be ideal for ad hoc freelance partners who want fast internet without being burdened by fixed line issues. Of course, I think I’m stating the obvious here, but I just want to open it up for discussion.

Pricing & Concluding Thoughts

When technology isn’t being geared towards the individual, you have to imagine it isn’t going to be cheap either. After all, it’s for the group, and therefore the per-individual cost may stay the same. Based on that, I’m guessing that since these packages are probably aimed at groups of 3 people or more, the price should be roughly 2.5x that of any comparable package, and the equipment about 3-6x more expensive. The question, though, is whether it’s worth it. If the internet works as advertised, I’m inclined to say it is, based on how much data is included under each tier. Looking at what Etisalat has right now, the Rs. 1,500 package gives a user 12GB before requiring extra payment for each MB (20 cents per MB). SLT gives a user 25GB at 8Mbps for that amount. To get to 25GB you’d have to pay an extra Rs. 2,662.40 to Etisalat under the current connection. Add that to your SLT bill and you would be Rs. 700 away from the Web Pro package, which gives 60GB at 8Mbps.
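For the curious, the Rs. 2,662.40 figure works out like this (assuming 1GB = 1024MB and ‘20 cents’ meaning Rs. 0.20 per MB):

```python
extra_gb = 25 - 12       # GB needed beyond Etisalat's 12GB quota to match SLT's 25GB
rate_per_mb = 0.20       # Rs. 0.20 per MB over quota
extra_cost = extra_gb * 1024 * rate_per_mb
print(extra_cost)        # 2662.4
```
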

Speed does matter, but with these speeds and plans for family oriented packages, I think Etisalat will have to get rid of their existing packages and tailor new ones, since all that added speed is going to mean people burn through their quotas really, really fast. 12GB is honestly nothing at all. My smartphone alone usually uses 25% of that per month, so one can imagine what my standard internet usage is like. As for SLT’s quality of service, beyond the FUD you see on the internet, I’ve actually been hearing good things about their newer packages, which means that in a battle on pricing I’d still not go with Etisalat. Of course, one could say this is apples and oranges, but given that it’s a family package oriented thing, I don’t think the fact that I’m comparing fixed line vs mobile broadband really comes into play here.

The one other concern I have, of course, is coverage. I had a chance to play with an Etisalat DC HSPA+ connection at Refresh Colombo yesterday, and the maximum I could pull from it was 0.3 Mbps!! That’s ****!! To be fair, we had some crazy rain, but then again, the rain had pretty much died down by that time, so I don’t see how that really works. After all, my Dialog dongle was clocking 2 Mbps before I knocked it out of the USB slot, thereby ruining the rest of my test.

So there you have it. A full evaluation of the Etisalat DC HSPA+ ‘initiative’. In summary: the speeds are real when they work, the applications for individuals apart from journalists are too minimal to make the jump, and the charges that could surround this are also a little doubtful. BUT I will not make a final call till I call the hotline and hear what they have to say. So there will be an update to this post, but for now, this is it.

Wednesday, August 22, 2012

Refresh Colombo August Meetup

It’s been a while since I blogged a Refresh Colombo meetup, so this should be fun. I’ll be giving one of the three presentations, which I’m really looking forward to. For the uninitiated, Refresh Colombo is a monthly meetup open to anyone interested in tech. And when they say interested in tech, it can be from any angle at all, not just the deep-in-tech programmer level. Like the site says: bring anyone you want with you as a guest. Even your grandmother. Yes, the sweet lady who takes pictures with the iPad you gave her to stay in touch with you. Jokes aside, this time’s Refresh Colombo is looking to be awesome, and I am just as pumped about the other two presentations as I am about my own. What’s on the topic line?
I’ll be presenting on Building Software Products Anywhere. Since joining, I have experienced a creative high like never before in my life, and I am building and rolling out more products over time and learning more about good software than I ever have before. This is strange, because my original job description doesn’t really call for anything related to programming. More than that, I’m planning on being responsible for a shift in how the company uses tech to complete its day-to-day work, one that will eventually transform this company into a tech startup to some degree. I want to share this experience with other software devs out there. Why? Because I believe that every software developer who wants to love what they do should be able to experience creative highs by taking charge of building products. And since not everyone can afford to be an entrepreneur, I want to share how you can still engage with building software products in the most unlikely of places.
The rest of the topics as per Refresh Colombo.
Visual and Creative Thinking – by Shiran Sanjeewa
Shiran Sanjeewa is the Creative Director at Elite-web-studio, a Manchester-based creative agency. He possesses extensive international expertise in branding, websites, mobile applications, UI/UX, and online marketing. In 2012 he founded “Shiran Sanjeewa Associates”, a Sri Lankan startup branding & user experience consulting firm, now serving Silicon Valley clients with the user experience design of their software and hardware products.
I am really looking forward to this topic given how much I care about user experiences. Coming from a person with a background this impressive, the talk should really be a cracker.
Dual Carrier Cellular Networks: A Practical Outlook – by Damitha Wijewardhana
Damitha Wijewardhana holds an electronics and telecommunication engineering degree from the University of Moratuwa and an MBA from the Postgraduate Institute of Management. He is also a corporate member and a chartered engineer of the Institution of Engineers Sri Lanka, and a member of the Institute of Electrical and Electronics Engineers, USA. He has 6+ years of industry experience in radio network planning, optimization, and related new technologies, locally as well as internationally.
Sound familiar? I assume this talk will have a lot to do with the recently announced DC HSPA+ network by Etisalat. Based on the actual speed tests made yesterday, and on Shazly’s impressions of using it in Dehiwala, I’m also assuming this talk will give a bit of a rational side to the whole hype around the speeds of a dual carrier network, and maybe a bit about what a network of this nature might cost. Which reminds me, I should give Etisalat’s hotline a call to find out the details of their DC HSPA+ packages.
Look forward to a live blog from me, although I obviously won’t be able to blog my own topic. In place of that, I might livestream my own talk. And I will definitely blog about it in a follow-up post too. Stay tuned for more information!