Sawdust Software

Written by Mike Salguero

Signing up for health insurance via Healthcare.gov

Published October 29, 2013

I have been an Obama fan for years, and have generally given him the benefit of the doubt on a lot of the hard decisions he has made during his tenure as president. Recently, however, I have been surprised (horrified) by the administration’s approach to the launch of Healthcare.gov. The attitude has been “Obama didn’t know about all of these technical problems.” As an executive of CustomMade, I feel that not knowing about the user experience behind your administration’s largest website launch is a complete failure of leadership, and shows a lack of knowledge about what it takes to build a great experience.

I decided that rather than be a naysayer (and repeat the stories about how BAD it is), I was going to check out Healthcare.gov for myself. Last night, I signed up. Here is my experience. (Note: I could not actually get all the way through because of the bugs!)

Keep in mind that this website cost $170M – $300M to make.

Step 1: The login page… or, I’m confused already

[Screenshot: the state drop-down; note that Massachusetts is not on the list]

I’m from Massachusetts. And if I were looking for health insurance, I would be really, really confused: Massachusetts isn’t even listed as a state in the drop-down. I assume this is because Healthcare.gov actually forwards you to the Massachusetts exchange, but for a good user experience, that should happen here. Fortunately, I have a brother who lives in Tennessee, so I used his address with my Social Security number to continue the process.

 

Step 2: First impressions, for the coders here

For the engineers reading this blog, check out these JavaScript files. Seriously? I don’t even write code and I know that not compressing a lick of your JS is a HUGE issue. Also, Backbone 0.9.2? According to Wikipedia, that version was superseded in December of 2012; Backbone is now up to 1.1.0, so they are launching this very expensive website already a year behind the current release.

If you want to have real fun, inspect their JavaScript. Not… one… file… compressed.

[Screenshot: Healthcare.gov JavaScript files, all uncompressed]

 

Step 3: Accept Terms and Conditions

Now I am asked to accept the terms and conditions. There is a link here, but it takes you to the homepage of healthcare.gov. There are no terms to review, and nothing for me to read before accepting. Also, that healthcare.gov link doesn’t open in a new tab (no target="_blank"); it takes you out of the flow and dumps you back on the homepage.

[Screenshot: the accept terms and conditions page]

 

Step 4: Continue button hangs.

In this step, I clicked the big green button, and it hangs (and never submits).

[Screenshot: the Continue button hanging]

 

Step 5: Helpful staff suggest moving to IE

Since the button hangs and does not let me continue, I decided to chat with a helpful representative. She suggested I use IE. 

[Screenshot: chat transcript]

 

Also, Healthcare.gov seems to have decided to build its own chat client… why?

[Screenshot: the custom-built chat client]

 

 

Step 6: A new browser: logging in with Safari, I finally hit the home screen

I found this part to be somewhat intuitive.

[Screenshot: the opening welcome screen]

Step 7: I get stuck again

35 minutes into the process, I get stuck again… more errors

[Screenshot: error screen, 2013-10-28, 7:59 PM]

 

Step 8: I’m stuck here, so we will end this blog post…

 

But before we go… some other choice coding practices:


 

- CSS files are not combined in the actual app, but they *are* combined on the front page.

[Screenshot: combined CSS on the main page]


- The page size and the number of requests needed to load a single page are ridiculous:

[Screenshot: a ~2 MB page load]



- Load time is sometimes around 40 seconds:

[Screenshot: ~40-second load time]



- There are also random JavaScript errors just lying around:

[Screenshot: JavaScript console errors]



 

- And the error pages are pretty bad:

[Screenshot: internal server error page]



Oh, and we got this!

[Screenshot: “The System is down at the moment” message]



Conclusion: Why is this important?

I don’t want to take sides in a political argument about whether universal healthcare is a good idea or a bad idea. But I do know that experiences matter. A lot. The fact that this website is built so poorly reflects not only a lack of leadership, but a complete waste of money. If this site truly did take $170M – $300M to build, it is indicative of the kind of waste that can happen when a government tries to do the job of private business. In the case of healthcare.gov, technology has not improved the customer’s experience; it has confused it, dramatically.

Written by Wes Childs

Hubot, Hipchat and Fabric

Published September 19, 2013

CustomMade needed a way to simplify deployments and let non-engineers deploy to our staging environments. After hearing lots of great things about ChatOps from GitHub, and after Brendan, one of our tech leads, found a great article by Matt Pegler, we knew that combining HipChat, Hubot and Fabric was the key to solving our issue.

Luckily, we were already using all of those tools, so it was just a question of putting them together into an elegant solution that lets us deploy with a simple HipChat message… hubot deploy branch_x to staging_environment.

To get started, you’ll need a Fabric script with a task that deploys a branch to your servers, and a Hubot plugin to execute the Fabric command. You could skip the Fabric step and handle multiple execs with a forever-deepening set of callbacks in the CoffeeScript, but we found that approach quickly became messy once our scripts got even slightly complex with full error handling.

Fabric

# Deploy a branch to staging
from fabric.api import cd, env, parallel, run, task
from fabric.decorators import roles

# Define your servers
env.roledefs = { 'staging-web-servers': ['s-web-01.example.com',
                                         's-web-02.example.com',
                                         's-web-03.example.com'],
                 'staging-db-servers': ['s-db-01.example.com'],
               }

# Update all servers at the same time, i.e. in parallel.
# Note: in Fabric 1.x, @task must be the topmost decorator.
@task
@parallel
@roles('staging-web-servers', 'staging-db-servers')
def pull_code_onto_servers(branch):
    # Path to your code on the servers (replace with your own)
    with cd('/path/to/your/code'):
        # Fetch the latest refs and check out the requested branch
        run('git fetch')
        run('git checkout remotes/origin/%s' % branch)
        # Restart any relevant services
        run("sudo supervisorctl restart gunicorn")
Code: fabfile.py
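Before wiring the task into Hubot, it’s worth a smoke test. You can run it straight from the shell with fab pull_code_onto_servers:my-branch, or drive it programmatically with Fabric’s execute helper. A minimal sketch (the branch name here is just an example):

# Smoke-test the deploy task without Hubot
from fabric.api import execute

from fabfile import pull_code_onto_servers

# Equivalent to running: fab pull_code_onto_servers:my-feature-branch
execute(pull_code_onto_servers, 'my-feature-branch')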

Hubot Plugin

# Description:
# Deploy branches to staging.
#
# Commands:
# hubot deploy <branch>

# Get exec so we can shell out to fabric
{exec} = require 'child_process'

module.exports = (robot) ->

  # Deploy to staging
  robot.respond /deploy @?([\w .-]+)/i, (msg) ->
    # Get the requested branch
    branch = msg.match[1]
    # Tell the user hubot is working on the request
    msg.send "Preparing to deploy now..."
    # Execute the fabric task
    exec "fab pull_code_onto_servers:#{branch}", (err, stdout, stderr) ->
      # Important - tell users if something goes wrong
      if err
        msg.send "Sorry, something has gone wrong."
      # Otherwise confirm the branch has been deployed
      else
        msg.send "Success: #{branch} deployed to staging"

Code: hubotplugin.coffee

Server Setup

We recommend you use supervisor to ensure hubot is always running. Here’s an example supervisor config:

[program:hubot]
command=/srv/hubot/bin/hubot --adapter hipchat
user=hubot
autostart=true
autorestart=true
directory=/srv/hubot/
environment=HUBOT_AUTH_ADMIN="XYZ",HUBOT_HIPCHAT_PASSWORD="XYZ",HUBOT_HIPCHAT_TOKEN="XYZ",HUBOT_HIPCHAT_JID="XYZ",HUBOT_HIPCHAT_ROOMS="ROOMS"

Code: supervisor.conf

Security

With this setup, Hubot has access to your staging environments and can deploy any branch at any time, so make sure you check a few items:

Do you want to restrict deploy commands to certain users in HipChat? Check out the hubot redis-brain.
Is your hubot server secure?
  SSH key access only
  Firewall restrictions
  Tripwire / Fail2ban
  Hubot running as a non-privileged user
Is the HipChat client installed on people’s phones? Make sure screen locks are enabled.

References

Matt Pegler – http://www.pegler.co/2012/03/django-deployment.html
Fabric – http://docs.fabfile.org/en/1.7/
Hubot – http://hubot.github.com/
HipChat – https://www.hipchat.com/

Written by Mali

Simpleflake: Distributed ID generation for the lazy

Published August 12, 2013

Note from Jim: Simpleflake, our second open source release, does exactly what its name suggests: provide a very simple way to generate unique IDs across distributed shards (or even multiple systems). Its simplicity is great for us: a startup engineering team with no DBAs and a very, very busy DevOps engineer. Simpleflake’s ID generation requires ZERO coordination with server IPs or MAC addresses (good, as we regularly instantiate new hosts in AWS) or, even worse, any coordination of code with database content. We hope you find Simpleflake useful and invite forking, pull requests and comments. We have now added hooks to Travis CI to let you test your use of Simpleflake more easily.

It’s a common theme that tasks which are very straightforward in non-distributed systems can sometimes be daunting in distributed ones. ID generation is one of those tasks.

Let’s imagine for a second that you have an RDBMS filled with data. A very common way to deal with increasing data size and load (especially writes) is to shard the database. What this entails is that your data is spread across multiple computers. In the simplest case, you now have the same table / schema on two different computers.

Now let’s imagine you want to insert a new row into a sharded table. Your old table probably looked like this:

[code language="SQL"]

CREATE TABLE kittens (
`id` BIGINT AUTO_INCREMENT,
PRIMARY KEY(`id`)
);

[/code]

Let’s say you had 55 things in the table. The id of the last thing would have been 55, and the next one to be entered would get id 56. But now that it’s sharded, you have two different AUTO_INCREMENT columns, and they’re both at 55. Now if you insert one to each, assuming you didn’t change anything, you’d get two items with id 56, one at each server. Uh oh!

Unless you’re lucky enough to be using an auto-sharding database, it’s now up to you to coordinate ID generation. So, we need a distributed method.

You could do a hackjob solution by changing the AUTO_INCREMENT starting values such that hopefully the range of one server won’t hit the other one. So you’d set server A to start at 0 and server B to start at 100000, and hopefully you won’t get 100000 things in server A or you’re going to have to deal with an expensive rekeying process. Also this quickly gets complicated for multiple servers, and is a pain to deal with.

Coordinated solutions

You don’t want a solution where you check each server one by one for the next AUTO_INCREMENT, since by the time you check all the servers, a server you’ve already checked could have gotten a new row that you don’t know about, and thus now your ID is wrong.

Alternatively, you could get the same ID as a different process which is competing with you to insert a row, and now you have a collision. You could then have a lock / unlock system to ensure that there is only one process getting an ID at the same time, but now your rate of insertion is much lower since you can’t do two writes simultaneously. This defeats part of the reason you’d want to shard: increased write performance. So the second property we’re looking for is being an uncoordinated system.

Flickr style

So having the ID generation spread out is difficult and problematic. So what if we pulled it out? Like the unix philosophy says: Do one thing, and do it right! Turns out this has been attempted. Flickr uses a “ticket server” to handle key generation, where they do an arcane `REPLACE INTO Tickets64 (stub) VALUES (‘a’);` every time they need a new ID. To handle the single point of failure, they have two ticket DBs, one doing even and another doing odd IDs.

They mention that the IDs drift out of sync over time and the values become imbalanced. Depending on what you’re doing, this might not matter. But if you’re like us and you need your IDs to be more or less ordered, then this doesn’t work out.

In any case, if all you need is a way to get an ID, do I really need to ask our sole ops person to deploy and manage two new databases just to have auto increment and locking?

Separate service

Twitter came up with a different solution for this problem: Snowflake. It’s basically a separate ID generator service where a 64-bit ID is generated for you, and sent back. You can run many nodes, and Snowflake nodes coordinate with each other at the beginning using Zookeeper, to set a worker id. But they don’t need to keep coordinating all the time.

The simplicity of the ID is rather brilliant. It consists of a timestamp, concatenated with the worker id, concatenated with a sequence number. The timestamp at the beginning gives us the ordering characteristics that we wanted. The worker id in effect partitions the keyspace such that workers can’t collide with each other, and the sequence number ensures uniqueness within the same millisecond. Oh, and it fits in 64 bits!

The problem with Snowflake is now you have to run a JVM instance, Zookeeper, maven, all the stuff that comes with that in a production setup. This is great if you already happen to run that stack, but a pain otherwise.

Enter simpleflake

Our library, simpleflake, follows a similar pattern to Snowflake in that each ID is prefixed with a millisecond timestamp, but the remaining bits are completely random. What this gives you over Snowflake is that there is no configuration to set, no state to maintain across servers and reboots, no moving parts, and no network calls. Plus, it’s up on PyPI and dead simple to use.

[code language="python"]

>>> from simpleflake import simpleflake, parse_simpleflake
>>> simpleflake()
3594162604452825250L
>>> parse_simpleflake(3594162604452825250L)
SimpleFlake(timestamp=1375160370.606, random_bits=6768802L)

[/code]

Easy! You can also manually set the timestamp or the random bits in the constructor.
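To build intuition for what those IDs contain, here is a toy sketch of the layout (an illustration only, not the library’s actual code; the epoch constant below is a stand-in, the real one lives in the simpleflake source): a millisecond timestamp in the top 41 bits, with 23 random bits below it.

[code language="python"]

import random
import time

RANDOM_BITS = 23        # 23 random bits, leaving 41 bits for the ms timestamp
EPOCH = 946684800       # stand-in epoch (2000-01-01 UTC); the library defines its own

def toy_simpleflake():
    """Millisecond timestamp in the high bits, randomness in the low bits."""
    millis = int((time.time() - EPOCH) * 1000)
    return (millis << RANDOM_BITS) | random.getrandbits(RANDOM_BITS)

[/code]

Because the timestamp occupies the most significant bits, sorting the IDs numerically sorts them roughly by creation time, which is what keeps them ordered.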

A crash course in collisions

Of course, as with any completely random scheme, there is a chance of collision. Two things need to happen at once to cause a collision: two IDs are generated in the same millisecond (i.e., two writes), and the random components of those two IDs happen to be the same.

The chance of the first is a distribution problem: Given an average rate of insertion of R requests/sec, what is the probability that at least two requests coincide on the same millisecond? Turns out, this process can be modeled reasonably accurately by the Poisson Distribution. For example, at a respectable 100 inserts/sec, the chances of at least 2 inserts happening on the same millisecond:

PDF[PoissonDistribution[0.1], 2]
= 0.00452419

And then, for two inserts in the same millisecond, the chances that you’ll get a collision in random bit strings of length L (23 in our case) is closely related to the Birthday Problem (fascinating read, especially if you’re also into cryptography). Without going into too much detail, a decent approximation for that is:

2^2 / (2 * 2^L)
= 2.3842 × 10^-7

Multiplying the two, we get:

1.0787 x 10^-9

Which is extremely low. To visualize, if I had 10^9 dollar bills, I could put them end to end to go around the world more than three times (thanks Wolfram Alpha!), and an accurate demonstration of the probability would be if I chose one of those dollar bills without telling you, and you managed to choose the exact same one.
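If you’d like to check the arithmetic yourself, a few lines of Python reproduce the numbers above:

[code language="python"]

import math

rate = 0.1     # 100 inserts/sec is 0.1 inserts per millisecond
L = 23         # length of the random bit string

# Poisson probability of exactly two inserts landing in the same millisecond
p_same_ms = math.exp(-rate) * rate ** 2 / 2    # ~0.00452419

# Birthday-problem approximation for two random L-bit values colliding
p_collision = 2 ** 2 / (2.0 * 2 ** L)          # ~2.3842e-07

print(p_same_ms * p_collision)                 # ~1.0787e-09

[/code]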

Another source of consolation for the overly wary is that if you *do* get a collision, since you have the exact same key, the key will go to the exact same shard as the other one, giving you a uniqueness error. When this extremely rare event happens, you just reinsert the same item with a freshly generated key. Ta da!

Besides, if you really are doing a sustained 100 inserts per second, even assuming you’re storing something small, like a tweet, you’d be generating upwards of 2.5GB/day (assuming around 300 bytes with text + metadata). At that point it’s worth the cost to switch to Snowflake (or maybe even hire a DBA), and if you’re already using simpleflake, you can do so seamlessly, without data migrations!

One gotcha here is that since this is partially timestamp-based, if you have a reasonably high rate of writes it’s a good idea to use NTPd or some such to keep your server clocks in sync. This is not a huge deal; most modern distros come with NTPd already. You can force ntpdate to slew the time instead of making it jump erratically by setting the “-B” parameter.

So why is <insert large company> not using this, then?

For example, Twitter has much more write load than your average startup (or really, almost anyone). At 12GB/day just for tweet text and 800 tweets/sec, they would likely oversaturate the wide safety margin that simpleflake has, causing a lot more collisions.

Conversely, using something like Snowflake for anything much smaller is probably a waste of ops resources and CPU cycles.

UUID Hell

A common question I get is: why not just use a UUID? First, they’re huge. 128 bits overhead per item is not terrible, but it adds up fast sometimes. For example, indexes in a lot of DBs refer to the primary key, so if you have a long primary key and 6 indices, you’re taking that penalty 6 times.

Also, with a standard 16K page size, if you use a 128-bit ID you can fit much less data per page, so you need to do more disk reads to fetch the same number of rows.

Finally, there is also the matter of data locality. Random UUIDs are, well, random, so your data is scattered all across the btree (which is arguably the most common method of storage). If your access pattern is completely random, you probably won’t see a performance loss, but usually this is not the case. For example, when you’re grabbing the freshest 100 rows for a user, having all the required data in a few contiguous pages does wonders for reads. Even UUID1, which has a timestamp component in it, stores the timestamp semi-backwards, which ruins this neat property. If your database is fancy enough, you can cluster by something other than the primary key, but even then, in most cases you’re now looking at a separate auto increment id and more wasted space.

There are alternatives

There are a few other ways of solving this problem that have varying degrees of utility.

Range based systems work well for some but usually require coordinating between the clients. One such system is the HiLo method used by Hibernate, which smartly cuts down on coordination but has SPOF issues since it has a “special” server that coordinates the high range.

Another method is adding a hash of the IP / MAC address into the ID. This ensures that IDs generated on separate boxes don’t collide, but doesn’t cover the case of multiple processes on the same computer. You could remedy this by also adding a process ID. Then there is the problem of making sure two IDs in the same millisecond don’t match, which you could use a sequence number for.

Even then, I think it’s tough to balance the size requirement with all this. With only 23 bits left over after the timestamp, it’s hard to fit a MAC/IP address, a process id and a sequence number (see the sketch below). You can use a chopped hash or the last few digits of the IP/MAC, but I’m not convinced that the “safety” this buys you would be better than a purely random number. Snowflake manages to do this only because its machine IDs are precoordinated and thus don’t have to be long.
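For concreteness, here is a hypothetical sketch of cramming all three components into those 23 bits; note how little of the MAC address and process id actually survives the chopping:

[code language="python"]

import itertools
import os
import time
import uuid

_sequence = itertools.count()

def node_based_id():
    """Hypothetical layout: ms timestamp, then 12 + 6 + 5 = 23 bits of identity."""
    millis = int(time.time() * 1000)
    node = uuid.getnode() & 0xFFF       # only the low 12 bits of the MAC survive
    pid = os.getpid() & 0x3F            # 6 bits of the process id
    seq = next(_sequence) & 0x1F        # 5-bit sequence number
    return (millis << 23) | (node << 11) | (pid << 5) | seq

[/code]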

Epilogue

For our use case (lots of data, mild to medium insert rate, small number of engineers / ops), which I suspect aligns with a lot of others, simpleflake seems to fit well.

  • It’s easy to use and didn’t take long to code up
  • There is no state or config to maintain, which means we can bring servers up and down without worrying about reusing a machine id
  • It has few moving parts; having no additional infrastructure and no network calls goes a long way toward keeping complexity down for a small shop
  • It has almost all the benefits of Snowflake: it is uncoordinated, distributed and k-ordered
  • Yet it is still forward-thinking: when we outgrow it, we can move to Snowflake by changing a function in our code and flipping a switch

In short, it’s about finding the right fit for your stack and use case. If you think this is useful, get it on PyPI or fork it on GitHub!

Presenting this at Boston Python

I presented a Lightning Talk on this at Boston Python for general comment and feedback.

You can visit Boston Python’s YouTube page to see video of this and other July Lightning Talks.

Written by Mike H & Wes

DisRedis: an open source client to automate sharding & failover

Published July 9, 2013

Note from Jim: At CustomMade we are big fans of open source. Today, we are releasing our first open source repo for use by others. Feedback, forking and contributions are welcome.  

The reason we created DisRedis (“distributed redis”)

While looking at options to reduce the load on CustomMade’s overloaded database, we came up with a plan to move user session data out of the database entirely and into Redis. Since session data is somewhat ephemeral and is retrieved on every page view, it seemed like a great target for this treatment. Redis, for those not familiar with it, has all the features of Memcache but adds quite a bit of functionality. In particular, one feature of great interest was the ability to easily set it up for master-slave replication.

While the session data is ephemeral and it certainly wouldn’t be the end of the world if it got lost (the worst that would happen is that users might need to log in again), we decided it made sense to make our Redis datastore as resilient as possible. With that in mind, we did some research on setting up Redis clusters. Although Redis Cluster seemed like the right tool for the job, it wasn’t finished at that point and therefore wasn’t an option. However, we found that Redis features a Sentinel mode which can be used to automatically promote slaves to masters in the case of a failure. This seemed pretty useful, but the next problem was getting that information to all the Redis clients at the same time, so the decision to switch over to a particular slave would always be unanimous. In addition, the standard Redis client doesn’t support sharding out of the box. Out of these needs, DisRedis (“Distributed Redis”) was born.

DisRedis in a nutshell

DisRedis is a Redis client implementation which automatically handles sharding and failover for a Redis store through Sentinels. It simplifies configuration by requiring only one or more Sentinel addresses to be specified. On startup, it queries the Sentinels and acquires a list of master Redis servers to shard across. Whenever a Redis server connection is lost, it queries the Sentinels to determine whether a new master has been elected. It will even retry the query on the new master to prevent the request from failing.
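To make that pattern concrete, here is a minimal sketch of the same idea in plain redis-py (the hostnames and master names are hypothetical, and this is not DisRedis’s actual API): ask a Sentinel where each master currently lives, then hash every key to pick its shard.

[code language="python"]

import hashlib

import redis  # plain redis-py, for illustration only

SENTINEL = ("sentinel01.example.com", 26400)        # hypothetical Sentinel address
MASTER_NAMES = ["redis-Redis1", "redis-Redis2"]     # names as in the config below

def current_masters(sentinel, names):
    """Ask the Sentinel for the current address of each named master."""
    s = redis.StrictRedis(host=sentinel[0], port=sentinel[1])
    masters = []
    for name in names:
        host, port = s.execute_command("SENTINEL", "get-master-addr-by-name", name)
        if isinstance(host, bytes):
            host = host.decode()
        masters.append(redis.StrictRedis(host=host, port=int(port)))
    return masters

def shard_for(key, masters):
    """Standard hash-based sharding: the same key always lands on the same master."""
    index = int(hashlib.md5(key.encode("utf8")).hexdigest(), 16) % len(masters)
    return masters[index]

masters = current_masters(SENTINEL, MASTER_NAMES)
shard_for("session:42", masters).set("session:42", "some session data")

[/code]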

While DisRedis uses standard hash-based sharding, some research gave us a great way to meet our scaling requirements without the usual problems of adding servers to such a system. Craigslist wrote this blog post about how they scale their Redis cluster. It describes a system of pre-sharding, which takes advantage of Redis’s master-slave replication to seamlessly expand a cluster. Simply put, you first determine your current expected cache size and the biggest size you’d like to support before having to lose data to rehashing. By starting up enough master Redis instances across only two servers, you can at a later date migrate master instances to separate servers to increase capacity (e.g., 16 small masters on two servers today can later be spread across as many as 16 dedicated servers without rehashing). We went a little further and added a slave instance for each master to ensure reliability. An important note here is to make sure each slave is on a different server than its master. After some configuration and testing, we were ready to go live.

Sample configuration and setup

We’ve included a simple network diagram below and some example configuration files to help you get up and running as quickly as possible using DisRedis:

[Diagram: example DisRedis server configuration]

Example sentinel configuration file for a sentinel running on port 26400 monitoring 4 Redis master instances (each master has a slave, as described above; Sentinel discovers the slaves automatically):
[code language="bash"]
port  26400 # Sentinel Port

sentinel monitor redis-Redis1 redis01.server.com 6381 2
sentinel down-after-milliseconds redis-Redis1 60000
sentinel failover-timeout redis-Redis1 900000
sentinel can-failover redis-Redis1 yes
sentinel parallel-syncs redis-Redis1 1

sentinel monitor redis-Redis2 redis01.server.com 6382 2
sentinel down-after-milliseconds redis-Redis2 60000
sentinel failover-timeout redis-Redis2 900000
sentinel can-failover redis-Redis2 yes
sentinel parallel-syncs redis-Redis2 1

sentinel monitor redis-Redis3 redis02.server.com 6381 2
sentinel down-after-milliseconds redis-Redis3 60000
sentinel failover-timeout redis-Redis3 900000
sentinel can-failover redis-Redis3 yes
sentinel parallel-syncs redis-Redis3 1

sentinel monitor redis-Redis4 redis02.server.com 6382 2
sentinel down-after-milliseconds redis-Redis4 60000
sentinel failover-timeout redis-Redis4 900000
sentinel can-failover redis-Redis4 yes
sentinel parallel-syncs redis-Redis4 1
[/code]

We recommend using a tool like supervisor to manage your processes for easier management and to ensure they’re always running.

Example supervisor configuration for a single sentinel instance:
[code language="bash"]
[program:sentinel01]
command=redis-server <PATH TO SENTINEL CONFIGURATION> --sentinel
user=root
autostart=true
autorestart=true
redirect_stderr=True
[/code]

Example supervisor configuration for a single Redis instance:
[code language="bash"]
[program:redis-6381]
command=redis-server <PATH TO REDIS SERVER CONFIGURATION>
user=root
autostart=true
autorestart=true
redirect_stderr=True
[/code]

How to get it

DisRedis (a.k.a. disredis) is now available from PyPI (see https://pypi.python.org/pypi/disredis/ for details and additional instructions on setup, use and troubleshooting with Django). You can simply install it with pip:
[code language="bash"]
pip install disredis
[/code]

It is also available on our open source github account at: https://github.com/SawdustSoftware/disredis.

We hope you find it useful. Feedback, forking and contributions are welcome.

Written by Jim

Piggybacking on backbone.js for performant mobile web

Published June 16, 2013

CustomMade is a pretty data- and interaction-rich site. Not only do we serve up pages with lots of dynamic content; we also have many interfaces that manage highly interactive experiences (e.g., negotiation between buyers and makers, two-sided ideation and collaboration, simultaneous upload of dozens of portfolio images). To handle this more cleanly, @Brendan advocated at the end of last year that we begin moving to backbone.js.

Since then we have been extremely happy with backbone. It has helped us out—faster and in more places—than we originally hoped (e.g., untangling management of large sets of asynchronous callbacks when uploading large numbers of images at a time). I’ll defer a deeper discussion of this to future posts by Brendan as this post is about how we piggybacked on backbone to improve our mobile web architecture.

Enter mobile web

A few weeks after we started our work in backbone, we began to optimize key areas and interactions on CustomMade for mobile. We (very) briefly debated mobile web vs. native apps. However, we saw very quickly that mobile web was the best match for our customer use case:

A customer sees something interesting that he or she desires (say a cool table in a restaurant), then whips out a smartphone, goes to custommade.com, takes a picture and posts request to get the same table (but larger to seat the entire family back home), and starts receiving offers from makers to get it.

Pausing to first download an App (even a PhoneGap’d HTML5 app) does not fit this use case. However for mobile web to succeed, it has to be FAST (no one is going to wait around for a transaction to complete with their phone hanging out).

Native mobile app interactions are fast (even when they need data from remote servers) because the MVC architecture controlling the interaction is embedded in the device itself. This is different from traditional browser interaction, which needs to pass arguments to a remote server to determine what to display next. Solving this is where backbone helped again. Backbone loads an MVC-like architecture client-side in the browser. I use the word “MVC-like” because there is some debate as to whether backbone provides a true MVC architecture or simply emulates elements of it. Purism aside, the result is unchanged: backbone moves (traditionally) server-side structure from servers far away to the browser in your hand:

[Diagram: mobile architecture comparison]

Backbone loads key structures (models, routers, and views) into the browser on page load, enabling fast navigation and more dynamic interaction

The result: performant mobile web

As a result, our mobile web experience is virtually as fast (and interactive) as that provided by a native app (without the headache of getting a provisioning license, building in a closed ecosystem, submitting for approval, and requiring download and installation). Views are dispatched within the browser, allowing users to progress forward and back without pinging a remote server (this even works in staging when we shut off the staging servers ;) Run-loop polling (which backbone does efficiently, even over mobile connections) automatically notifies customers of changes in project status and receipt of new messages:

[Screenshots: mobile web flow]

Examples of the progression from inception of a project idea to receipt of a proposal from a maker. A future post by @Manning will highlight his wizardry with HTML5 and Sass to emulate iOS’s UIKit framework. In addition, we’ll share our experiences with mobile-specific templates vs. responsive web.

Caveat

This approach does not allow for completely offline operation. It does handle interruptions in connectivity, but it ultimately needs to connect to our backend servers to get and post data. If we needed offline operation (for extended periods of time) without losing any data, we would instead build a native mobile app with an embedded SQLite database. However, that is not our use case.

Written by PK

From Our Engineers: CustomMade and $18 Million

Published June 13, 2013

Re-blogged from PK’s personal blog. Originally published 11 June 2013.

What can CustomMade do with its new round of financing, 18 million dollars? A lot.

I am lucky enough to experience the inside view of a large funding round like this for the second time. The first, at AdvisorTech Corp, was during the Internet bubble years, when $20 million was nothing to talk about. This time around, market conditions are much more realistic; the CustomMade round is well deserved and is a vote of confidence from the markets. One of the main success factors for CustomMade is the pairing of co-founders Seth Rosen and Mike Salguero. They have complementary skills in a way that I have not seen for a long time.

Another factor in CustomMade’s success is first-mover advantage. While being first does not guarantee success, being first and having a team that has worked and worked to understand the customer gives CustomMade a tremendous advantage over any competitor. Any two-sided marketplace business is difficult to understand. Which side of the market should subsidize the transaction (makers)? How do you deal with competition within one side (how do you encourage maker participation without shrinking the maker pool by favoring high performers)? How do you matchmake between the two sides (customers and makers)? The CustomMade team has built up a lot of internal knowledge of how to make this market work.

But remember, to quote Mike quoting Seth:

A dollar raised is a dollar not earned — Seth Rosen

This is a beautiful insight into the truth about startups: having raised this large round of financing just means that we are in the hot seat to deliver value to the investors by multiplying those dollars into revenue growth.

Here is a picture of @pks, @MoonlightLuke and @markstenquist working hard with their pen and paper… (We were signing forms for a welding class, to understand how to custom make objects!)

[Photo: @pks, @MoonlightLuke and @markstenquist]

Written by Mike & Seth, Our co-founders

From Our Founders: Big news at CustomMade

Published June 11, 2013

A guest Sawdust Software post from our founders, reposted from the CustomMade Blog.

Many people don’t know that both of us were originally Buyers and lovers of CustomMade before we were Owners. When we purchased CustomMade in 2009, it was a small and static website with 350 woodworkers, but we saw the potential for so much more. Today, CustomMade helps more than 12,000 professional Makers of all trades and attracts millions of Buyers in search of custom goods. We’ve believed in the power and future of CustomMade for years, and we’re pleased to announce that we have taken another big step to help make custom accessible for the masses.

We’re thrilled to say we’ve received $18 million in funding support from another group of CustomMade believers. Our financing round was led by Atlas Venture and returning investor Google Ventures, with participation from Schooner Capital, NextView Ventures, First Round Capital and Launch Capital.

So, you’re probably wondering what this means for you. It’s pretty simple actually: You want to enjoy high-quality custom goods that celebrate individuality, reduce waste and increase the business of local Makers, all at a good value. We want to help you, and even more people, do all of that in an even easier way.

Succeeding in that requires incredible people and talent so we can deliver a flawless experience to Buyers and Makers alike. A lot of complicated work goes into developing and maintaining a website, so our senior vice president of engineering & operations, Jim Haughwout, is actively seeking passionate engineers to build creative code that makes CustomMade look amazing and work great. We’re building our product teams to help design a great online experience for both Makers and Buyers, and our marketing team to spread the word about buying custom.

We’re definitely keeping busy so stay tuned for a bunch of exciting updates in the coming months. We can’t thank you enough for your continued support and belief in a custom life.

Mike & Seth, CustomMade co-founders

Written by Mali

The phantom query

Published June 7, 2013

This is a debugging war story of how the sum of a few small problems can conspire to cause much larger headaches than each of the parts.

It all started when I was trying to optimize our project listing pages; our users love browsing, so those pages get hit the most. Database queries usually take the longest, so I pulled out the trusty django-debug-toolbar and dug in.

After fixing a few things here and there, I stumbled upon this one that took around 70ms:

[code language="sql"]
SELECT ... FROM `project`
INNER JOIN `user` ON (`project`.`user_id` = `user`.`id`)
LEFT OUTER JOIN `maker` ON (`user`.`id` = `maker`.`user_id`)
WHERE (`project`.`user_id` = ... )
ORDER BY `project`.`order` ASC
[/code]

This snippet basically brings in more project listings from the same maker, which would be all fine and good, except we’d moved that functionality out into a separate AJAX call a while ago!

 

Tracking it down

I checked the page; the AJAX call is definitely there. I checked the template; no reference to any of this. What the heck?

It must be in the view. Must be left over somehow. I refreshed the page a few times and the query kept showing up, which means it wasn’t cached. Cool, that narrows it down. I checked the parts of the view that didn’t have caching code. No trace. What?

Eventually, I found the part of the code that called a utility function which returned some common info about the maker. Among those items: the QuerySet for the project listing. Easy peasy.

But wait, the query turned out to be in the cached section! So how come it kept happening even after my first page refresh?

 

The plot thickens

At this point, I could have ripped out the queryset and called it done. I tested that, and it worked. But no, at this point, it was personal.

Could it be that it isn’t getting cached at all? That can’t be – we’d have way more server load. Hmm, this variable isn’t getting used anywhere, and Django QuerySets are lazy, right? So why is the db query getting executed?!

Time to enable logging on memcache. Pipe it through grep to get the right key, reload the page:

[code]
<28 get ...
> NOT FOUND ...
<28 set ...
> NOT FOUND ...
[/code]

Did a get, then did a set. Good. Reload again:

[code]
<28 get ...
> NOT FOUND ...
<28 set ...
> NOT FOUND ...
[/code]

Not found again. What?!

Shouldn’t it have been cached from the first time? So maybe it isn’t caching right.

Stepping through the code: yup, the cache.get() fails and the cache.set() definitely gets called every time. I removed all the other items and set just the QuerySet; same thing. I removed the QuerySet and added everything else; it worked fine.

Now that I knew what the problem was, a bit of googling revealed that memcached errors if you try to put more than 1 MB of data into one key. (Caching a QuerySet pickles it, and pickling a QuerySet evaluates it and serializes every row: that explains both the phantom query and the oversized cache value.) Storing 1 MB in one key is a code smell in and of itself, since in most cases it’s probably abnormal, but why didn’t anything error on our side?

Digging into the Django and python-memcached code, it looks like python-memcached returns zero on failure, but the Django caching framework throws that return value away. Duh. It turns out people have complained about this in the past, but to no avail, the main reasons being API backwards compatibility (heck, you could have just logged a warning, thrown an exception only in debug mode, or scheduled the change for the next major revision) and that surfacing errors would require another trip back from the server (dubious in this specific case).
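If you want to see the silent failure for yourself, a few lines against a local memcached reproduce it (this assumes the python-memcached client and memcached’s default 1 MB item size limit):

[code language="python"]

import memcache

mc = memcache.Client(["127.0.0.1:11211"])

# Anything over memcached's default 1 MB item limit fails to store...
result = mc.set("big_key", "x" * (2 * 1024 * 1024))

print(result)             # 0: python-memcached signals the failure via its return value
print(mc.get("big_key"))  # None: nothing was stored

# ...but Django's cache.set() discards that return value, so you never hear about it.

[/code]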

Bah humbug.

 

Epilogue

Don’t assume code will behave in the way you imagine is sane, since “sane” has different definitions for everyone. Don’t assume bugs always have a single simple cause. If error cases are not documented, beware. And if your library is going to ignore errors silently, you’d better scream about it in the docs!

 

If you found this interesting, take a peek at our engineering page. We’re hiring! Check out our jobs on our careers page.

 

Written by Jim

Interesting Challenges + Perfect Timing = Big Opportunities

Published May 28, 2013

At CustomMade, we are building a two-sided marketplace that enables anyone to get virtually anything they can imagine, made exactly how they want, by the ideal “Maker”. Making this a reality poses some really fun engineering challenges to tackle (fun if you like making hard things simple). This post describes some of them:

Knowledge Graph: CustomMade currently has over 135,000 listed projects representing creations by over 12,000 Makers. These projects can be virtually anything: from wooden bowties to R2-D2 engagement rings. How do we build an underlying knowledge graph for 1:1 commerce that lets us, automatically: match customers and makers based on interest and style, bid on the ideal AdWords for acquisition, detect unmet demand for new types of artisans, and much more? How do we codify personal things like style and skill? How do we make this extensible to any product?

Search, Recommendation and Machine Learning: Creating custom items is deeply personal. Simply matching customers and artisans on keyword and text relevance is not enough. How do we create matches that factor in personal taste, skill and style? How do we combine this with search to provide results that not only match relevance, style and taste, but also factor in real-time statistics on marketplace performance? How do we allow our platform to “learn” from user interaction (not just transactions, but social sharing, page hold time and more)?

Simple Ideation, Collaboration and Co-creation: Making it easy to “create anything you imagine” is very hard. What user interfaces (design and technology) make it easiest for customers to express ideas—while still conveying enough information to make it easy for Makers to understand what they need? How do you enable customers and makers to answer questions and collaborate on ideas—with the same intimacy as texting—across thousands of miles?

Cross-Platform Front End Engineering (Web, Mobile Web, Native Mobile): How do you build experiences that cater to the different capabilities of web (where typing is easy and screens are big), smart phone (where time is short and typing is hard but taking a photo or audio note is easy) and tablets (where browsing and exploration in front of the TV is common)? What combinations of Native Mobile and HTML5 technologies provide the ideal balance of speed of development, fast iteration, and fast user experience?

Payments: Secure payment processing is the hallmark of any marketplace. Not only are we tackling the standard challenges of transactional integrity and security; we are also creating payment models unique to a two-sided custom marketplace. How do you structure payment processing to buy items that do not even exist yet? How do you do this in a manner that provides Trust & Safety guarantees to both customer “Buyers” and artisan “Makers”? How do you make this clear, simple, secure and scalable, across currencies and international borders?

Continuous, Linear Scaling: This year all aspects of our platform are growing. As we build for 100x scaling, how do we modify our platform architecture to enable us to continuously scale while providing a faster and faster experience for our customers? What caching approaches best balance speed with the dynamic nature of our user-generated content? Which sharding strategies best match our usage patterns? How do we implement these without interrupting users (what we call “changing out parts of the rocket mid-flight”)?

Right now is a really great time to join CustomMade. We’re big enough to have lots of traffic and to rapidly see new ideas translate into conversion rates and revenue. However, we are still small enough to innovate quickly. We are seeking talented engineers to join us on this journey. If you’re interested, check out our job openings in the Boston and London metro areas.

Written by Jim

Welcome to Sawdust Software

Published May 21, 2013

Welcome to Sawdust Software, the CustomMade engineering blog! At CustomMade, we like to say that we code for those who create:

The programmer, like the poet, works only slightly removed from pure thought-stuff. He builds his castles in the air, from air, creating by exertion of the imagination. Few media of creation are so flexible, so easy to polish and rework, so readily capable of realizing grand conceptual structures

Yet the program […], unlike the poet’s words, is real in the sense that it moves and works, producing visible outputs separate from the construct itself. […] The magic of myth and legend has come true in our time. One types the correct incantation on a keyboard, and a display screen comes to life, showing things that never were nor could be.

Fred Brooks, The Mythical Man-Month, Page 7

Inspired by the skill and artistry of our 12,000+ makers, we’re working to build the platform that lets you get anything you can imagine, just the way you want it, created by the ideal person. This is not an easy challenge. However, it is a fun one, with lots of interesting problems (more on these in later posts).

Sawdust is the result of creation. At Sawdust Software, our team will share the results of their creativity and effort, in their own words (they picked the name of the blog themselves). By doing so, we hope to give you insight into who we are, what it is like to work with us, and why we’re all so excited by what we do.