As part of my job I’m fortunate enough to see some really eloquent pieces of cost analysis here and there. Its genuinely one of the highlights of my day when you see some really clever insight into a problem and sometimes even offering new paths to a solution. This was an article featured on Wired which you can find here.

It would have perhaps been interesting to see the work done on a larger scope of data and without the Duplo. Just to focus more on the standard bricks with a larger data set - but just a really nice piece of analysis nevertheless completed on one of my favourite things.


Lego sets come in all different sizes with different numbers of Lego pieces. Of course bigger sets cost more, but is there a linear relationship between set size and cost? Let’s take a look. Oh, and yes – I did look at this before, but that was a long time ago. It’s time to revisit the data

It’s not too difficult to find data for Lego prices and number of pieces. If you just look on the Lego online store. There you can find both the price and the number of pieces for each set. You can even sort them by “themes” – like “Star Wars” or “friends”

Even though it’s easy to get, I only collected price data for a subset of the themes (mostly because I am lazy). If I put all of this data together, I can get a plot of the set price vs. number of pieces in set. Here is what that looks like:

Let’s look at the linear function that fits this data. The slope of this line is 0.104 US Dollars per Lego piece. Boom. There is your answer. On average, one Lego piece costs 10.4 cents. Also, I think it’s nice to notice that this data is fairly linear.

But wait. What about the y-intercept for this fitting function? The value from the fit is 7.34 USD. That means that for this function, if you had a Lego set with zero pieces in it, it would still cost $7.34 – you know, for the box and instructions and stuff. Yes, I know that there are Lego sets cheaper than $7.34 – this is just the y-intercept for the fitting function.

Now let me point out the three outliers in this plot. Notice that all of these (one from Duplo and two from the City theme) are train sets. Of course train sets are going to be more expensive than a set with the same number of pieces (but not a train) because of the electric motors and stuff.

If you are looking for a “good deal”, might I suggest the Trevi Fountain (21020). This set has 731 pieces for just $49.99. According to the fitting function, a set with this many pieces should cost about 83 dollars.

Suppose I break all the data into the different themes. If I fit a linear function to each of the different themes, I can get both the price per piece of Lego and the price of a zero piece set. Here are the brick prices for some of the Lego themes. The error bars are the uncertainties in the fit parameters.

If you know what a Duplo block is, you probably aren’t surprised that they are the most expensive (63 cents per brick). These are bricks created for smaller kids. They are all large so that you can’t swallow them. It just makes since that they would cost more. The other expensive bricks are the City sets. But this is deceiving due to the high set prices of the train kits. I suspect if you removed these train sets from the plot, it would be a more normal price.

What about the base cost? This is the y-intercept of the linear fit:

Here you will notice that the City theme has a negative base cost. This means that if there were no pieces (on average) in a City set, Lego would pay YOU money. But why is this negative? It’s because of the high price of the train sets. They increase the slope of the linear fit but also push the y-intercept into negative values.

The real bargains are the Architecture themed sets. These have a base cost of only 70.7 cents where as the Marvel themed sets have a base cost of 3.61 USD.