My writing
Ramblings of an open source business owner.
AI-free writing from the maintainer of Beekeeper Studio and Tabulator, both sustainable indie open source businesses.
2019
Should you use Parquet?
Parquet provides significant benefits for sparse reads of large datasets, but is it always the file format to use?
Hadoop
2019
Beginners Guide to Columnar File Formats in Spark and Hadoop
File formats can be confusing, so let's delve into columnar file formats (like Parquet) and explain why they're different to regular formats (like CSV, JSON, or Avro).
Hadoop
2017
A Quick Guide to Concurrency in Scala
I'll talk through the basics of Threads, Akka, Futures, and Timers in this quick overview of concurrency for Scala. Great for those building apps in Scala.
Scala
2017
4 Fun and Useful Things to Know about Scala's apply() functions
Scala's apply functions are commonly seen alongside case classes, but they can do so much more. Here are 4 fun ways they are used in Scala.
Scala
2017
10+ Great Books and Resources for Learning and Perfecting Scala
While Scala is amazing, it has an overwhelming number of features. These books and online resources will help you learn and perfect Scala whether you're coming from Java, Python, Ruby, or any other language.
Scala
2017
10+ Great Books for Apache Spark
Apache Spark is a powerful technology with some fantastic books. I'll help you choose which book to buy with my guide to the top 10+ Spark books on the market.
Spark
2016
An Introduction to Hadoop and Spark Storage Formats (or File Formats)
I'll walk through what we mean when we talk about 'storage formats' or 'file formats' for Hadoop and give you some initial advice on what format to use and how.
Hadoop
2016
Is it 'MapReduce' or 'Map Reduce'?
Confused about whether Map Reduce is one word or two? Let me settle this once and for all.
Hadoop
2016
Type-Safe Scalding MapReduce Tutorial - Joining and Summarizing Data
Scalding's Scala-like API lets us join and summarize data quickly and easily, but beware: it looks like regular Scala, but sometimes behaves differently.
Hadoop
2016
Hadoop MapReduce Advanced Python Join Tutorial with Example Code
Joining and analysing data in Hadoop using Python MapReduce. I compare this solution to the same solution in other MapReduce frameworks.
Hadoop
2016
5 Industry Veterans Pick Their Favorite MapReduce Frameworks
Software engineers from Cloudera, Foursquare, Spredfast, JauntVR, and Elondina chat with me about their favorite Hadoop MapReduce frameworks and why they like them.
Hadoop
2015
Apache Spark Java Tutorial [Code Walkthrough With Examples]
As a follow-up to my post implementing a pipeline in regular Spark, I do the same thing with Java. The walkthrough includes open source code and unit tests.
Spark