My writing

Ramblings of an open source business owner.

AI-free writing from the maintainer of Beekeeper Studio and Tabulator, both sustainable indie open source businesses.

2019 Should you use Parquet?
Parquet provides significant benefits for sparse reads of large datasets, but is it always the file format to use?
Hadoop

2019 Beginners Guide to Columnar File Formats in Spark and Hadoop
File formats can be confusing, so let's delve into columnar file formats (like Parquet) and explain why they're different to regular formats (like CSV, JSON, or Avro).
Hadoop

2017 A Quick Guide to Concurrency in Scala
I'll talk through the basics of Threads, Akka, Futures, and Timers in this quick overview of concurrency for Scala. Great for those building apps in Scala.
Scala

2017 4 Fun and Useful Things to Know about Scala's apply() functions
Scala's apply functions are commonly seen alongside case classes, but they can do so much more. Here are 4 fun ways they are used in Scala.
Scala

2017 10+ Great Books and Resources for Learning and Perfecting Scala
While Scala is amazing, it has an overwhelming number of features. These books and online resources will help you learn and perfect Scala whether you're coming from Java, Python, Ruby, or any other language.
Scala

2017 10+ Great Books for Apache Spark
Apache Spark is a powerful technology with some fantastic books. I'll help you choose which book to buy with my guide to the top 10+ Spark books on the market.
Spark

2016 An Introduction to Hadoop and Spark Storage Formats (or File Formats)
I'll walk through what we mean when we talk about 'storage formats' or 'file formats' for Hadoop and give you some initial advice on what format to use and how.
Hadoop

2016 Is it 'MapReduce' or 'Map Reduce'?
Confused about whether Map Reduce is one word or two? Let me settle this once and for all.
Hadoop

2016 Type-Safe Scalding MapReduce Tutorial - Joining and Summarizing Data
Scalding's Scala-like API lets us join and summarize data quickly and easily, but beware: it looks like regular Scala, but sometimes behaves differently.
Hadoop

2016 Hadoop MapReduce Advanced Python Join Tutorial with Example Code
Joining and analysing data in Hadoop using Python MapReduce. I compare this solution to the same solution in other MapReduce frameworks.
Hadoop

2016 5 Industry Veterans Pick Their Favorite MapReduce Frameworks
Software engineers from Cloudera, Foursquare, Spredfast, JauntVR, and Elondina chat with me about their favorite Hadoop MapReduce frameworks and why they like them.
Hadoop

2015 Apache Spark Java Tutorial [Code Walkthrough With Examples]
To follow my post implementing a pipeline in regular Spark, I do the same thing with Java. The walkthrough includes open source code and unit tests.
Spark