Death by a Thousand Dashboards
Dashboards are the accepted standard for visualizing data and metrics. In this post we go over some common challenges that can arise when the number of users and dashboards starts to grow within an organization, resulting in an overcrowded and inefficient ecosystem of metrics and visualizations.
Discrete Choice Models
In this post, we provide a brief introduction to Discrete Choice Models. We examine the core idea of modeling choice behavior as a utility maximization problem and we show how, under certain conditions, this theory gives rise to the familiar Multinomial Logistic Regression model that is commonly used to address classification tasks in Machine Learning applications.
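To make the connection concrete, here is a minimal sketch under stated assumptions. It assumes each alternative has a deterministic utility plus an independent standard Gumbel noise term, which is the textbook condition under which utility maximization yields the multinomial logit (softmax) probabilities. The alternatives (`bus`, `car`, `bike`), their utility values, and the function names are hypothetical and chosen only for illustration.

```python
import math
import random


def choice_probabilities(utilities):
    """Multinomial logit probabilities: P(k) = exp(V_k) / sum_j exp(V_j)."""
    # Subtract the maximum utility for numerical stability; this does not
    # change the resulting probabilities.
    max_u = max(utilities.values())
    exp_u = {k: math.exp(v - max_u) for k, v in utilities.items()}
    total = sum(exp_u.values())
    return {k: e / total for k, e in exp_u.items()}


def sample_gumbel(rng):
    """Draw one standard Gumbel variate via the inverse CDF."""
    u = rng.random()
    while u == 0.0:  # guard against log(0)
        u = rng.random()
    return -math.log(-math.log(u))


def simulate_choice_shares(utilities, n_draws, seed=0):
    """Simulate decision makers who pick the alternative with the highest
    noisy utility (deterministic utility + Gumbel noise)."""
    rng = random.Random(seed)
    counts = dict.fromkeys(utilities, 0)
    for _ in range(n_draws):
        best = max(utilities, key=lambda k: utilities[k] + sample_gumbel(rng))
        counts[best] += 1
    return {k: c / n_draws for k, c in counts.items()}


# Hypothetical deterministic utilities for three travel modes.
utilities = {"bus": 0.5, "car": 1.2, "bike": -0.3}

probs = choice_probabilities(utilities)
shares = simulate_choice_shares(utilities, n_draws=20000)

for mode in utilities:
    print(f"{mode}: logit={probs[mode]:.3f} simulated={shares[mode]:.3f}")
```

With enough simulated draws, the observed choice shares should land close to the closed-form logit probabilities, which is the sense in which the classification model follows from the choice theory.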
Overstressed Data Teams
Data organizations can easily become chronically busy. However, busyness doesn’t necessarily correlate with their ability to generate positive business impact. In this post, we explore some ways in which data organizations can become overstressed and the troubles that follow.
Career Ladders
A career ladder is a document that clarifies levels, titles, and responsibilities for a given role. It is useful for setting expectations, comparing performance across team members and teams, and providing a roadmap for career progression. In this article we provide a framework for developing career ladders for data roles.
Building and Load-Testing a Machine Learning Service
In this post, we explore some interesting AWS technologies for building scalable Machine Learning services in the cloud. If you are curious to learn more about frameworks such as the AWS Cloud Development Kit and AWS Chalice or about managed services such as Amazon SageMaker and AWS Auto Scaling, this post is for you! For extra fun, we also show how to use the Python library Locust to run load tests on a real-time Machine Learning service built using the aforementioned technologies.
Principles for a More Efficient Hiring Process
The job market for data roles is growing and evolving at a rapid rate. However, the hiring process remains stressful for both candidates and interviewers. In this post, we reflect on a few generic principles of interview processes, and some specific tips for data roles. The hiring process can benefit from more transparency, coordination, and access to information for all parties involved.
3 Tactics to Improve your Cluster Analysis
Clustering methods are frequently applied in real-world business applications. While clustering is a conceptually simple task, it is not always easy to evaluate whether the clusters that we find represent meaningful characteristics of a dataset. In this post, we discuss 3 tactics that can be used to improve our ability to discover meaningful clusters and be more confident about our discoveries.
7 Experimentation Pitfalls
Experiments are powerful technical tools that improve the quality of business decisions. They require the integration of several engineering systems and the coordination of business and data functions. Experiments are subject to many pitfalls. In this post we cover 7 common issues that are often overlooked when planning and running experiments, from design problems to statistical issues and practical limitations.
Measuring and Managing Machine Learning Performance
Today, many of our experiences are powered by intelligent data-driven systems. Now more than ever, Machine Learning (ML) developers must be able to describe what performance and accuracy standards can be expected from their systems, and they must be able to measure whether these standards are met. Site Reliability Engineering (SRE) offers a set of principles and practices that can help ML developers address these challenges. In this article, we describe what SRE is and we discuss how Service-Level Agreements (SLAs), Service-Level Objectives (SLOs), and Service-Level Indicators (SLIs) can help ML developers build better systems for their users.
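As a rough illustration of how these terms relate, the sketch below computes two indicators (SLIs) and checks each against an objective (SLO). Everything in it is an assumption made for the example: the class and function names, the 200 ms latency threshold, the target values, and the sample data are hypothetical and not taken from the article.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ServiceLevelObjective:
    """An SLO: a named target for the fraction of 'good' events."""
    name: str
    target: float  # e.g. 0.90 means at least 90% of events must be good

    def is_met(self, sli_value):
        return sli_value >= self.target


def latency_sli(latencies_ms, threshold_ms):
    """SLI: fraction of requests served within the latency threshold."""
    if not latencies_ms:
        raise ValueError("need at least one request to compute an SLI")
    good = sum(1 for latency in latencies_ms if latency <= threshold_ms)
    return good / len(latencies_ms)


def accuracy_sli(predictions, labels):
    """SLI: fraction of predictions that match the observed labels."""
    if not labels or len(predictions) != len(labels):
        raise ValueError("predictions and labels must be non-empty and aligned")
    correct = sum(1 for p, y in zip(predictions, labels) if p == y)
    return correct / len(labels)


# Hypothetical measurements for a prediction service.
latencies_ms = [120, 180, 250, 90, 300, 150, 110, 95, 130, 170]
predictions = [1, 0, 1, 1, 0, 1, 0, 0, 1, 1]
labels = [1, 0, 1, 0, 0, 1, 0, 0, 1, 1]

latency_slo = ServiceLevelObjective("requests under 200 ms", target=0.90)
accuracy_slo = ServiceLevelObjective("prediction accuracy", target=0.85)

latency_value = latency_sli(latencies_ms, threshold_ms=200)
accuracy_value = accuracy_sli(predictions, labels)

print(latency_slo.name, latency_value, latency_slo.is_met(latency_value))
print(accuracy_slo.name, accuracy_value, accuracy_slo.is_met(accuracy_value))
```

In this framing the SLI is what gets measured, the SLO is the internal target it is compared against, and an SLA would be the external commitment, typically set more loosely than the SLO.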
A Guide to Data Roles
There are many data roles. We describe 4 archetypes: Data Engineers, Data Analysts, Data Scientists, and Machine Learning Engineers.
The focus should be on project areas and responsibilities: this means investing in data infrastructure first. Once the data is collected and organized, insights can be more easily extracted to inform business decisions. A structured approach to managing data also enables the development of Machine Learning solutions that can automate business processes and create customer-facing services.
Setting Sail
Data Captains is a boutique consulting firm that partners with organizations to help them improve their business using data. We advise others about the opportunities that Data Science, Machine Learning, and Artificial Intelligence can unlock when they are leveraged in a pragmatic and business-oriented way.