How I Approach Data Analysis

Intro

How often do you look at data? If you run a web service, that might mean Google Analytics, application logs, search histories, or browsing logs. Even if you do not, there is open data: public datasets released by governments and large organizations to encourage reuse.

My go-to sites:

For example, the Ministry of Land, Infrastructure, Transport and Tourism publishes vehicle ownership numbers in Excel format. You can view pre-aggregated statistics right away. Even without direct access to proprietary data, you can use open data to add value to services.

Here is the approach I take when I start analyzing data.

Step 1: Inventory the data

You cannot fight what you do not understand, and not every dataset is useful. Clarify why you need the data and which datasets support that goal. Suppose you want to forecast sales—you will likely need past monthly sales and related metrics (customer counts, foot traffic, etc.).

Work out how to retrieve it, too: raw SQL against your database, exports from Google Analytics, or, for open data, the quirks of each provider. This exploratory step takes time, but it is worth the effort.
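
As a rough sketch of the "raw SQL" route for the sales-forecast example, the snippet below builds monthly totals from individual sales rows. The `sales` table, its columns, and every row in it are hypothetical, and an in-memory SQLite database stands in for whatever database you actually use.

```python
import sqlite3

# Hypothetical schema and rows, made up purely for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (sold_on TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO sales (sold_on, amount) VALUES (?, ?)",
    [
        ("2024-01-05", 1200),
        ("2024-01-20", 800),
        ("2024-02-03", 1500),
        ("2024-02-17", 700),
        ("2024-03-09", 2100),
    ],
)

# Collapse the raw rows into one total per calendar month (YYYY-MM).
monthly_sales = conn.execute(
    """
    SELECT strftime('%Y-%m', sold_on) AS month, SUM(amount) AS total
    FROM sales
    GROUP BY month
    ORDER BY month
    """
).fetchall()
conn.close()

for month, total in monthly_sales:
    print(month, total)
```

Writing the query yourself at this stage also forces you to learn what each column really contains, which is most of the point of taking inventory.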

Step 2: Visualize

Once you know which data to use, visualize it. Raw numbers rarely reveal patterns at a glance. Start with line charts and scatter plots. If a line chart shows a linear trend, you can reason about future values. Scatter plots can reveal relationships even when the data is not linear.

Remember that qualitative data exists as well, such as free vs. paid membership. Group the records by category and analyze the quantitative metrics within each group. In code, you often encode categories as numbers (paid=1, free=0), but do not treat those codes as continuous values; their magnitude is meaningless. Visualizations should respect that: compare the groups side by side rather than plotting the codes on a numeric axis.
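
A minimal sketch of that per-group analysis, using only the standard library. The records and the `monthly_spend` metric are invented for illustration; the membership label is used only as a grouping key and never enters any arithmetic.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical records: (membership, monthly_spend). Values are made up.
members = [
    ("paid", 4200),
    ("free", 300),
    ("paid", 3800),
    ("free", 0),
    ("free", 600),
    ("paid", 5000),
]

# Bucket the quantitative metric by category.
spend_by_group = defaultdict(list)
for membership, spend in members:
    spend_by_group[membership].append(spend)

# Summarize each group separately; the category itself is never averaged.
group_summary = {
    membership: {"count": len(values), "mean_spend": mean(values)}
    for membership, values in spend_by_group.items()
}

for membership, stats in group_summary.items():
    print(membership, stats)
```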

Step 3: Compute statistical indicators

(Again, only for quantitative data.) After visualizing, compute basic statistics. Here are the ones I use most often:
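
As a minimal sketch, assuming the usual candidates (count, mean, median, standard deviation, minimum, maximum), Python's standard `statistics` module covers the basics. This selection is an assumption on my part for illustration, and the sample values are made up.

```python
import statistics

# Hypothetical monthly sales figures; 310 is a deliberate outlier.
monthly_sales = [120, 135, 128, 150, 310, 142, 138]

summary = {
    "count": len(monthly_sales),
    "mean": statistics.mean(monthly_sales),
    "median": statistics.median(monthly_sales),
    "stdev": statistics.stdev(monthly_sales),  # sample standard deviation
    "min": min(monthly_sales),
    "max": max(monthly_sales),
}

for name, value in summary.items():
    print(f"{name}: {value}")
```

In this sample the single outlier pulls the mean well above the median, which is one reason to look at both rather than at the mean alone.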

If the data looks linear (x/y pairs), add:
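
For paired data, two natural candidates are the correlation coefficient and a least-squares line. That choice is my assumption for illustration, as are the helper name `linear_summary` and the sample numbers. The sketch computes both by hand; Python 3.10 and later also ship `statistics.correlation` and `statistics.linear_regression`.

```python
from math import sqrt
from statistics import mean


def linear_summary(xs, ys):
    """Return (r, slope, intercept) for paired samples via ordinary least squares."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    if sxx == 0 or syy == 0:
        raise ValueError("x and y must each vary")
    r = sxy / sqrt(sxx * syy)      # Pearson correlation coefficient
    slope = sxy / sxx              # change in y per unit change in x
    intercept = my - slope * mx    # fitted value of y at x = 0
    return r, slope, intercept


# Hypothetical paired observations: foot traffic vs. sales.
foot_traffic = [100, 150, 200, 250, 300]
sales = [210, 290, 410, 500, 590]

r, slope, intercept = linear_summary(foot_traffic, sales)
predicted = slope * 350 + intercept  # extrapolate to foot traffic of 350

print(f"r={r:.3f}, slope={slope:.2f}, intercept={intercept:.1f}")
print(f"predicted sales at 350 visitors: {predicted:.0f}")
```

A correlation close to 1 or -1 supports reading the fitted line as a trend. Extrapolating far outside the observed range is still a guess.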

Closing thoughts

Calculating these numbers is just the opening move. The real work is forming hypotheses, testing them, and extracting insights. In the age of machine learning frameworks, anyone can throw data at a model, but without understanding the basics you quickly hit a wall with real-world data. Mastering the fundamentals pays off.