When working with and implementing BigQuery, there’s a number of small problems to which I couldn’t find documentation to, or a working example of a solution. This occasionally happens with any database. While these won’t probably be groundbreaking problems to solve, hopefully it’ll make someone’s day a little easier. Sometimes, it’s the little things.

BigTips: Removing duplicate rows with mixed static and variable columns, while keeping row version histories!

I Know What I Need, Just Point Me To The Scripts!

The Problem Statement

One of the age old problems in analytics data is how to deal with late arriving data. This is a fairly common problem when ingesting data, especially with systems that deal with frequent transactions. …


When working with and implementing BigQuery, there’s a number of small problems to which I couldn’t find documentation to, or a working example of a solution. This occasionally happens with any database. While these won’t probably be groundbreaking problems to solve, hopefully it’ll make someone’s day a little easier. Sometimes, it’s the little things.

BigTips: Generating random numbers in a range, and random dates.

This one will be pretty quick, and I’m filing under, “Why Isn’t This Simple Thing a Thing?”

Random Numbers In A Range

This whole thing really started when I needed to generate some test data, and noticed that BigQuery’s RAND() function doesn’t have upper and lower bounds. It’s not a complex thing, but one of those minor things that’s just easier to copy and paste from somewhere, so here it is so you can just copy and paste it. …


When working with and implementing BigQuery, there’s a number of small problems to which I couldn’t find documentation to, or a working example of a solution. This occasionally happens with any database. While these won’t probably be groundbreaking problems to solve, hopefully it’ll make someone’s day a little easier. Sometimes, it’s the little things.

BigTips: Make working with BigQuery’s INFORMATION_SCHEMA a little bit easier by flattening the tables!

This is a follow up to my earlier post on flattening the views to get a consolidated view of a dataset’s metadata. …


Update: I goofed on the first go around and forgot to join with the base INFORMATION_SCHEMA.TABLES view to get tables and views that don’t have options. The queries and screenshots have been updated with this fixed.

When working with and implementing BigQuery, there’s a number of small problems to which I couldn’t find documentation to, or a working example of a solution. This occasionally happens with any database. While these won’t probably be groundbreaking problems to solve, hopefully it’ll make someone’s day a little easier. Sometimes, it’s the little things.

BigTips: Make working with BigQuery’s INFORMATION_SCHEMA a little bit easier by flattening the tables! …


Get started today with a hands-on tutorial of BigQuery and the CARTO BigQuery Tiler.

Created with my friend, co-presenter, and fellow data hacker, Julie (twitter), and with help from CARTO!

Hello, world!

Recently, CARTO introduced the BigQuery Tiler and we had the opportunity to present at the Spatial Data Science Conference 2020 and give a glimpse of what the tiler is like.

We gave a talk!

We created a couple demos for this presentation, and we wanted to share it out as a bit of a “hello, world” exercise for the tiler! We have loaded the data and the queries/code onto a Github repository and you’re welcome to give it a whirl. When building these for the first time, we found a lot of good information on a lot of the different components, but it wasn’t always straightforward trying to tie them all together, so this includes a lot of lessons we learned along the way, and gives a full end to end example from data creation to visualization. …


This is part of an ongoing series, and there’s other goodness lurking out there! All the fuzzy goodness!

Okay, I know in the last article I said we would use this one to start going into adding address elements and match groups (and I promise they’re still on their way), but I wanted to take a slight detour and add another set of functions before we get there. …


Don’t We Already Have Street Addresses?

A couple weeks ago I attended the Google’s Geo For Good Summit 2019. It was an amazing experience, and I can’t say enough good things about the people I’ve met, the work everyone’s doing, and the conference itself.

When I go to conferences, I pick at least one session at random where the topic is unfamiliar to me. We all spend so much time focusing on bolstering our areas of expertise and sometimes miss out on so many other great topics being covered. This is how I work on branching out.

I chose “Digital Addressing with Plus Codes,” and it ended up being the session that stuck with me the most. …


This is part of an ongoing series, be sure to check out the other articles!

In the first two articles, we were talking about Soundex and the wide net that it casts which leads to many false positives. The problem has been around for some time, and as Soundex has shown that it comes with compromises, engineers have been looking for ways to tweak the algorithm to encode for different uses. …


In the first post on this topic, we went over how to build a series of user defined functions (UDF) in order to implement the Soundex algorithm. To recap, the idea here is to build a toolbox that can be used to solve challenges around fuzzy matching and master data management. Now Soundex is the first one that we have!

Ever since I wrote that post, I have been inundated with an email asking about the performance aspect of it. One of the good things about Soundex is that it’s fairly quick. …


“That’s Not How My Name is Spelled”

Getting a name right is important when making decisions around people. This holds true when dealing with all aspects of how a company engages with a person. Use cases in human resources, targeted marketing, and customer engagement are examples where getting the name right matters.

Image for post
Image for post

Just think back to the last time a barista wrote your name on a cup.

When it gets into the context of enterprise systems, the need to match similar names is even more important. When a single transaction happens in isolation, there might be a bit of tolerance for error. When it says “Bryan” instead of “Brian” on my cup, I can let it go (usually). If I were to appear in a CRM system in multiple permutations, this becomes a bigger issue. …

About

Brian Suk

Avid 2020 bed-to-couch traveler, cloud tech, big data, random trivia, Googler. These are my random ideas, my employer isn’t responsible for what’s here. NYC.

Get the Medium app

A button that says 'Download on the App Store', and if clicked it will lead you to the iOS App store
A button that says 'Get it on, Google Play', and if clicked it will lead you to the Google Play store