Tidyverse ‘Starts_with’ in M/Power Query

As a heavy R and Tidyverse user, I’ve been playing with Microsofts m/Power Query language included in Excel and PowerBI from that perspective, looking for the functions to make my life easier, developing small code pipelines for my processing and trying to get a smooth, clear and maintainable data manipulation process in place.

The Problem

In PowerBI I have data generated from an API call to HubSpot, which deliveres a json which is flattened as the first step of the process into a table with hundreds of columns. These columns have a pretty regular naming convention, in a form similar to this:

client_notified_timestamp
client_notified_source
client_notified_sourceid
client_notified_value
client_responded_timestamp
client_responded_source
client_responded_sourceid
client_responded_value

The general rule is that the variable is encoded in the first part of the column name string, and that the columns with [variable]_value hold the actual value while the other three columns ([variable]_source, [variable]_sourceid and [variable]_timestamp) contain metadata we don’t really need here.

The Target

If I was using R to do this job (which technically I could, but was not possible because of the context the PowerBI file is going to be used in), I could use tidyverse to do this pretty simply:

dataset %>%
    select(c(-ends_with("_source"),-ends_with("_sourceid")))

Anything that ends with "_source" or "_sourceid" gets dropped, everything else remains. A nice compact, maintainable and clear expression of a ‘rule’ of processing.

The Solution

This is the solution I used:

let
    Source = ...,
    rawData = Source{[tableId="myData"]}[Data],
    removeSources = Table.RemoveColumns(rawData, List.Select(Table.ColumnNames(rawData), each Text.EndsWith(_, "Source ID") or Text.EndsWith(_, "Source")))
in
    removeSources

This code block sources rawData and ‘lists’ the columns matching my requirements ("_source" and "_sourceid") using the logical condition each Text.EndsWith(_, "Source ID") or Text.EndsWith(_, "Source") on the column names returned from Table.ColumnNames(rawData) feeding into List.Select(...). This list is the second argument to the function Table.RemoveColumns(...), which is operating on the rawData again, to finally return only the columns I want.

The Observations

This generally suits the requirements: relatively readable functions, multiple logical conditions operating on the column names that ‘select’ which I want returned in the next step.

It is admittedly a little more verbose than the R I had in mind, and right now I’m not sure if that’s me or just the language. There is some repetition in specifying rawData in multiple places, which I haven’t found a shorthand for if there is one. Parts of it seem only ‘functional-ish’? The construction of each Text.EndsWith(_, "Source ID") or Text.EndsWith(_, "Source")) is pretty object-oriented. Without wanting to sound insulting maybe m is only ‘semi-functional’ in the technical definition of the term?

The Caveat

This is the first m code I’ve really written and my knee-jerk first impressions. I’m sure there is a lot more to this language that I have yet to understand and maybe even come to appreciate.

The Conclusion

Despite these observations I wouldn’t discount the potential of m/Power Query. While many Microsoft tools let you use R baked in, it’s only baked in to the point where you can guarantee R is installed on the machine, and it’s an undeniable fact of data that we have to work with Excel and Power BI in many situations. I’m actually quite looking forward to working with this not-quite-familar ‘functional-ish data language’ in the future. When it’s the tool for the job at least 🙂