This is the first post is a series of manipulating weather data from the UK Met Office. This post will cover opening a single web based text file and then writing the steps to transform the data ready to use in a Power Bi report. Future posts will then show how to adapt the steps to process any file and then how to use with the list of stations to get all the data.
The data for this post is weather station data from across the UK. The met office provide data for about 30 weather stations across the UK which includes for every month of each year the maximum and minimum temperatures, number of frost days, mm of rain, and sunshine hours. Each station has its own file available at a web address.
The data can be found at https://www.metoffice.gov.uk/public/weather/climate-historic/#?tab=climateHistoric
Clicking on a station gives the latitude and longitude of the station plus a link to the historic data. The whole process will be broken down into 3 posts.
- Create steps to transform a single file – This post.
- Add a parameter and create a function from the above steps.
- Using a list of stations use the function to transform and combine all the station files.
Transform a single text file
The initial file we will transform will be the Cambridge station file found at https://www.metoffice.gov.uk/pub/data/weather/uk/climate/stationdata/cambridgedata.txt
- In a new report, click Get Data and select Web. Enter the above URL and click OK.
- The file is not delimited well because of the initial rows, so changing the delimiter to an equals sign, which is not in the data returns a single column of data. We need to do more transformations so click Edit to open the Query Editor
- The first seven rows of data are not required so can be removed. From the Home ribbon select Remove Rows – Remove Top rows. Enter 7 when asked how many and click OK.
- In order to keep your code maintainable and clean I recommend you rename the steps as you go and unless a Changed Type is really required I remove them and add one at the end. So in the current query I have three steps with default names. I delete the changed type by clicking the cross next the step and rename the final step. You cannot easily rename the Source step.
- The data contains — when there is no data and * next to some numbers. I want to remove both of these, so from the Transform ribbon I select Replace Values and replace — with three spaces and * with a single space. I use spaces so the overall structure of the data will remain consistent.
- The first 7 characters of each row contains the year and the next 4 characters contains the month. So from the Transform ribbon I select Split Column – by Number of Characters.
Enter the number of characters and select Once as far left as possible. Click OK to see the year column split off.
Repeat the above steps to split out the month column.
- Renaming split columns can be done by 2 methods. The first and easiest is to add at the end of the query rename each column one by one. This will create a single step to rename the columns. This method requires no code editing so if you are new to Power Bi I’d recommend this method.
Double click on the column heading and type in a new name.
Another method is to edit the formula in each column split and change the split off column to the new name. I personally like the tidiness of this method but I am aware it is more fiddly. Note: Renaming a column that a future step already refers to will break that future step if you don’t change that step as well.
- The remaining data can be split at 8 characters each so can be done in a single step using Repeatedly as the split.
Remember to rename or edit after this step.
- Although I have a year and a month column I want to use a date column for some of the date based functions. So I need a date column based off the year and the month on the same day every month. In order to cope with months of different lengths I decide to pick mid of the 15th.
I could just write the formula to create the date string and then convert, but Power Query comes with an awesome unsung feature of new column from example.
Select the Year and Month column.
The from the Add Column ribbon, drop the arrow down on Column from Examples. This will give you a blank Merged column.
Enter in a date based off the first row of data and press enter, it will guess the formula. First off it gets it slightly wrong.
Enter a second date string on the next row down and it will adjust the formula.
This is a superb way to learn the functions available.
- As a final step, make sure all the columns are the right type and named correctly, e.g. the step created in 9 should be a date and renamed.
- Your data should now be ready to be loaded into Power Bi by clicking Close & Apply on the Home ribbon tab.
The next post will change the above steps into a function which can then be invoked with any station name.
You can find the other posts in this series using the links below:
- Part 1 – Transform a text file (this post)
- Part 2 – Create Power Query Function
- Part 3 – Invoke Power Query to populate a table
I follow and use a number of resources for Power Bi. Here are a few of my favourites, apologies for the ones I’ve missed or not found yet, please let me know which ones.