WRF, or Weather Research and Forecasting model, is a system for numerical prediction of weather. I had to become familiar with the system because it was needed in one of the projects in my PhD. In this post I’ll describe how to configure the model for a simple forecasting run.
The input data
To run a weather prediction in WRF, one must take several steps. First, we have to get the data. We need geographical data, which describes the land: terrain height, soil type, land use and other parameters. This data is independent of time and can be easily downloaded from the WRF website. The uncompressed size is around 30 GB.
The second set of data needed is the meteorological data. This data gives a snapshot of weather conditions at one moment in time, and therefore we need data covering the entire period of the simulation. Although it might seem a bit strange, this includes the period we want to predict. The reason is that the model needs to know the weather that is coming into the modelled domain from the surrounding area. The only time we can run a model without this boundary data is when we are simulating the global weather. In this case, there are no boundaries since the earth is round and has no edges.
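To make this requirement concrete, here is a minimal Python sketch that lists the timestamps at which meteorological input is needed for a run. The function name `required_input_times` and the default 3-hour interval are assumptions made for illustration; they are not part of WRF.

```python
from datetime import datetime, timedelta


def required_input_times(start, run_hours, interval_hours=3):
    """Return every timestamp at which meteorological input is needed.

    The list spans the whole run, from the start time to the end time.
    If the run length is not a multiple of the interval, one extra
    timestamp is added so that the end of the run is still bracketed.
    """
    if run_hours < 0 or interval_hours <= 0:
        raise ValueError("run_hours must be >= 0 and interval_hours > 0")

    steps = run_hours // interval_hours
    times = [start + timedelta(hours=i * interval_hours) for i in range(steps + 1)]

    if run_hours % interval_hours:
        times.append(start + timedelta(hours=(steps + 1) * interval_hours))
    return times


if __name__ == "__main__":
    # A 24-hour run starting at midnight needs data for the full 24 hours,
    # including the hours that are being predicted.
    for t in required_input_times(datetime(2020, 1, 1, 0), run_hours=24):
        print(t.isoformat())
```

The point of the sketch is that the list does not stop at the start time: the last entry lies at the end of the simulated period.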
Past weather simulation
Where to get the data? If the goal is to get atmospheric variables for a time period in the past, then we can get the boundary data from reanalysis. This data source has been retrospectively compiled from all available real measurements across the globe and should provide us with the most precise simulation. For example, for a simulation over North America, one option is the NCEP North American Regional Reanalysis (NARR).
The disadvantage of this dataset is that it cannot be used for real-time forecasting, since it is available only for past dates and with a long delay.
A use case for a simulation over this data might be a study of past wind patterns to optimise the location of a wind power turbine.
If the goal is to do
- a real-time forecast (e.g. to be used in a decision support system) or
- a simulation of a forecast (e.g. to determine whether a forecast is sufficiently accurate by making forecasts for past dates and comparing them with real measurements from those dates)
then we have to provide the system with input data that is available at the moment we wish to initiate the forecast. Obviously, we cannot use the reanalysis data since it is available with a great delay. We also cannot use the real measurements since they are simply not available for future dates.
The boundary data used for this kind of computation is taken from a global forecast. A global forecast does not require boundary conditions since it simulates the entire globe. The global data source commonly used with WRF is the Global Forecast System (GFS). This data is available 4 times a day, with the forecast reaching 192 hours ahead.
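Since a new forecast is issued four times a day, a real-time setup has to decide which run to download. The following Python sketch picks the most recent run, assuming the runs start at 00, 06, 12 and 18 UTC. The `delay_hours` parameter is a hypothetical publication delay chosen for illustration, not a documented value, and `latest_cycle` is an illustrative name.

```python
from datetime import datetime, timedelta

CYCLE_INTERVAL_HOURS = 6  # four runs per day: 00, 06, 12 and 18 UTC


def latest_cycle(now_utc, delay_hours=4):
    """Return the start time of the newest run assumed to be published.

    delay_hours is an assumed gap between the start of a run and the
    moment its files can be downloaded; adjust it to your data source.
    """
    available = now_utc - timedelta(hours=delay_hours)
    cycle_hour = (available.hour // CYCLE_INTERVAL_HOURS) * CYCLE_INTERVAL_HOURS
    return available.replace(hour=cycle_hour, minute=0, second=0, microsecond=0)


if __name__ == "__main__":
    print(latest_cycle(datetime(2020, 1, 1, 9, 30)))
```

Subtracting the delay before rounding down means that shortly after midnight the function falls back to the 18 UTC run of the previous day.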
Why can't we use the GFS data as the forecast itself? Theoretically, we can. The disadvantage is that the time step of this data is 3 hours and the grid spacing is 0.5°. If a higher resolution is required, WRF has to be used: it takes this data to initialise a model and then runs the model at the required scale and time step.
For North America there is a higher-resolution model called the North American Mesoscale (NAM) model. Its data is released 4 times a day, with the prediction going 84 hours ahead. This model has a grid spacing of 12 km.
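To get a feeling for how coarse a 0.5° grid is, the spacing can be converted to kilometres. The sketch below assumes a spherical earth with a mean radius of 6371 km, which is an approximation; the function name `grid_spacing_km` is only illustrative.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean radius, spherical approximation


def grid_spacing_km(spacing_deg, latitude_deg):
    """Convert a grid spacing in degrees to approximate kilometres.

    Returns a (north_south, east_west) tuple. The north-south distance
    does not depend on latitude, while the east-west distance shrinks
    with the cosine of the latitude.
    """
    km_per_degree = math.pi * EARTH_RADIUS_KM / 180.0
    north_south = spacing_deg * km_per_degree
    east_west = north_south * math.cos(math.radians(latitude_deg))
    return north_south, east_west


if __name__ == "__main__":
    ns, ew = grid_spacing_km(0.5, latitude_deg=45.0)
    print(f"north-south: {ns:.1f} km, east-west: {ew:.1f} km")
```

Under these assumptions, 0.5° corresponds to roughly 56 km in the north-south direction and roughly 39 km in the east-west direction at 45° latitude.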
A basic WRF run needs two sources of data:
- geographical dataset (time independent, contains the land parameters)
  - can be easily downloaded from the WRF website
  - is the same for all simulations
- meteorological dataset (depends on time, contains the snapshot of the weather at one moment)
  - there are several types
    - real-time datasets with forecasted periods
    - datasets from the past with integrated real-world measurements
  - has to be available for the entire simulated period of time (including the forecast time)
The tricky part to understand is that input data for the boundaries is needed over the forecasted period as well. For example, if I want to forecast the weather for tomorrow, I have to give the model the boundary conditions for tomorrow. This is so that the model knows what kind of weather is coming into the simulated domain from the side. This input data is typically obtained from a global weather forecast, which forecasts the weather on the entire planet.
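A simple way to check this before starting a run is to compare the simulated period with the period covered by the driving forecast. The Python sketch below does that. The function name `covers_run` is an illustrative assumption, and the horizons of 192 and 84 hours are the GFS and NAM values quoted earlier in this post.

```python
from datetime import datetime, timedelta


def covers_run(cycle_start, horizon_hours, run_start, run_end):
    """Return True if a driving forecast covers the whole simulated period.

    cycle_start is the start time of the driving forecast and
    horizon_hours is how far ahead it reaches.
    """
    cycle_end = cycle_start + timedelta(hours=horizon_hours)
    return cycle_start <= run_start and run_end <= cycle_end


if __name__ == "__main__":
    cycle = datetime(2020, 1, 1, 0)
    run_start = datetime(2020, 1, 4, 0)
    run_end = datetime(2020, 1, 5, 0)
    print("192 h horizon:", covers_run(cycle, 192, run_start, run_end))
    print("84 h horizon:", covers_run(cycle, 84, run_start, run_end))
```

In this example the run ends 96 hours after the driving forecast starts, so a 192-hour horizon covers it while an 84-hour horizon does not.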
There are also other types of data that can be used in the model. I will be focusing on ingestion of local observation data in one of the following parts of this WRF series.