This is a quick update to my recent post on Bin Weather Data for the United States and International Locations.
First the Good News
Just the other day, I went to the NCDC site I discussed in my post on Hourly Weather Data for Times Gone By to purchase some data for a project I was developing an energy consumption baseline for. I went through all of the steps I discuss in the blog post, but when I got to the point where the site typically puts your request into a “shopping cart” redirects you to a different location so you can purchase the data, I was not redirected.
Rather, a new page opened up with the link to my data. In other words, it appears that the data from the NCDC site is free, at least that was the case for the data I requested, which was for the Oakland Metropolitan Airport in California. So, a bit of good news there.
The other good news is that it appears that the NCDC site has data for locations around the world, not just the United States. I suspect that has always been the case, but I just had not noticed it before. It look like you can pick the data by geographic region …
… or country …
… and a fairly extensive list of countries seems to be provided. My guess is the locations are the same data set used by the ASHRAE Handbook data base that I discuss in the bin weather data post, but I don’t know that for sure.
How Excel Deals with Time and Dates
One thing I realized that I did not discuss in my previous post was how to get the data and time stamp in the NCDC data files into a format that Excel would recognize as a date and time.
Years ago, when I first started working with Excel to do time series graphs, I was mystified about how to scale the time axis. It seemed like there was this strange and very large number that controlled the axis scaling. I finally discovered that Excel uses a serial number to represent time. Specifically,
- The number increments by “1” every day
- 1 was January 1, 1900
- 10/8/2009 is represented as 40,094
- 10/9/2009 is represented as 40,095
- 1 hour = 1/24 = .041667
- 1 am on 10/8/2009 is represented as 40,094.041667
- 1 minute = (1/24)/60 = .00069444
- 1 minute after midnight on 10/8/2009 is represented as 40,094.00069444
Suddenly, scaling an axis became a lot easier. But to facilitate my graphing efforts, I built myself a little spreadsheet that had common values for numbers I used a lot (15 minutes, 30 minutes, 1 minute, each of the hours, etc.) as well as a place where I could type in a date and time and it would give me the correct serial number.
For those who are interested, I’ve put a copy of it on my Google Documents account. It also includes an explanation of the numbering system in a paragraph above the table I show in the screen shot. If nothing else, it can be a starting point for your own version.
Converting NCDC Date and Time Stamps to an Excel Compatible Number
Having said all that, there are still several steps you have to go through to convert the date and time that is supplied in a typical NCDC file to a date and time Excel can work with. There are actually three different things that need to be addressed.
- The conversion of the text string in the NCDC file to an Excel time value.
- Adjusting the time stamp, which is usually Greenwich Mean Time or “Zulu” time to the appropriate time zone.
- Make adjustments for daylight savings time shifts, where applicable.
Breaking up the NCDC Date and Time Text String
When you first receive the NCDC file, it will come as a text file with spaces used as delimiters. So for starters, when you go to open it with Excel, you need to make sure that you have selected the “All Files” option in the dialog box that Excel uses to access files. Otherwise, you might think that the file disappeared on you (I’ve done it more than once myself).
I talk about this and some details associated with opening a delimited file in the post on bin weather data so I won’t bother to repeat that here.
Once you get the file open in Excel, you will notice that the third column is where the time and date information is located.
At first this may look like a big, meaningless number, but its actually the date and time strung together with no breaks in the form of Year, Month, Day, Hour, and Minute.
Once you realize that the number actually has meaning, the trick is to figure out how to get Excel to realize that. I suspect there are many ways to do this, probably most of which are more elegant than mine, but for what its worth, this is how I go about doing it.
First, I insert 9 columns into the spreadsheet right after the original “C” column and label them Year, Month, Day, Hour, Minute, Reassembled, Date Value, Excel Date and Time, and Pacific Time with Daylight Savings Time Adjustment.
The reason for the names will become apparent as we move through the procedure. And, to remind me that these are columns I created vs. columns that came with the file, I give them a color.
At this point, you may want to save the file as an Excel file type, partly so you don’t loose the formulas and other things you will be adding and partly to protect your original data, just in case.
The next step is to write formulas that extract the data associated with the first 5 c columns you added from the data in the original column C. There are probably a number of ways to do this, but I use some of the Excel “LEFT” and “MID” functions.
To extract the year, I use the “LEFT” function and tell it to look at the third column (Column C in this case) in the same row (Row 13 in this case) and extract the first 4 characters. (Note that in many of the screen shots that follow, you can see the formula I am using in the large white rectangle to the right of the Exel formula symbol (fx) and that the referenced cells are highlighted with a colored line around them. And remember, if you double click on any of the images in my posts, in most systems they will open up in a separate window as an image that is larger and that will often let you zoom in so you can read the fine print.)
The links I attached to the names of the functions will take you to web pages that describe the details of using them if you have questions.
Incidentally, these web pages are part of a set of pages on the Microsoft support web site that describe all of the Excel functions. I’ve saved the link to the Excel function reference home page in my favorites bar so I can jump to it quickly when I forget how something work or am looking for a formula that will do something I need to do.
To get the Month, I use the “MID” function. In the screen shot below, the formula is telling Excel to look in cell C13, and starting with the 5th character, extract 2 characters.
I use similar procedures to extract the Day, Hour, and Minute characters from column C.
Putting the Text Back Together the Way you Want It
At this point, you will have separated out the values for the year, month, day, hour, and minute from column C. Now you can a different set of Excel functions to put them back together in a format that Excel will recognize as an actual date, like 01/01/2011 07:59. I do this by using Excel’s “&” operator to build up the text string that I am looking for.
In this screen shot, the formula is saying to take the value in cell E13 (the month) and combine it with a backslash, then add the contents of cell F13 (the day) followed by another backslash, then the value of cell D13 (the year) followed by a space, then the value in cell G13 (the hour) followed by a colon and finally the value in cell H13 (the minute).
The ampersand sign (&) tells the formula to combine the stuff before and after it. The quotes (“) tell the formula to include what ever is inside them.
This process gives me a text string in column I that actually looks more like a date, both to me and to Excel it turns out. But, since Excel is basically a computer program, it would rather work with numbers. So, the next step in the process is to turn the text string into a number that Excel can manipulate. Specifically, a serial number in the format I discussed in the section ahead of this one.
Making Excel Happy; Converting a Date as Text to a Serial Number in the Excel Date/Time Format
You could actually do this in fewer steps than I use, but I use these steps so I can see what is going on and make sure I believe the result. Part of what you need to realize is that the text string we just created is literally that; the cell contains a formula that displays a text string. Even if we format the cell to be a number with decimal places, we still will only see a text string because text (A, B, C, D, …) is not a number (1, 2, 3, 4, …).
To create a number that represents the date and time conveyed by the text in column I, we need to use Excel’s “DATEVALUE” and “TIMEVALUE” functions, which is what I have done in column K. The functions do pretty much what the name implies.
In this screen shot, the formula is telling Excel to look in cell I13 and figure out the serial number that corresponds to the date and then look again and figure out the serial number that corresponds to the time, and then add the two of them together. When I say, “serial number” in this case, I am referring to the Excel serial number convention for date that I discussed at the beginning of the post.
By formatting column K to display numbers as a date and time…
… the result of the formula shows up looking just like what is in column I.
But, if you format the cell as a number…
which is what I did with column J (which is set to be equal to column K), then you see the serial number behind the date. That means Excel can manipulate the value, just like any other number. You simply use the formatting feature to get the number to display as if it were text (column K) or a number (column I).
Applying Your Formulas to the Entire Spreadsheet
All of that sounds like a lot, but the truth is, it took much more time to write about it than it did to do it. And if your thinking that’s great, but I have to do that 8,760 times for my year’s worth of weather data, fear not. All you really need to do is cut and paste your formulas from the first cell in the columns you added to all of the other cells.
There are a few Excel shortcut key strokes that can make speed this process up. But when you do it, be sure you have the relative and absolute references right in your formulas (for this example, most, if not all of them will be relative). And always spot check what you have done before using it.
Adjusting Zulu Time to Match the Time Zone of the Weather Station Supplying the Data
Now that we have the date and time converted to a number, we can adjust it so it reflects the time zone at the station where the weather data was actually collected. We need to do this because the data in the NCDC collection is referenced to Zulu time, as I mentioned above. Making this change is easy once you figure out how many hours you need to add or subtract.
There are a number of ways to do that. One is to figure it out based on your longitude. Since the earth is a sphere, you can express its diameter in degrees, 360 to be exact. Since there are 24 time zones, then in general, each time zone is 15° wide. So, if your longitude was 120°, you would be in the 8th time zone.
Since Zulu time is referenced to Greenwich, England, and the earth rotates from West to East, then 120° west latitude would be 8 hours behind the reference time stamp. So, you need to subtract 8 hours.
If all of that is a bit more than you care to think about, then you can just go to 24TimeZones.com and look it up.
When you click on your area of interest, you get a little summary table that tells you the offset and other useful information.
Since in the Excel convention for representing date as a serial number increments the number by 1, then incrementing it by 1 hour, or 1/24 of a day, would mean that you would add or subtract 1/24 from the Zulu time for every hour you wanted to shift it. Since I wanted to shift the time in my data file to the Pacific Time zone, I needed to subtract 8 hours, which is what the formula in column L does.
Adjusting for Daylight Savings Time
If it weren’t for daylight savings time, we would be done. But in many places in the world and United States (but not all of them) we spend part of the year “saving daylight”. So, if you are going to compare your weather data with other metered data, you need to adjust the time stamps to the local convention. And if that local convention includes “springing ahead” and “falling back” (the way I remember which way to turn the clock when the time comes), then you need to make those adjustments to your weather data.
This is actually not to difficult. You can find out if you need to worry about it by looking at the details in the 24TimeZones.com page for the location you selected if you don’t already know. If you look closely at the 2nd 24TimeZones.com screen shot, you will notice that it tells you both how much the offset is from Zulu time, whether or not Daylight Savings Time is in effect, and when the change happens for the current year.
If you need to know what happened for a different year (it varies from location to location and year to year) , there are a number of web sites like the NASA Eclipse web site that have that information available.
Once I figure out the date and time I need to make the change, I use the Excel search function to find the the correct location in the file. In this screen shot, I searched for a cell with 3/13/11. Note that I Searched by columns and looked for Values, selections I made in the drop-down menus in the dialog box.
Once I find the point where I want to make the change, I adjust the formula to subtract more or less hours. In this screen shot, I switched from subtracting 8 hours to subtracting 7 hours after 2:53 am on March 13, 2011.
I use the same technique to find the point where the time switches back again and then cut and paste the revised formula to the cells between those two locations. I also highlight the cells to make it easier to find them and remind me that I made the change.
And that is all there is to it. Happy data manipulation.
Senior Engineer – Facility Dynamics Engineering