Using Scatter Plots to Assess Building Performance–Part 3

Up to this point, I have been looking at how you can use scatter plots to assess building performance in the context of building energy data.  I have focused on that because it can be a really powerful to help us understand how a facility might use energy in the future based on how it used it in the past, including past history generated by a functional test of some sort that temporarily modifies the performance of a system or systems in the facility.

In this post, I want to focus on how you can use similar techniques to assess system level data.  But before I go there, I thought I would mention a couple of other things related to what we have been discussing.

You Are Doing Regressions

In general terms (and a lot of what follows will make real scientists cringe I suspect), the Excel trend line feature is doing a regression analysis.  My (limited) understanding of what that means is that Excel is kind of saying:

I know what was happening at point A and at point B and at point C, so I am going to assume that what is happening at point D is related to that based on a mathematical relationship.

Again, this interpretation could potentially make real scientists shudder.   But I think in the bigger picture, it is generally correct.

The reason knowing this matters is that you can now use the word in meetings and reports  to sound impressive and important.  Instead of saying:

I projected the energy consumption for the changes we plan to make

You can say:

By performing a sophisticated regression analysis, I have extrapolated a data set representing the predicted energy utilization patter after improvements are implemented based on our baseline data set

Hopefully, you realize I am kidding around a bit.  But it is true that using this technique brings a new level of sophistication to your work, so it doesn’t hurt to mention it.

But its also important to understand the constraints associated with this technique if you plan to use it.  One of the constraints for the approach we have been discussing is that the equation generates a line, not a cloud.  And that line predicts what the dependent variable will do  based on what the independent variable is doing.

Most of the things we work with in building systems have a whole lot more than one independent variable.  That’s why we have a cloud of data, not a line of data.  And that is that the “R2” metric is about.

Visualizing R2

I mention “R2” in the video (a.k.a. the coefficient of determination) and that the closer to 1 it is the better your prediction will generally be, all other things being equal.  At one point in my career, I was trying to get a visual on what that meant and I tried something that was helpful to me. 

Specifically, I was working with a facility that had a cloud that appeared to have a shape, but the cloud was not particularly tight.


When I added a trend line equation did not have a particularly high  R2 value.


As a result, I was trying to understand if I should consider using the equation or not.  Specifically, I was trying to understand how close the points in my cloud were to projecting what happened plus or minus 10%, which I considered to be a reasonable degree of accuracy given all the variables that come into play.  In other words, were a lot of the points not exactly on the line but pretty close, or were they spread out uniformly through the extent of the cloud.

So, I used the equation to predict the consumption pattern, which, of course, gave me a line of data points that sat right on top of the trend line.   Then, I plotted lines that were plus and minus 10 and 20% of the calculated value, which gave me this result.


While not perfect, my  little exercise gave me a visual that helped me feel comfortable using the equation to predict energy consumption since most of the dots seemed to lie with-in the error band.

Mathematical Correctness vs. Engineering Judgment

It is important to recognize that the approach I used was not mathematically/scientifically correct in terms of assessing the curve fit.  It was more of a gut level measure of assessment.  For instance, I did not know if each dot represented one occurrence or multiple occurrences stacked on top of each other.  My guess, based on past experience was that the remote points were true outliers.  In other words, the 470 kW point at 58°F really was only a handful of events, not 1,000 events stacked on top of each other.

So my conclusion that it was reasonable to assume that a projection made using the equation would probably provide an answer that was in the range of plus/minus 10% relative to reality was simply a judgment call I was more comfortable with having drawn the lines.  But it was not mathematically/scientifically correct.

If this was a data set showing the incidence of humming birds dying of lung cancer as a function of the average annual air quality index for the city they flutter about it, I would not have been as comfortable using my judgment call technique because I don’t have a clue about reality of the data having not worked with it.

Incidentally, after I thought of my shading trick, I tried it on this data set and it supported my judgment call.


Clearly the dark band is with-in the plus/minus 10% window and the 470kW/58°F point is almost invisible (meaning only a few events at that condition).  But this is still a qualitative assessment, not a mathematical/scientific assessment.  The shading simply adds another dimension to the data to help in making the judgment call.

The bottom line is that in the business we are in, we make judgment calls all of the time based on past experience and the available data.  Tools like Excel can help us with those judgment calls, both qualitatively via graphing techniques and quantitatively via mathematics, sometimes mathematics that we may not be familiar enough with or have the time to execute manually with out the help of the spreadsheet.  There is nothing wrong with doing these things as long as we recognize them for what they are and don’t make the results out to be more than they are.

Applying Scatter Plots at the System Level

In the following video clip, which is the last one in this series, I demonstrate a number of excel techniques I use to either support scatter plots or that allow me to use a scatter plot  to assess savings or perform diagnostics including how to use the Excel function VLOOKUP to correlate information from one data set with another.  For instance, you can use this function to correlate an outdoor air temperature from an hourly weather data file with one minute data that you might obtain from a data logger or energy management system trend.

I also illustrate how to take some basic field data liked logged temperature difference and a steady flow rate and apply a common HVAC equation in a spreadsheet to calculate the minute by minute load and then use that data in a scatter chart to develop a relationship that will allow you to project the load for conditions other than what existed when you logged data.

Obtaining Hourly Weather Data

I mentioned three ways to come up with hourly weather data in the video.  Here they are as hyperlinks to make it easier for you to find them.

  • Bill Koran’s Weather Data Query Tool:  the tool is on my Google Drive at
  • The National Weather Service and Similar Resources:  I have done a number of blog posts about how to obtain hourly weather data as well as other climate resources like the temperature and precipitation nomographs I use in the video.


  • Electronic Psychrometric Charts:  Typical Meteorological Year (TMY) data can be exported from a number of electronic psych chart like Akton chart I used in the video or the Greenheck chart with the “Pro” upgrade.  The basic version of the Greenheck chart is free and allows you to plot points, and read data.  It is a subset of the Akton psych chart that has been focused on HVAC applications.  The upgrade is a good value for the money and will provide you with a number of useful tools including the TMY files and AHSRAE design data.

Correcting for Density

The data set I am working with in the video happens to be for a system in Denver Colorado, so, as you observed in the video, I had to correct the unit’s conversion constant in the sensible heat equation for the lower density air in Denver.  So, I discuss that a bit and then use a little tool I made to come up with the correction factor.  I have posted a version of the tool on my Google Drive and you can link to it if you want from the side-bar on the home page of the blog.


Thinking Outside the Box with Excel and Scatter Charts

To some extent, applications that use Excel techniques and scatter charts as a building science tools are only limited by your ability to think of ways to apply them.  For example, the data set that was featured in the previous video was only a starting point.  That data set allowed us to project how much energy one of five dysfunctional systems was using.

The fact that the projection said that given how the system was functioning, it would use preheat until it was about 90°F outside when it should have stopped using it when it was 70°F outside opened the door for a way to project the savings we could achieve if we solved the problem.

One approach would have been to find the equation for a parallel line that intercepted the x axis at about 70°F.  Since the preheat load is fairly linear, you can do that by making an adjustment to the y intercept.  You can determine that adjustment from the existing equation and your desired x intercept.  But if you are math phobic, that may not sound all that attractive and the reality is there is another way that is based on what amounts to a simple functional test.

The alternative approach involves understanding why the system was using more preheat energy than it needed to, then correcting that problem and measuring the results.  Once you have collected the data, you can compare the two data sets to asses the savings.

In this particular case, the issue was the result of some major issues with the control system.  Since there were four identical systems, Chuck, the engineering manager, decided he would simply fix one system so we could log data after the repairs and project our savings that way.   This approach also would validate our conclusion regarding the reason for the excess consumption and pave the way for applying the fix to the other three units.

So we proceeded in that manner, logged more data, generated another cloud and an equation for the post-repair data set.


We then applied the equations to bin data for the location to calculate energy consumption as found …


… and after the repair.  The difference is the savings potential.


That is a pretty firm number since it is based on measured performance for the system over a range of conditions that were typical of the range that is seen most of the year in terms of outdoor air temperature, which is the prime driver for consumption in this case.

A Caveat’

One important thing to remember if you use the approach I just mentioned to assess savings.   If you are working in a utility program you want to baseline the dysfunctional performance before you make any changes.   And then you should make the changes via temporary measures, not by permanent fixes like Chuck did.

The reason for this is that the idea behind a utility incentive program is to provide funding for energy conservation work that an Owner would otherwise not undertake.  If you make permanent fix and then go ask for the incentive money, the utility may say something like:

It sure seems like we don’t need to incent you to do this improvement, you have already made the change, thank you very much.  Next in line please.

That doesn’t mean you can’t demonstrate the savings by improving the performance of the system and measuring the results.  It just means you need to use temporary, manual methods to demonstrate the improvement.

For instance, the big reason for the excessive preheat load in the example above was that the chilled water valve had failed fully open.  So the preheat process was compensating for a wide open chilled water coil that had a flow rate that was in excess of design going through it all of the time that was doing about 20°F of unnecessary cooling.

Chuck helped us demonstrate the savings potential by fixing the chilled water valve in addition to recalibrating or replacing marginal sensors, fixing interlocks, and tuning the control loops.    But we also could have demonstrated the savings potential by simply closing the service valve to the chilled water coil, making sure the controller for the preheat coil was working reasonably well, and logging the results.

Obviously, we could not use this approach if it was hot outside, otherwise the area served by the system would overheat.  But if we did it during mild weather, we could demonstrate the savings potential, establish the data cloud, and project the savings, just like we did using the data we collected after Chuck made repairs to the system.


David Sellers
Senior Engineer – Facility Dynamics Engineering

Click here for a recent index to previous posts

This entry was posted in Excel Techniques, Retrocommissioning Findings. Bookmark the permalink.

Leave a Reply

Fill in your details below or click an icon to log in: Logo

You are commenting using your account. Log Out /  Change )

Google+ photo

You are commenting using your Google+ account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )


Connecting to %s