Archive for the ‘ Data Collection ’ Category

OK, so I’ve made a few changes to the database schema. I’ve simplified it a little by taking the stateHistory table out and simply pushing those values into the perfHistory table. Eventually, I’ll be adding a dimensional model data warehouse to this project and I’ll split these types of measures up there instead (so I can track duration of state better). Also, I took out the circular reference to the channel and tied it only to the measure itself instead of the device. I can no longer reuse the channel definitions but that wouldn’t be a very common occurrence anyway.

Here’s the updated schema:

database schema

I also took out the identity field for the history table since it was really unneeded.

Leave your Comment

For a while now, I’ve been working on the database design for this project. I wanted to keep it simple and limit the number of tables. In this first design there are a few columns that aren’t normalized, but it helped to limit the table count.

The design is somewhat like a CMDB in that it keeps track of configuration items, their properties, as well as the type of item. Such things as computers, refrigerators, furnaces, and rooms in the house would be considered configuration items.

So the major parts (tables) to the database are as follows:

Configuration Item

configItem – Again, stores information on things like computers, freezers, and rooms.

Location

location – defines where items such as sensors, config items, and measure devices are physically located.

Sensor

sensor – These are specific sensors located in the house, they may be as simple as a thermistor or as complex as an entire script that lives on a computer.

Measure

measure – These are specific measures and the values associated with them.

Measure Device

measureDevice – This table stores the various physical devices the measures are taken with. For example, the sensor might be a thermistor, but the measure device would be an analog to digital converter that measures the voltage across a voltage divider.

History

(perfHistory, stateHistory) – There are two tables here, one for performance values such as temperature, humidity, free physical memory and the other for state values like light on, motion detected, and door closed or open.

 

So far, I’m not including any aggregate tables for data summarization or any user interface tables…I’ll leave those for later.

Here is the database schema:

image

Comments (1)

Since this project will collect data from many different sources such as data acquisition boards, pc computers, network systems, and weather station data, I’ve debated on building a centralized collection system or a distributed system. After prototyping a collection database and a windows service that would capture all of the data, I thought it may be better to build a distributed system instead.

What would a distributed collection system entail?

Here are some pieces of a distributed for a possible home-based data collection system:

  • Database for some config information as well as data measurement storage and retrieval
  • Web service for receiving all measurement updates
  • Windows service for collecting all analog measurement data from the LabJack ADC(s)
  • Scripts running on each computer be it the Mac OS, Windows XP, or Windows 7 for performance and capacity measurements.
  • Windows service to collect all data logging for weather, power, power and some event data.

Here are some of the pros of both a  centralized system and a distributed system:

Centralized System Pros:

  1. Simpler maintenance since all measurements are taken from one spot.
  2. Less complexity overall since there are fewer pieces overall.

Distributed System Pros:

  1. Easier to collect data from computers since rights are less of an issue when running locally.
  2. Each piece is far simpler since it’s only responsible for doing a few basic things.
  3. The system can grow as the different types of measurements grow without affecting existing measurements

I’ve continued to think about a centralized system since one giant windows service that gets all of it’s configuration data from the database would be far cooler and ultimately be more useable from install to install if I wanted to use this system elsewhere. This is nice, but as I was writing some prototype code for this service, it quickly got fairly complex and I hadn’t taken a single measurement.

I now believe that a distributed system might be easier for a the reasons listed above. I still plan on building a fairly healthy windows service to take the analog to digital measurements done primarily by the LabJack(s), but now can focus entirely on that process in the windows service. One issue that kept cropping up was how to handle delta-only measurements for state on a schedule. Another issue was how not incur a long timeout wait while the service tried to connect to a computer for WMI measurements every time the service ran since the computers would many times be off.

With a distributed system, the computer security and scheduling issues aren’t as much of a problem since the processes would only take measurements when the  computer was running. I would let the web service determine if the measures needed to be recorded or ignored.

Next Steps

  1. First, I will attempt to prototype a web service built in either PHP or ASP.Net, not sure yet. I’m very familiar with ASP.net web services but would like to try PHP.
  2. Second, I’ll hook the web service up to the MYSQL database I’ve prototyped. As soon as I’ve done this and proved I can record measurements from some simple scripts, I’ll put my code and schema out for anyone that cares to check it out.

Leave your Comment