Since this project will collect data from many different sources such as data acquisition boards, pc computers, network systems, and weather station data, I’ve debated on building a centralized collection system or a distributed system. After prototyping a collection database and a windows service that would capture all of the data, I thought it may be better to build a distributed system instead.

What would a distributed collection system entail?

Here are some pieces of a distributed for a possible home-based data collection system:

  • Database for some config information as well as data measurement storage and retrieval
  • Web service for receiving all measurement updates
  • Windows service for collecting all analog measurement data from the LabJack ADC(s)
  • Scripts running on each computer be it the Mac OS, Windows XP, or Windows 7 for performance and capacity measurements.
  • Windows service to collect all data logging for weather, power, power and some event data.

Here are some of the pros of both a  centralized system and a distributed system:

Centralized System Pros:

  1. Simpler maintenance since all measurements are taken from one spot.
  2. Less complexity overall since there are fewer pieces overall.

Distributed System Pros:

  1. Easier to collect data from computers since rights are less of an issue when running locally.
  2. Each piece is far simpler since it’s only responsible for doing a few basic things.
  3. The system can grow as the different types of measurements grow without affecting existing measurements

I’ve continued to think about a centralized system since one giant windows service that gets all of it’s configuration data from the database would be far cooler and ultimately be more useable from install to install if I wanted to use this system elsewhere. This is nice, but as I was writing some prototype code for this service, it quickly got fairly complex and I hadn’t taken a single measurement.

I now believe that a distributed system might be easier for a the reasons listed above. I still plan on building a fairly healthy windows service to take the analog to digital measurements done primarily by the LabJack(s), but now can focus entirely on that process in the windows service. One issue that kept cropping up was how to handle delta-only measurements for state on a schedule. Another issue was how not incur a long timeout wait while the service tried to connect to a computer for WMI measurements every time the service ran since the computers would many times be off.

With a distributed system, the computer security and scheduling issues aren’t as much of a problem since the processes would only take measurements when the  computer was running. I would let the web service determine if the measures needed to be recorded or ignored.

Next Steps

  1. First, I will attempt to prototype a web service built in either PHP or ASP.Net, not sure yet. I’m very familiar with ASP.net web services but would like to try PHP.
  2. Second, I’ll hook the web service up to the MYSQL database I’ve prototyped. As soon as I’ve done this and proved I can record measurements from some simple scripts, I’ll put my code and schema out for anyone that cares to check it out.