#551 Haystack Type System WG

Working Group
Status	Review
Champion
Brian Frank
Members
Matthew Giannini
Eric Skiba
Jay Herron
Carl Neilson
Gabe Fierro
Alex Bible
Stephen Frank
Cory Mosiman
Patrick Coffey
Terry Herr
Gia Nguyen
Dylan Cutler
Christian Tremblay
Dave Robin
Eric Loew
Monica Holbrook
Alper Üzmezler
Jason Briggs
Andy Frank
Chad Ruch
Nick Laws
Winston Hetherington
Doug Migliori
Jonathan Fromm
Steve Eynon
Bernhard Isler
Cliff Copass
Nathan Travis
David Adams
Brian Simmons
Luke Walsh
Nate Benes
Jeremy Yon
Peter Cobb
Justin Rea
Keith Bishop
Richard Seaman
Jordan Van Hall
Holly Hofer
Joel Bender
Brandyn Carlson
John MacEnri
Brandon DuPrey
Rob Knight
Paul Stanley
Michael Poplawski
Buddy Patton
Marco Pritoni
Matthew Hollar
Ryan Hoest
Siddharth Goyal
Calvin Slater
Coen Hoogervorst
Jan Široký
Maya Tzabary
Mike Lee
Jonas Bülow
Mike Melillo

Brian Frank Tue 17 Oct 2017

Overview

Haystack is designed around the concept of tagging entities with name/value pairs to describe facts about those entities. The formal definitions of these tags and their value types are captured in a machine readable format (Trio files) which is used to generate the tags section of this website. But how tags are combined lacks formal machine readable definitions. For example, the description and constraints of how to model site/equip/point entities are largely described by documentation without a corresponding formal schema and machine readable format. Historically this has been by design, since formalization of "compound types" introduces significant complexity. But with broader adoption of Haystack, there seems to be pent-up demand to formalize types/schema. We believe it's time to tackle this problem, and would like to kick-start a new working group.

I have spent several weeks designing various prototypes with help from Matthew Giannini. By way of this post, I will describe a fairly complete prototype which serves as a starting point for a proposal on how types might work in Haystack. The prototype defines most of the Haystack model using a type system I will discuss here. I have made the source code and the documentation it generates available for download (discussed below).

Requirements

Leverage Markers: we wish to leverage Haystack's existing and extensive use of markers as the basis for a more advanced data type system. We do not wish to introduce a new concept such as a "type" tag.

No Indirection: all data semantics should be captured in the entity's tags. You should not be required to have previous knowledge (such as a data dictionary) or make an additional network request to infer semantics. For example if a point currently uses discharge air temp sensor, then that will not be coalesced into some abstract "tag set" name that requires another request to know that all those tags were applied. Or put another way: entities will always continue to expand their full set of tags inline.

Tooling: a common use case for a more advanced type system is to allow tool manufacturers to develop UIs that "guide" users to properly tag their data. Capturing tag relationships and rules in a machine readable format is a key requirement for tooling.

Validation: a machine readable schema allows validation of data models. But we acknowledge that type systems require a trade-off; more complex type systems are required to more fully validate data. And no declarative type system can perform 100% validation. We wish to strike a compromise with a practical type system that performs basic validation, but will not provide perfect validation.

RDF: it is desired that enhancements to Haystack allow our taxonomy to be expressed in alternate formats such as RDFS, RDFa, micro-data, JSON-LD, etc. These technologies are based on the concept of subject-predicate-object triples that map well to Haystack's entity name/value tags. And ideally we want to map Haystack types to the RDF Schema class model.

Source Definitions: the goal of this effort is to rewrite the project-haystack.org specification source material using the new definitions and formats as the authoritative source. The machine readable formats will be directly accessible over HTTP and also used to auto-generate the HTML presented on the site.

Observations

Let's begin with a couple of observations about how the existing model works. There are essentially only four "root" entity types: sites, equips, points, and weather stations. All other Haystack tags are used to annotate these four core entity types with additional information.

There are three distinct ways we use tags to annotate the core entity types:

Has Tags: an entity may have specific value based tags. For example a site entity may apply the area tag to define the building's square footage. This sort of tag usage includes all the tags which are neither markers nor refs.
Subtyping Tags: we use marker tags to create subtypes to further refine the semantics of a given entity. For example adding sensor to a point entity narrows the type of point represented.
Relationship Tags: we use ref tags to establish relationships between entities. For example adding equipRef on a point defines an equip/point containment relationship.

In all three cases, what we really desire is to document the behavior of a specific combination of tags. This has been a pain point in maintaining the documentation. For example, let's take the water tag. It has a generic definition which means "associated with liquid water". But it also has more specific definitions when paired with point, meter, or tank. In our final solution, we want tags to be defined generically, with more specific documentation as we combine tags.

Tag Based Subtyping

We add marker tags to an entity to indicate a more specific type of the entity. For example we add ahu to equip to mark the equipment as an air handler unit. We can further mark the AHU with steamHeat to indicate it's an AHU using steam from a central plant for heating. Each time we apply a marker tag we further restrict what the entity type represents. From a type system perspective, this is a form of subtyping.

There are two key observations to be made about how marker tags are used for subtyping:

Subtypes are often defined as an exclusive choice: for example an AHU can have hotWaterHeat, steamHeat, elecHeat, or gasHeat
Subtyping is multi-dimensional: for example I can subtype an AHU by its heating method, cooling method, and ductwork configuration (all simultaneously)

This pattern plays out in the documentation quite often in a non-formal way:

Point qualifier: sensor, cmd, sp
Point subject: air, water, steam, elec, etc
Point quantity: temp, flow, pressure, power, energy, etc
Power Qualifier: active, reactive, apparent
AHU heating: steamHeat, hotWaterHeat, gasHeat, elecHeat
AHU cooling: dxCool, chilledWaterCool
AHU ductwork: singleDuct, dualDuct, tripleDuct
VAV airflow: series, parallel
Chiller type: absorption, reciprocal, screw, centrifugal

Another important consideration is that these exclusive choices are often open ended. This is opposed to an enum in a programming language which is closed (once defined you may not add new choices to the enumeration). But in a data model type system, these enumerated choices may be expanded after the fact. An example might be a project which requires a subtype choice not covered by the standard Haystack tag library.
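
As a rough illustration of the two observations above plus the open-ended point, here is a minimal Python sketch. The dimension names ("heating", "cooling", "ductwork") and the helper functions are my own assumptions for illustration, not part of the prototype or any Haystack specification; only the marker tag names come from the lists above.

```python
# Hypothetical sketch: dimension names and helper names are illustrative only.

# Each dimension maps to its set of mutually exclusive marker choices.
AHU_DIMENSIONS = {
    "heating": {"steamHeat", "hotWaterHeat", "gasHeat", "elecHeat"},
    "cooling": {"dxCool", "chilledWaterCool"},
    "ductwork": {"singleDuct", "dualDuct", "tripleDuct"},
}


def check_exclusive(markers, dimensions):
    """Return {dimension: sorted conflicting markers} for every dimension
    in which the entity carries more than one choice."""
    conflicts = {}
    for dim, choices in dimensions.items():
        used = sorted(set(markers) & choices)
        if len(used) > 1:
            conflicts[dim] = used
    return conflicts


def extend_dimension(dimensions, dim, choice):
    """Return a copy of the dimensions with one extra choice added,
    modeling the open-ended nature of these enumerations."""
    extended = {name: set(choices) for name, choices in dimensions.items()}
    extended.setdefault(dim, set()).add(choice)
    return extended


# One choice in each of three dimensions at once: no conflicts.
ok_ahu = {"equip", "ahu", "steamHeat", "dxCool", "singleDuct"}
# Two choices in the same dimension: flagged as a conflict.
bad_ahu = {"equip", "ahu", "steamHeat", "gasHeat"}

print(check_exclusive(ok_ahu, AHU_DIMENSIONS))
print(check_exclusive(bad_ahu, AHU_DIMENSIONS))
```

A project needing a choice outside the standard library could call extend_dimension with its own marker, leaving the standard definitions untouched.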

Type Names

One of the common questions I've heard over the years is this: why not just define a shorthand name for a combination of tags such as "discharge air temp point"? But what would this name be? Creating a shortcut such as "DAT" would go against the principle of avoiding indirection to understand an entity's tags. And to provide the same information without indirection would lead to a name such as "DischargeAirTempPoint", which sort of defeats the purpose of creating new names. I would propose that any new synthetic name generated for Haystack's type system is strictly just a combination of existing tag names. For example the type that represents an AHU with steam heating:

equip ahu steamHeat    // tags separated by space
equip-ahu-steamHeat    // tags separated with dash
equip+ahu+steamHeat    // tags separated with plus
steamHeat ahu equip    // most specific to least specific

For this proposal I will use the first option: a type name is a list of tags separated by space and ordered from least to most specific. For the prototype documentation HTML pages I used dash instead of space as a more URL friendly file name.
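
To make the naming convention concrete, here is a small Python sketch. The function names are hypothetical; the only things taken from the text above are the space-separated form and the dash-separated form used for the documentation pages.

```python
# Hypothetical helpers; only the two separator conventions come from the post.

def type_name(tags):
    """Join tags, ordered from least to most specific, with spaces."""
    return " ".join(tags)


def dashed_name(tags):
    """Dash-separated form, the more URL friendly variant."""
    return "-".join(tags)


def parse_type_name(name):
    """Split either form back into its list of tags.
    Assumes tag names themselves never contain spaces or dashes."""
    return name.replace("-", " ").split()


tags = ["equip", "ahu", "steamHeat"]
print(type_name(tags))
print(dashed_name(tags))
print(parse_type_name("equip-ahu-steamHeat"))
```

Because the name is nothing more than the tags themselves, it can always be expanded back into the full tag list without a lookup.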

Side note: I also investigated using camel case to join tag names together (which would work if all tags were lowercase). But we have many tags such as hotWaterHeat where this would cause a problem. These compound tag names are a potential problem which could possibly be solved more elegantly through the type system. But I'll leave that as a discussion for the working group.

Notation

In order to discuss how we might apply a type system to Haystack tags using the concepts above, we need some notation. I'm going to introduce a notation/syntax which I have found concise and readable while developing the prototype. My proposal is based on the abstract concepts, not the specific syntax I am using here. At some point, however, we will need to formalize one or more machine readable formats which capture the type system abstractions.

Here is the quick summary of notation:

type > tag       // type has tag
type dim>        // type has subtype dimension
type dim> tag    // subtype choice within given dimension
type <ref> type  // relationship definition

Let's look at each of these notations in more detail...

Notation: Has

Let's start off with an entity which might have data tags:

site > area             // Square footage of the site
site > tz               // Timezone of the site
site > primaryFunction  // Primary function of the site
site > yearBuilt        // Original construction year of the site

Here we are using the syntax "type > tag" to define that the LHS (left hand side) type may optionally use the RHS (right hand side) tag according to the definition given in the slash-slash comment. This definition is context specific to when the tag is applied to the LHS type.

We can use Python style indentation to omit the base type. The following has exactly the same semantics as the definitions above:

site 
  > area             // Square footage of the site
  > tz               // Timezone of the site
  > primaryFunction  // Primary function of the site
  > yearBuilt        // Original construction year

Notation: Subtype

Here is how to define a subtype dimension:

point subject>        // Subject or substance of the point's measurement or control
point subject> air    // Point related to air
point subject> water  // Point related to water
point subject> steam  // Point related to steam
point subject> elec   // Point related to electricity

Here we define a named dimension of subtyping on points. In this case the dimension name is subject as defined with the syntax "type dim>". Then we can define exclusive subtype choices for that dimension with the syntax "type dim> tag". Each choice defines a new type. In our example above, we have now defined the new types "point air", "point water", etc.

We can use indentation to collapse the definition above. Let's also flesh out more point subtypes to see how it works in practice:

point

qualifier>            // Classifies the point as a sensor, command, or setpoint
  sensor              // Point is a sensor, input, AI/BI
  cmd                 // Point is a command, actuator, AO/BO
  sp                  // Point is a setpoint, soft point, internal control variable, schedule

subject>              // Subject or substance of the point's measurement or control
  air                 // Point related to air
  water               // Point related to water
  steam               // Point related to steam
  elec                // Point related to electricity
  refrig              // Point related to refrigerant substance

air

  quantity>           // Quantity of air measured or controlled
    temp              // Point related to dry bulb air temperature
    humidity          // Point related to percent relative humidity of air
    flow              // Point related to volumetric air flow
    pressure          // Point related to static air pressure

water

  quantity>           // Quantity of water measured or controlled
    temp              // Point related to water temperature
    flow              // Point related to volumetric water flow
    pressure          // Point related to water pressure

  waterType>          // Type of the water and its usage
    domestic          // Tap water for drinking, washing, cooking, and flushing of toilets
    hot               // Hot water used for heating or supply to hot taps
    chilled           // Water used for cooling
    condenser         // Water used to remove heat through condensation
    makeup            // Water used to make up water loss through leaks, evaporation, or blowdown
    blowdown          // Water expelled from a system to remove mineral build up

What is created as you define these dimensions and their choices is a "type tree" or "decision tree". Each time you add a marker tag it potentially opens up new choices to narrow the type along multiple branches (dimensions).
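
The "decision tree" idea is what a tagging tool would walk. Below is a rough Python sketch of that walk, using a subset of the point definitions above. The dictionary layout and the open_choices function are assumptions of mine, not the prototype's data model.

```python
# Hypothetical in-memory form: type (tuple of tags) -> {dimension: choices}.
TYPE_TREE = {
    ("point",): {
        "qualifier": ["sensor", "cmd", "sp"],
        "subject": ["air", "water", "steam", "elec", "refrig"],
    },
    ("point", "air"): {
        "quantity": ["temp", "humidity", "flow", "pressure"],
    },
    ("point", "water"): {
        "quantity": ["temp", "flow", "pressure"],
        "waterType": ["domestic", "hot", "chilled",
                      "condenser", "makeup", "blowdown"],
    },
}


def open_choices(markers):
    """Return {dimension: choices} for every dimension the entity's
    markers have opened up but not yet answered."""
    markers = set(markers)
    remaining = {}
    for type_tags, dims in TYPE_TREE.items():
        if not markers.issuperset(type_tags):
            continue  # entity is not (yet) of this type
        for dim, choices in dims.items():
            if markers.isdisjoint(choices):
                remaining[dim] = choices
    return remaining


print(open_choices({"point"}))
print(open_choices({"point", "sensor", "water"}))
```

Adding water to a point closes the subject dimension and opens the quantity and waterType dimensions, which is the branching behavior described above.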

Notation: Relationship

Lastly we need a notation to define relationships. Here are some examples:

equip <equipRef> equip      // Equipment contains sub-equipment
equip <equipRef> point      // Equipment contains point

A relationship has an LHS type and an RHS type and one or more relationship tags grouped between the "<>". The first relationship tag must be a ref tag which is applied to the entity on the RHS to reference the LHS. Or put another way, the RHS is the "from entity" and the LHS is the "to entity" in terms of the ref tag. Let's deconstruct this example:

LHS    Tags        RHS      Doc definition of relationship
-----  ---------   -----    -------------------------------
equip  <equipRef>  point    // Equipment contains point

The LHS type is any entity tagged with the equip marker tag. The RHS is a point entity. For the relationship to apply, the equipRef tag must be applied to the RHS (the point) and reference the LHS (the equip). When all of those conditions hold true, the relationship applies.

We can define additional tags to apply to the RHS entity for more complex relationships:

equip ahu <equipRef discharge> point air  // AHU point associated with discharge air duct
equip ahu <equipRef return> point air     // AHU point associated with return air duct

In the example above, the LHS (to) is AHU equipment and the RHS (from) is points associated with the measurement/control of air. The relationship tags include both a ref tag as well as a "section tag" to apply to the point to create the specified relationship. This model allows us to reuse the subtype definition of "point air" without duplicating massive point tag combinations under each equipment (like we do today).
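
The conditions for a relationship to apply can be sketched in Python as follows. Everything here is an assumption for illustration: entities are plain dicts, markers are stored as True, an "id" key holds the entity identifier, and the Relationship class is hypothetical rather than the prototype's TypeDef model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Relationship:
    lhs: tuple    # markers required on the "to" entity (LHS)
    ref: str      # ref tag applied to the "from" entity (RHS)
    extra: tuple  # additional relationship tags required on the RHS
    rhs: tuple    # markers required on the "from" entity (RHS)
    doc: str


def applies(rel, lhs_entity, rhs_entity):
    """True when the RHS entity has the ref tag pointing at the LHS entity
    and both entities carry the markers the relationship requires."""
    lhs_ok = all(tag in lhs_entity for tag in rel.lhs)
    rhs_ok = all(tag in rhs_entity for tag in rel.rhs + rel.extra)
    ref_ok = "id" in lhs_entity and rhs_entity.get(rel.ref) == lhs_entity["id"]
    return lhs_ok and rhs_ok and ref_ok


DISCHARGE = Relationship(("equip", "ahu"), "equipRef", ("discharge",),
                         ("point", "air"),
                         "AHU point associated with discharge air duct")
RETURN = Relationship(("equip", "ahu"), "equipRef", ("return",),
                      ("point", "air"),
                      "AHU point associated with return air duct")

# Illustrative entities; ids are made up.
ahu = {"id": "ahu-1", "equip": True, "ahu": True}
dat = {"id": "p-1", "point": True, "air": True, "temp": True,
       "sensor": True, "discharge": True, "equipRef": "ahu-1"}

print(applies(DISCHARGE, ahu, dat))
print(applies(RETURN, ahu, dat))
```

Note that the point keeps its full set of tags inline; the relationship definition only adds documentation and validation on top of them.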

Here are some more relationship examples for a steam plant:

equip plant steam

  <steamPlantRef> equip ahu steamHeat   // Plant supplies steam to AHU for heating 
  <equipRef> equip boiler               // Plant contains boiler
  <equipRef leaving> point steam        // Point associated with steam leaving plant as heating supply
  <equipRef delta> point steam          // Point associated with steam differential between leaving and entering
  <equipRef entering> point steam       // Point associated with steam returning to plant to be heated back up

Prototype

I have developed a complete prototype for the type system discussed above. This is actually my third prototype (the first two being dead ends). The prototype is developed in Fantom and has the following key components:

TagDef: models a single tag definition
TypeDef: models a type along with its "has" tags, dimensions, and relationships
Model: immutable data structure for all the TagDef and TypeDef
Loader: loads one or more haydef text files to build an in-memory model
DocGen: generates simple HTML documentation for a model
lib/*.haydef: definitions for about 70% of the Haystack model using notation discussed

You can download the prototype, including source code, definitions, and example documentation, from:

https://project-haystack.org/download/build/haystack-model-prototype-2017-10-17.zip

To generate the documentation, use this command, which writes HTML files to "./doc/":

bin/fan haystackModel::DocGen

The prototype has quite a bit of the model fleshed out, including:

air, water, steam points
electrical meters and power/energy/volt/current points
central plants (using simple, not existing compound tags)
chillers
boilers
VAVs

None of it is complete, but it's far enough along to test out the concepts. If you are interested in this topic, then I would encourage you to download it and at least look through the haydef text files.

Next Steps

There seems to be lots of momentum with various organizations, vendors, and community members around this core problem. I believe now is a great time to tackle the problem head on. So I'd like to create a new working group (WG) for those interested. I'm thinking of a WG process with weekly webcast calls. Also feel free to post ideas/comments to the forum. If you are interested please use the "Join Group" command to join the WG.

Stephen Frank Wed 18 Oct 2017

Count me in on the WG please.

Some things I would also like this WG to address are:

General typing for location information (currently missing from Haystack)
Revisit best practice for handling one-to-many and many-to-many relationships

Jason Briggs Wed 18 Oct 2017

I already met with Brian offline, and love the direction of this. Been needing this for a long time.

Greg Ingram Sun 19 Nov 2017

What's the status on this topic? Next steps? Any updates/notes from WG meetings?

Doug Migliori Tue 9 Jan 2018

I would suggest that all contributors to this WG read parts 3 and 4 of the multi-part article series on Cross-Industry Semantic Interop to broaden perspectives.

http://www.embedded-computing.com/semantic-interop/cross-industry-semantic-interoperability-part-three-the-role-of-a-top-level-ontology

Richard Seaman Mon 26 Nov 2018

I’ll begin by putting my hand up and admitting that I’m late to the party. I’ve done my best to read through previous forums and catch up. Apologies if I’m raising something which has already been covered in offline discussions / meetings.

The proposed tag based method of subtyping simultaneously across multiple dimensions works incredibly well. We are in the process of applying this to a range of projects. However, we have come across a number of issues which have forced us to deviate slightly from the proposal in order to accommodate. This is probably because the projects in question are industrial rather than commercial.

One such project is a seed processing factory, which takes in seeds and produces vegetable oils through a number of processes. There are thousands of points which monitor the materials throughout each process. Many of these points can’t be classified as either an AirPoint, ElecPoint, WaterPoint, etc.

The proposed solution above is to expand the enumerated choices:

But in a data model type system, these enumerated choices may be expanded after the fact. An example might be a project which requires a subtype choice not covered by the standard Haystack tag library.

While this solution works, the implications may not have been fully worked through. We fear it may lead to a huge amount of duplication and maintenance.

We’ll highlight this using a real world example from the above project. One of the processes involved is to extract oil out of the raw seeds using a combination of heat and chemicals. This extracted oil passes through a heat exchanger (so that some heat may be recovered for use elsewhere) before continuing on to the next process.

In order to include points which monitor extracted oil, a new extractedOil subtype is required within the point’s subject dimension. The available quantities would also have to be updated.

Changes to the haydef file would be as follows (apologies if notation is wrong):

point subject> extractedOil
point extractedOil quantity> temp
point extractedOil quantity> flow
point extractedOil quantity> pressure

In order to allow this fluid to be used within a heat exchanger equip, the changes to its haydef file would be as follows:

equip heatExchanger <equipRef temp entering> point extractedOil
equip heatExchanger <equipRef temp leaving> point extractedOil
equip heatExchanger <equipRef flow> point extractedOil

Of course any other equip involved would also need its haydef file updated. And this is only for one additional subject. Unfortunately in this case (and potentially many cases in the industrial world) there will be multiple additional subjects. Hence the concern over duplication and maintenance.

An observation worth highlighting is that although extractedOil is a new subject, it shares its available quantities with water. This is to be expected as they are both liquids. It is here where we made our first deviation, by introducing the State.

We have assigned a state to each subject (or none in the case of elec). Instead of explicitly expecting a particular subject in a relationship, we expect a subject with a state of either solid, liquid or gas (or optionally we can still use a specific subject). This addresses the need of having to maintain the equip haydef files when new subjects are added.

In order to address the quantity issue, we’ve actually changed the type tree somewhat. Instead of firstly selecting the subject, the quantity is the first selection made. In other words, the quantity becomes a subtype of point, and subject becomes a subtype of quantity. For each quantity, the available subject states (and also sections) are defined.

This came after the realisation that the quantity ultimately drives everything. If for example a temperature quantity is selected, it makes sense that any subject (solid / liquid / gas) could be used. However, if a quantity of pressure is selected, it only makes sense that liquid or gas subjects are used. Similarly, if an electric current quantity is selected, it should have a subject with no state (i.e. elec). The selectable sections can also be derived in a similar manner. This still provides basic validation, albeit there may be cases where different subjects with the same states don’t necessarily have the same suitable qualities (i.e. compromise on perfect validation, as mentioned above).
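
A minimal Python sketch of this quantity-first check is below. The tables are illustrative assumptions only: the quantity-to-state rules for temp, pressure and current follow the paragraph above, while the state assigned to each subject (including the hypothetical rawSeed subject) is simply an example and not our actual definitions.

```python
# Illustrative tables only; not our project's real definitions.

# State assigned to each subject (None means "no state", as for elec).
SUBJECT_STATE = {
    "air": "gas",
    "steam": "gas",
    "water": "liquid",
    "extractedOil": "liquid",
    "rawSeed": "solid",   # hypothetical subject, added for the example
    "elec": None,
}

# For each quantity, the subject states which make sense.
QUANTITY_STATES = {
    "temp": {"solid", "liquid", "gas"},
    "pressure": {"liquid", "gas"},
    "current": {None},
}


def subject_allowed(quantity, subject):
    """True when the subject's state is acceptable for the quantity."""
    return SUBJECT_STATE[subject] in QUANTITY_STATES[quantity]


print(subject_allowed("temp", "extractedOil"))
print(subject_allowed("pressure", "rawSeed"))
print(subject_allowed("current", "elec"))
```

Adding a new liquid subject only requires one new entry in the subject table; the quantity rules and any equip definitions written against states stay as they are.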

What’s nice about this approach is that the available quantities are much less expandable. One could argue that there is an exhaustive known list of them. In fact our initial exhaustive list is the set of quantities from SkySpark. Importantly, once defined, there’s no need to alter them. A quantity is always the same whether it’s residential, commercial or industrial (it’s just unlikely you’ll need a magneticFluxDensity within a residential project!).

The above approach is simplified somewhat but we have applied it to a number of examples and are happy with how it plays out (qualifiers and other topics are purposely omitted as this post is long enough already!). However, we’re eager to not diverge from the agreed standard if at all possible.

Would love to hear how these kinds of issues are being addressed by others.

Brian Frank Mon 26 Nov 2018

Hi Richard,

That is good insight into the modeling of fluid quantities. I have actually experimented with making a FluidPoint that is a supertype of WaterPoint. This would be a good use case to further that idea. But I do still believe that the point's subject is the primary typing mechanism - for example the measurements of ElecPoint are vastly different than the measurements of FluidPoint. I see you are signed up for WG 551, so that is the best way to get involved. We'll probably have the next call in mid-December, and this would be a good topic to cover.

Richard Seaman Wed 28 Nov 2018

Hi Brian,

I haven’t yet been involved in any working groups but would be eager to do so. You mentioned a call in mid-December, will there be an email distributed ahead of time with details? It would be great to discuss and explore these two primary typing mechanisms (subject vs quantity) in greater detail. I’ll give a brief initial response below.

From our experience, the first thing a user does when instantiating a point, is to select the thing that is being measured (quantity). They can then further qualify it by adding the subject and section. If they selected a quantity of current, there’s no need to select a subject of elec as this is the only possible subject for it. In this case the {current} tag would be added. If on the other hand, they first had to select a subject of elec and then the quantity of current, the resulting tags would be {elec, current}. Is this additional elec tag really necessary in this case? (ditto for volt, pf, freq). However, if a temperature is the quantity selected, additional subject tags would certainly help to further qualify what the point represents (e.g. {temp, air}, {temp, water}, {temp, extractedOil}).

Supertypes do seem like a good solution if using the subject as the primary typing mechanism. The example I provided could likely be solved with a FluidPoint supertype. There would also be a need for a SolidPoint supertype for the same reasons. However, I imagine the majority (if not all) of the quantities would reside in the supertype. For example, if I created an extractedOil subject which was a subtype of FluidPoint, I would have no need to extend the available quantities.

It sounds like two ways of doing the same thing. The difference will be that the resulting flat list of tags may differ (which is my main concern).

Brian Frank Thu 28 Mar 2019

We are transitioning this working group to public review - see 687
