Working Group | |
Status | Review |
Champion | |
Brian Frank | |
Members | |
Matthew Giannini | |
Eric Skiba | |
Jay Herron | |
Carl Neilson | |
Gabe Fierro | |
Alex Bible | |
Stephen Frank | |
Cory Mosiman | |
Patrick Coffey | |
Terry Herr | |
Gia Nguyen | |
Dylan Cutler | |
Christian Tremblay | |
Dave Robin | |
Eric Loew | |
Monica Holbrook | |
Alper Üzmezler | |
Jason Briggs | |
Andy Frank | |
Chad Ruch | |
Nick Laws | |
Winston Hetherington | |
Doug Migliori | |
Jonathan Fromm | |
Steve Eynon | |
Bernhard Isler | |
Cliff Copass | |
Nathan Travis | |
David Adams | |
Brian Simmons | |
Luke Walsh | |
Nate Benes | |
Jeremy Yon | |
Peter Cobb | |
Justin Rea | |
Keith Bishop | |
Richard Seaman | |
Jordan Van Hall | |
Holly Hofer | |
Joel Bender | |
Brandyn Carlson | |
John MacEnri | |
Brandon DuPrey | |
Rob Knight | |
Paul Stanley | |
Michael Poplawski | |
Buddy Patton | |
Marco Pritoni | |
Matthew Hollar | |
Ryan Hoest | |
Siddharth Goyal | |
Calvin Slater | |
Coen Hoogervorst | |
Jan Široký | |
Maya Tzabary | |
Mike Lee | |
Jonas Bülow | |
Mike Melillo |
Brian Frank Tue 17 Oct 2017
Overview
Haystack is designed around the concept of tagging entities with name/value pairs to describe facts about those entities. The formal definitions of these tags and their value types are captured in a machine readable format (Trio files) which is used to generate the tags section of this website. But how tags are combined lacks formal machine readable definitions. For example the description and constraints of how to model site/equip/point entities are largely described by documentation without a corresponding formal schema and machine readable format. Historically this has been by design since formalization of "compound types" introduces significant complexity. But with broader adoption of Haystack, there seems to be a pent-up demand to formalize types/schema. We believe it's time to tackle this problem, and would like to kick start a new working group.
I have spent several weeks designing various prototypes with help from Matthew Giannini. By way of this post, I will describe a fairly complete prototype which serves as a starting point for a proposal on how types might work in Haystack. The prototype defines most of the Haystack model using a type system I will discuss here. I have made the source code and the documentation it generates available for download (discussed below).
Requirements
Leverage Markers: we wish to leverage Haystack's existing and extensive use of markers as the basis for a more advanced data type system. We do not wish to introduce a new concept such as a "type" tag.
No Indirection: all data semantics should be captured in the entity's tags. You should not be required to have previous knowledge (such as a data dictionary) or make an additional network request to infer semantics. For example if a point currently uses discharge air temp sensor, then that will not be coalesced into some abstract "tag set" name that requires another request to know that all those tags were applied. Or put another way: entities will always continue to expand their full set of tags inline.
Tooling: a common use case for a more advanced type system is to allow tool manufacturers to develop UIs that "guide" users to properly tag their data. Capturing tag relationships and rules in machine format is a key requirement for tooling.
Validation: a machine readable schema allows validation of data models. But we acknowledge that type systems require a trade-off; more complex type systems are required to more fully validate data. And no declarative type system can perform 100% validation. We wish to strike a compromise with a practical type system that performs basic validation, but will not provide perfect validation.
RDF: it is desired that enhancements to Haystack allow our taxonomy to be expressed in alternate formats such as RDFS, RDFa, micro-data, JSON-LD, etc. These technologies are based on the concept of subject-predicate-object triples that map well to Haystack's entity name/value tags. And ideally we want to map Haystack types to the RDF Schema class model.
Source Definitions: the goal of this effort is to rewrite the project-haystack.org specification source material using the new definitions and formats as the authoritative source. The machine readable formats will be directly accessible over HTTP and also used to auto-generate the HTML presented on the site.
Observations
Let's begin with a couple of observations about how the existing model works. There are essentially only four "root" entity types: sites, equips, points, and weather stations. All other Haystack tags are used to annotate these four core entity types with additional information.
There are three distinct ways we use tags to annotate the core entity types:
site entity may apply the area tag to define the building's square footage. This sort of tag usage includes all the tags which are neither markers nor refs.
sensor to a point entity narrows the type of point represented
In all three cases, what we really desire is to document the behavior of a specific combination of tags. This has been a pain point in maintaining the documentation. For example let's take the water tag. It has a generic definition which means "associated with liquid water". But it also has more specific definitions when paired with point, meter, or tank. In our final solution, we want tags to be defined generically with more specific documentation as we combine tags.
Tag Based Subtyping
We add marker tags to an entity to indicate a more specific type of the entity. For example we add ahu to equip to mark the equipment as an air handler unit. We can further mark the AHU with steamHeat to indicate it's an AHU using steam from a central plant for heating. Each time we apply a marker tag we further restrict what the entity type represents. From a type system perspective, this is a form of subtyping.
There are two key observations to be made about how marker tags are used for subtyping:
This pattern plays out in the documentation quite often in a non-formal way:
Another important consideration is that these exclusive choices are often open ended. This is opposed to an enum in a programming language which is closed (once defined you may not add new choices to the enumeration). But in a data model type system, these enumerated choices may be expanded after the fact. An example might be a project which requires a subtype choice not covered by the standard Haystack tag library.
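A minimal Python sketch of the idea of exclusive but open ended choices. The `Dimension` class, the dimension name "heating", and the project tag `geothermalHeat` are hypothetical names used only for illustration; they are not part of the prototype.

```python
class Dimension:
    """A named subtype dimension: choices are mutually exclusive but open ended."""

    def __init__(self, name, choices):
        self.name = name
        self.choices = set(choices)

    def extend(self, tag):
        # Unlike a closed enum, a project may register extra choices after the fact.
        self.choices.add(tag)

    def selected(self, entity_tags):
        # Choices from this dimension that are present on the entity.
        return sorted(self.choices & set(entity_tags))

    def is_valid(self, entity_tags):
        # Exclusive: at most one choice from the dimension may be applied.
        return len(self.selected(entity_tags)) <= 1


# Hypothetical dimension on AHUs; the tag names come from the Haystack tag library.
heating = Dimension("heating", ["steamHeat", "hotWaterHeat", "elecHeat"])

ahu = {"equip", "ahu", "steamHeat"}
conflicting = {"equip", "ahu", "steamHeat", "elecHeat"}
```

Here `heating.is_valid(ahu)` holds while `heating.is_valid(conflicting)` does not, and a project can call `heating.extend("geothermalHeat")` to add a choice the standard library does not cover.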
Type Names
One of the common questions I've heard over the years is this: why not just define a shorthand name for a combination of tags such as "discharge air temp point"? But what would this name be? Creating a shortcut such as "DAT" would go against the principle of avoiding indirection to understand an entity's tags. And to provide the same information without indirection would lead to a name such as "DischargeAirTempPoint" which sort of defeats the purpose of creating new names. I would propose that any new synthetic name generated for Haystack's type system is strictly just a combination of existing tag names. For example the type that represents an AHU with steam heating:
For this proposal I will use the first option: a type name is a list of tags separated by space and ordered from least to most specific. For the prototype documentation HTML pages I used dash instead of space as a more URL friendly file name.
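As a rough Python sketch of this naming rule: a type name is just the tags joined in order, with space as the separator and dash for file names. The helper names `type_name` and `doc_file_name` are hypothetical, and the exact tag list used for the AHU example (and the ".html" extension) is an assumption, not taken from the prototype.

```python
def type_name(tags, sep=" "):
    """Join tags, ordered from least to most specific, into a type name."""
    return sep.join(tags)


def doc_file_name(tags):
    """URL friendly variant: dash instead of space (the extension is assumed)."""
    return type_name(tags, sep="-") + ".html"


# Assumed tag ordering for an AHU with steam heating.
ahu_steam = ["equip", "ahu", "steamHeat"]

name = type_name(ahu_steam)
file_name = doc_file_name(ahu_steam)

# The name introduces no new vocabulary: splitting it recovers the tags.
round_trip = name.split(" ")
```

Because the name is only a join of existing tags, it can always be split back into the tags an entity carries, so no indirection is introduced.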
Side note: I also investigated using camel case to join tag names together (if all tags were lowercase). But we have many tags such as hotWaterHeat where this would cause a problem. These compound tag names are a potential problem which could possibly be solved more elegantly through the type system. But I'll leave that as a discussion for the working group.
Notation
In order to discuss how we might apply a type system to Haystack tags using the concepts above, we need some notation. I'm going to introduce a notation/syntax which I have found concise and readable while developing the prototype. My proposal is based on the abstract concepts, not the specific syntax I am using here. At some point, though, we will need to formalize one or more machine readable formats which capture the type system abstractions.
Here is the quick summary of notation:
Let's look at each of these notations in more detail...
Notation: Has
Let's start off with an entity which might have data tags:
Here we are using the syntax "type > tag" to define that the LHS (left hand side) type may optionally use the RHS (right hand side) tag according to the definition given in the slash-slash comment. This definition is context specific to when the tag is applied to the LHS type.
We can use Python style indentation to omit the base type. The following has exactly the same semantics as the definitions above:
Notation: Subtype
Here is how to define a subtype dimension:
Here we define a named dimension of subtyping on points. In this case the dimension name is subject as defined with the syntax "type dim>". Then we can define exclusive subtype choices for that dimension with the syntax "type dim> tag". Each choice defines a new type. In our example above, we have now defined the new types "point air", "point water", etc.
We can use indentation to collapse the definition above. And let's flesh out more point subtypes to see how it works in practice:
What is created as you define these dimensions and their choices is a "type tree" or "decision tree". Each time you add a marker tag it potentially opens up new choices to narrow the type along multiple branches (dimensions).
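To make the "type tree" idea concrete, here is a small Python sketch that walks dimensions and their choices to enumerate the types they define. The `DIMS` table is illustrative data only (the "quantity" dimension and its choices are assumptions), and `subtypes` is a hypothetical helper, not the prototype's implementation.

```python
# type name -> {dimension name -> [choice tags]}; contents are illustrative only.
DIMS = {
    "point": {"subject": ["air", "water"]},
    "point air": {"quantity": ["temp", "pressure"]},
}


def subtypes(type_name, dims=DIMS):
    """Yield every type reachable from type_name by adding one choice at a time."""
    for _dim, choices in dims.get(type_name, {}).items():
        for tag in choices:
            child = f"{type_name} {tag}"  # each choice defines a new type
            yield child
            # A new marker may open up further dimensions below it.
            yield from subtypes(child, dims)


tree = list(subtypes("point"))
# tree == ["point air", "point air temp", "point air pressure", "point water"]
```

Each entry in `tree` is a type name built purely from existing tags, and adding a choice to any dimension grows the tree without touching the other branches.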
Notation: Relationship
Lastly we need a notation to define relationships. Here are some examples:
A relationship has a LHS type and a RHS type and one or more relationship tags grouped between the "<>". The first relationship tag must be a ref tag which is applied to the entity on the RHS to reference the LHS. Or put another way the RHS is the "from entity" and the LHS is the "to entity" in terms of the ref tag. Let's deconstruct this example:
The LHS type is any entity tagged with the equip marker tag. The RHS is a point entity. In order to apply the relationship, the equipRef tag must be applied to the RHS (the point) and reference the LHS (the equip). When all of those conditions hold true, then the relationship applies.
We can define additional tags to apply to the RHS entity for more complex relationships:
In the example above, the LHS (to) is AHU equipment and the RHS (from) is points associated with the measurement/control of air. The relationship tags include both a ref tag as well as a "section tag" to apply to the point to create the specified relationship. This model allows us to reuse the subtype definition of "point air" without duplicating massive point tag combinations under each equipment (like we do today).
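The conditions for a relationship to apply can be sketched in Python as below. This is only an illustration under assumptions: entities are modeled as plain dicts with an "id" key, markers are stored as the string "M", the `Relationship` class is hypothetical, and `discharge` is assumed as the section tag.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Relationship:
    lhs: frozenset          # marker tags required on the "to" entity (LHS)
    rhs: frozenset          # marker tags required on the "from" entity (RHS)
    ref_tag: str            # ref tag applied to the RHS, pointing at the LHS
    extra_tags: frozenset = frozenset()  # further tags required on the RHS

    def applies(self, to_entity, from_entity):
        ref = from_entity.get(self.ref_tag)
        return (
            self.lhs.issubset(to_entity)
            and self.rhs.issubset(from_entity)
            and self.extra_tags.issubset(from_entity)
            and ref is not None
            and ref == to_entity.get("id")
        )


# AHU equipment (to) and air points in an assumed "discharge" section (from).
rel = Relationship(
    lhs=frozenset({"equip", "ahu"}),
    rhs=frozenset({"point", "air"}),
    ref_tag="equipRef",
    extra_tags=frozenset({"discharge"}),
)

ahu = {"id": "ahu-1", "equip": "M", "ahu": "M"}
dat = {"id": "p-1", "point": "M", "air": "M", "temp": "M", "sensor": "M",
       "discharge": "M", "equipRef": "ahu-1"}
```

With these definitions `rel.applies(ahu, dat)` is true, and it becomes false if the point drops the section tag or references a different equip. The point keeps its full set of tags inline; the relationship only checks them.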
Here are some more relationship examples for a steam plant:
Prototype
I have developed a complete prototype for the type system discussed above. This is actually my third prototype (the first two being dead ends). The prototype is developed in Fantom and has the following key features:
You can download the prototype including source code, definitions, and example documentation from:
https://project-haystack.org/download/build/haystack-model-prototype-2017-10-17.zip
To run the documentation use this command which generates HTML files to "./doc/"
The prototype has quite a bit of the model fleshed out including
None of it is complete, but it's pretty far along to test out the concepts. If you are interested in this topic, then I would encourage you to download it and at least look thru the haydef text files.
Next Steps
There seems to be lots of momentum with various organizations, vendors, and community members around this core problem. I believe now is a great time to tackle the problem head on. So I'd like to create a new working group (WG) for those interested. I'm thinking of a WG process with weekly webcast calls. Also feel free to post ideas/comments to the forum. If you are interested please use the "Join Group" command to join the WG.
Stephen Frank Wed 18 Oct 2017
Count me in on the WG please.
Some things I would also like this WG to address are:
Jason Briggs Wed 18 Oct 2017
I already met with Brian offline, and love the direction of this. Been needing this for a long time.
Greg Ingram Sun 19 Nov 2017
What's the status on this topic? Next steps? Any updates/notes from WG meetings?
Doug Migliori Tue 9 Jan 2018
I would suggest that all contributors to this WG read parts 3 and 4 of the multi-part article series on Cross-Industry Semantic Interop to broaden perspectives.
http://www.embedded-computing.com/semantic-interop/cross-industry-semantic-interoperability-part-three-the-role-of-a-top-level-ontology
Richard Seaman Mon 26 Nov 2018
I’ll begin by putting my hand up and admitting that I’m late to the party. I’ve done my best to read through previous forums and catch up. Apologies if I’m raising something which has already been covered in offline discussions / meetings.
The proposed tag based method of subtyping simultaneously across multiple dimensions works incredibly well. We are in the process of applying this to a range of projects. However, we have come across a number of issues which have forced us to deviate slightly from the proposal in order to accommodate. This is probably because the projects in question are industrial rather than commercial.
One such project is a seed processing factory, which takes in seeds and produces vegetable oils through a number of processes. There are thousands of points which monitor the materials throughout each process. Many of these points can’t be classified as either an AirPoint, ElecPoint, WaterPoint, etc.
The proposed solution above is to expand the enumerated choices:
While this solution works, the implications may not have been fully worked through. We fear it may lead to a huge amount of duplication and maintenance.
We’ll highlight this using a real world example from the above project. One of the processes involved is to extract oil out of the raw seeds using a combination of heat and chemicals. This extracted oil passes through a heat exchanger (so that some heat may be recovered for use elsewhere) before continuing on to the next process.
In order to include points which monitor extracted oil, a new extractedOil subtype is required within the point’s subject dimension. The available quantities would also have to be updated.
Changes to the haydef file would be as follows (apologies if notation is wrong):
In order to allow this fluid to be used within a heat exchanger equip, the changes to its haydef file used would be as follows:
Of course any other equip involved would also have to have their haydef files updated too. And this is only for one additional subject. Unfortunately in this case (and potentially many cases in the industrial world) there will be multiple additional subjects. Hence the concern over duplication and maintenance.
An observation worth highlighting, is that although the extractedOil is a new subject, it shares its available quantities with water. This is to be expected as they are both liquids. It is here where we made our first deviation, by introducing the State.
We have assigned a state to each subject (or none in the case of elec). Instead of explicitly expecting a particular subject in a relationship, we expect a subject with a state of either solid, liquid or gas (or optionally we can still use a specific subject). This addresses the need of having to maintain the equip haydef files when new subjects are added.
In order to address the quantity issue, we’ve actually changed the type tree somewhat. Instead of firstly selecting the subject, the quantity is the first selection made. In other words, the quantity becomes a subtype of point, and subject becomes a subtype of quantity. For each quantity, the available subject states (and also sections) are defined.
This came after the realisation that the quantity ultimately drives everything. If for example a temperature quantity is selected, it makes sense that any subject (solid / liquid / gas) could be used. However, if a quantity of pressure is selected, it only makes sense that liquid or gas subjects are used. Similarly, if an electric current quantity is selected, it should have a subject with no state (i.e. elec). The selectable sections can also be derived in a similar manner. This still provides basic validation, albeit there may be cases where different subjects with the same states don’t necessarily have the same suitable qualities (i.e. compromise on perfect validation, as mentioned above).
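A rough Python sketch of this quantity-first check, using only the examples mentioned above. The table names and the `subject_allowed` helper are hypothetical, and the state assigned to each subject is an assumption for illustration.

```python
# Assumed example data: subject -> state (None where no state applies, e.g. elec).
SUBJECT_STATE = {
    "air": "gas",
    "water": "liquid",
    "extractedOil": "liquid",
    "elec": None,
}

# quantity -> subject states it makes sense to pair with (from the examples above).
QUANTITY_STATES = {
    "temp": {"solid", "liquid", "gas"},
    "pressure": {"liquid", "gas"},
    "current": {None},
}


def subject_allowed(quantity, subject):
    """Basic validation: is the subject's state acceptable for the quantity?"""
    return SUBJECT_STATE[subject] in QUANTITY_STATES[quantity]


checks = {
    ("temp", "extractedOil"): subject_allowed("temp", "extractedOil"),
    ("pressure", "air"): subject_allowed("pressure", "air"),
    ("current", "elec"): subject_allowed("current", "elec"),
    ("current", "water"): subject_allowed("current", "water"),
}
```

Adding a new subject such as extractedOil only requires one new entry in `SUBJECT_STATE`; the quantity table stays untouched, which is the maintenance saving described above.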
What’s nice about this approach, is that the available quantities are much less expandable. One could argue that there is an exhaustive known list of them. In fact our initial exhaustive list are those from SkySpark. Importantly, once defined, there’s no need to alter them. A quantity is always the same whether it’s residential, commercial or industrial (it’s just unlikely you’ll need a magneticFluxDensity within a residential project!).
The above approach is simplified somewhat but we have applied it to a number of examples and are happy with how it plays out (qualifiers and other topics are purposely omitted as this post is long enough already!). However, we’re eager to not diverge from the agreed standard if at all possible.
Would love to hear how these kinds of issues are being addressed by others.
Brian Frank Mon 26 Nov 2018
Hi Richard,
That is good insight into the modeling of fluid quantities. I have actually experimented with making a FluidPoint that is a supertype of WaterPoint. This would be a good use case to further that idea. But I do still believe that the point's subject is the primary typing mechanism - for example the measurements of ElecPoint are vastly different than the measurements of FluidPoint. I see you are signed up for WG 551, so that is the best way to get involved. We'll probably have the next call in mid-December, and this would be a good topic to cover.
Richard Seaman Wed 28 Nov 2018
Hi Brian,
I haven’t yet been involved in any working groups but would be eager to do so. You mentioned a call in mid-December, will there be an email distributed ahead of time with details? It would be great to discuss and explore these two primary typing mechanisms (subject vs quantity) in greater detail. I’ll give a brief initial response below.
From our experience, the first thing a user does when instantiating a point, is to select the thing that is being measured (quantity). They can then further qualify it by adding the subject and section. If they selected a quantity of current, there’s no need to select a subject of elec as this is the only possible subject for it. In this case the {current} tag would be added. If on the other hand, they first had to select a subject of elec and then the quantity of current, the resulting tags would be {elec, current}. Is this additional elec tag really necessary in this case? (ditto for volt, pf, freq). However, if a temperature is the quantity selected, additional subject tags would certainly help to further qualify what the point represents (e.g. {temp, air}, {temp, water}, {temp, extractedOil}).
Supertypes do seem like a good solution if using the subject as the primary typing mechanism. The example I provided could likely be solved with a FluidPoint supertype. There would also be a need for a SolidPoint supertype for the same reasons. However, I imagine the majority (if not all) of the quantities would reside in the supertype. For example, if I created an extractedOil subject which was a subtype of FluidPoint, I would have no need to extend the available quantities.
It sounds like two ways of doing the same thing. The difference will be that the resulting flat list of tags may differ (which is my main concern).
Brian Frank Thu 28 Mar 2019
We are transitioning this working group to public review - see 687