#370 ZINC format inconsistencies and queries.

Stuart Longland Sun 31 Jan 2016

Hi all,

I figured I'd make a separate thread for this. I've been implementing my own ZINC parser and dumper library and I struck a number of inconsistencies regarding the specification. I'll apologise in advance, this is basically a brain dump of queries that have been piling up for the better part of a fortnight.

The document I used as my base guide was this one: http://project-haystack.org/doc/Zinc

In particular, the "grammar" section was used as the basis for my implementation. I found a library, parsimonious, which takes a grammar syntax very close to that of the original ZINC documentation. I had to make some minor changes, but all in all, it was a close fit.

This is what my grammar spec looked like originally: https://github.com/vrtsystems/hszinc/blob/eecf4af4c9aa2d9e4c4a689240aa062c697b43d9/src/hszinc/grammar.py

So to start off with, let's start with the row data. According to the schema:

<row>         :=  <cell> ["," <cell>]* <nl>
<cell>        :=  <scalar>  // empty cell is same as null

In other words, the cells are separated by commas, no whitespace. This is contradicted by the following on the same page.

ver:"2.0" database:"test" dis:"Site Energy Summary"
siteName dis:"Sites", val dis:"Value" unit:"kW"
"Site 1", 356.214kW
"Site 2", 463.028kW

I've added the possibility of whitespace into the grammar, as even some of the nodehaystack test cases use it, but according to the grammar, this shouldn't be allowed. Is there supposed to be whitespace, and under what circumstances?

There's more confusion over URIs. According to the grammar:

<uri>         := "`" <uriChar>* "`"
<uriChar>     := <unicodeChar> | <uriEscChar>
<unicodeChar> := any 16-bit Unicode char >= 0x20 (except str/uri quote)
<strEscChar>  := "\b" | "\f" | "\n" | "\r" | "\r" | "\t" | "\"" | "\\" | "\$" | <uEscChar>
<uriEscChar>  := "\:" | "\/" | "\?" | "\#" | "\[" | "\]" | "\@" | "\\" | "\&" | "\=" | "\;" | <uEscChar>
<uEscChar>    := "\u" <hexDigit> <hexDigit> <hexDigit> <hexDigit>

This seems to include the ` character as one of the possible characters in a URI. How is it escaped?

There's also a list of characters with escape sequences that do not make sense. From that grammar, it would seem the correct ZINC syntax for a URL to this website would be:

`http\:\/\/www.project-haystack.org\/`

which is contradicted by the example URI given at the top of the page. What is meant to be escaped in a URI, and what not escaped?

Then there's the undocumented stuff. As a further test, I grabbed some of the unit tests out of NodeHaystack for testing its ZINC parser/dumper. One had the following grid:

ver:"2.0"
a,    b,      c,      d
T,    F,      N,   -99
2.3,  -5e-10, 2.4e20, 123e-10
"",   "a",   "\" \\ \t \n \r", "\uabcd"
`path`, @12cbb082-0c02ae73, 4s, -2.5min
M,R,Bin(image/png),Bin(image/png)
2009-12-31, 23:59:01, 01:02:03.123, 2009-02-03T04:05:06Z
INF, -INF, "", NaN
C(12,-34),C(0.123,-.789),C(84.5,-77.45),C(-90,180)

Now, 7th row in the "b" column, there's an object called "R". Digging around in nodehaystack I found it was an object called a "Remove" (https://bitbucket.org/lynxspring/nodehaystack/src/1823d654f63184e32e91de33b696836c20f273b2/HRemove.js?at=master&fileviewer=file-view-default). I'm guessing it's a singleton like the "Marker" type. I haven't seen where it gets used.

"Binary" objects confuse me also. There's a content-type given, but I don't see where the actual data is given. In the above grid, there's apparently two PNG images. Where does the actual encoded image get put?

I think that'll do as some initial queries, no doubt others will spring to mind and I can add those here. :-)

Regards, Stuart Longland

Brian Frank Sun 31 Jan 2016

Hi Stuart,

Thanks for posting that detailed feedback on Zinc.

In other words, the cells are separated by commas, no whitespace

Space is allowed between tokens. Technically that is just the 0x20 space character, since newlines are semantically significant. I guess we could allow tabs too - I don't have any strong opinion on that. I will note this when I update the docs for the 3.0 data model (nested data structures).
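As a rough illustration of the rule above, here is a minimal Python sketch that splits one row into raw cells while tolerating 0x20 spaces around the commas. The function name `split_row` is hypothetical and is not part of hszinc or any Haystack toolkit; the sketch only assumes that commas inside quoted strings, URIs and parenthesised values such as `C(12,-34)` must not split a cell.

```python
def split_row(line):
    """Split one Zinc row into raw cell strings.

    Only the 0x20 space is stripped around cells; tabs are left alone,
    since the thread leaves open whether they should be allowed.
    """
    cells = []
    buf = []
    quote = None   # active quote character (" or `) while inside a str/uri
    depth = 0      # parenthesis nesting, e.g. C(12,-34) or Bin(image/png)
    i = 0
    while i < len(line):
        ch = line[i]
        if quote:
            buf.append(ch)
            if ch == "\\" and i + 1 < len(line):
                # keep the escaped character verbatim and skip over it
                buf.append(line[i + 1])
                i += 1
            elif ch == quote:
                quote = None
        elif ch in '"`':
            quote = ch
            buf.append(ch)
        elif ch == "(":
            depth += 1
            buf.append(ch)
        elif ch == ")":
            depth -= 1
            buf.append(ch)
        elif ch == "," and depth == 0:
            # cell boundary: drop surrounding spaces only
            cells.append("".join(buf).strip(" "))
            buf = []
        else:
            buf.append(ch)
        i += 1
    cells.append("".join(buf).strip(" "))
    return cells
```

With this, `"Site 1", 356.214kW` and `"Site 1",356.214kW` yield the same two cells, and an empty cell comes back as an empty string (which the grammar treats as null).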

This seems to include the ` character as one of the possible characters in a URI. How is it escaped?

That is an oversight in docs. I will fix - you can escape that using \`

There's also a list of characters with escape sequences that do not make sense.

Those additional escape characters are basically to deal with characters that have special meaning in the URI structure. For example, let's say you have a file on your filesystem named "file#23". In a URI, if you mapped that to file#23, then the pound sign means treat 23 as the fragment identifier. But we want to treat it as part of the name. In encoded URIs you would "%XX" encode that character to treat it specially.
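One possible reading of this, sketched in Python: a backslash-escaped reserved character in a Zinc URI literal stands for the literal character, so it can be mapped to its "%XX" form, while unescaped characters keep their structural meaning. The function name `zinc_uri_to_encoded` is hypothetical, the mapping to percent-encoding is an assumption drawn from the explanation above rather than something the spec states, and the backtick escape is included only because it is described in this thread as a pending docs fix.

```python
import re
from urllib.parse import quote

# Escapes from <uriEscChar>, plus \` (assumed, per the reply in this thread).
_URI_ESC = re.compile(r"\\(u[0-9a-fA-F]{4}|[:/?#\[\]@\\&=;`])")


def zinc_uri_to_encoded(literal):
    """Convert a backtick-quoted Zinc URI literal to a percent-encoded URI."""
    if len(literal) < 2 or literal[0] != "`" or literal[-1] != "`":
        raise ValueError("URI literal must be wrapped in backticks")

    def repl(match):
        esc = match.group(1)
        if len(esc) == 5:
            # \uXXXX: substitute the code point itself
            return chr(int(esc[1:], 16))
        # escaped reserved character: keep it as data, not URI structure
        return quote(esc, safe="")

    return _URI_ESC.sub(repl, literal[1:-1])
```

Under this reading, the literal `` `file\#23` `` becomes `file%2323`, whereas an unescaped `` `http://www.project-haystack.org/` `` passes through unchanged.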

Now, 7th row in the "b" column, there's an object called "R".

Yes, that's the remove singleton we use in SkySpark (which I also put in the Java Toolkit). But it's not really an official part of Haystack - there are some proposals to add "update" capability to the spec which might require it though.
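For a parser that needs to accept such grids, a minimal Python sketch of value-less singletons could look like the following. The class and function names (`Marker`, `Remove`, `parse_singleton`) are hypothetical and do not mirror any toolkit's API; the only thing taken from the thread is that `M` and `R` each denote a single shared value.

```python
class _Singleton:
    """Base for value-less scalars; each subclass has exactly one instance."""
    _instances = {}

    def __new__(cls):
        # create the instance on first use, then always return the same one
        if cls not in cls._instances:
            cls._instances[cls] = super().__new__(cls)
        return cls._instances[cls]


class Marker(_Singleton):
    def __repr__(self):
        return "M"


class Remove(_Singleton):
    def __repr__(self):
        return "R"


MARKER = Marker()
REMOVE = Remove()

# Raw cell text mapped to the shared instance it denotes.
_SINGLETONS = {"M": MARKER, "R": REMOVE}


def parse_singleton(cell):
    """Return the singleton for a raw cell; raises KeyError for anything else."""
    return _SINGLETONS[cell]
```

Because each class only ever has one instance, callers can compare with `is`, for example `parse_singleton("R") is REMOVE`.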

"Binary" objects confuse me also. There's a content-type given, but I don't see where the actual data is given

It's essentially just a special type for a MIME typed blob, with details left up to implementations. This will go away for 3.0 because it can be handled with XStrs more elegantly.
