Re: [Metadata-Support] Extending Metadata Query Protocol


  • From: Jaime Perez Crespo <>
  • To: "" <>
  • Subject: Re: [Metadata-Support] Extending Metadata Query Protocol
  • Date: Tue, 24 Mar 2015 10:19:03 +0000
  • Accept-language: en-US, nb-NO

Hi Ian,

> On 23 Mar 2015, at 16:23, Ian Young
> <>
> wrote:
>> One of the issues that we’ve observed is that using the Metadata Query
>> Protocol to fetch metadata for previously unknown entities introduces a
>> (potentially big) delay when serving the request that originated the query
>> to the MDX server.
>
> I'm not sure where a potentially big delay would come in (or what you mean
> by "big", exactly). Obviously it depends a lot on the implementation, but
> the most likely thing to extend the query time would be the signature step
> and those just don't take that long these days (or we would complain more
> often about SAML, which requires the same).

That’s somewhat beside what I’m worried about. Obviously such a big delay
shouldn’t happen (I’ll take the chance to reply to your later mail here too),
and I concur with you that the transatlantic hop shouldn’t be the reason for
it. So there must be something else, though I haven’t seen any specific
pattern (I haven’t tested enough for that, I guess). In any case, my point is
that even though such a big delay in fetching metadata shouldn’t happen, life
is usually not that perfect all the time :-)

>> * To be able to query the MDX server for a list of all the entities served.
>
> I don't see why having a list of the entities helps you in any way. Either
> you then go on to pulling all of the entities (in which case, as Scott
> points out, you might as well have just pulled an aggregate) or you don't
> (in which case you still have the same delay when you query for each one).

I believe I already replied to this before. If you have a list of the
entities served by the MDX server, you can iterate over it, asking for
updated versions of the metadata using the caching mechanisms already
available in HTTP. In other words, you could offload the handling of
client-side caching to the server.
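To make the idea concrete, here is a minimal sketch of that loop. The base URL and the `fetch` function are hypothetical stand-ins (the real client would do an HTTP GET and transform the entityID per the MDQ spec); the point is just that `If-Modified-Since` lets the server answer 304 and skip re-sending unchanged metadata:

```python
from email.utils import formatdate

MDQ_BASE = "https://mdq.example.org/entities/"  # hypothetical endpoint


def conditional_headers(last_fetched_epoch):
    """Build headers for a conditional GET: with If-Modified-Since set,
    the server can answer 304 Not Modified and omit the metadata body."""
    if last_fetched_epoch is None:
        return {}
    # RFC 7231 HTTP-date, e.g. "Tue, 24 Mar 2015 10:19:03 GMT"
    return {"If-Modified-Since": formatdate(last_fetched_epoch, usegmt=True)}


def refresh_all(entity_ids, last_fetched, fetch):
    """Iterate the server's entity list, re-fetching only what changed.
    `fetch(url, headers)` stands in for the real HTTP client and
    returns (status, body)."""
    updated = {}
    for entity_id in entity_ids:
        url = MDQ_BASE + entity_id
        status, body = fetch(url, conditional_headers(last_fetched.get(entity_id)))
        if status == 200:  # changed since our cached copy
            updated[entity_id] = body
        # 304: our cached copy is still current, nothing to do
    return updated
```

The client-side state is then just a timestamp per entity, and the server can stay a dumb (even static) file store behind an HTTP server that handles conditional requests.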

>> When I say “list of entities”, I mean a list of identifiers used by the
>> MDX implementation that can be used to request the metadata of a
>> particular entity (i.e. the entityID or its SHA-1).
>
> As Tom says, we're currently adding something like this as an experimental
> extension to my *implementation* of the specification. What we're
> experimenting with would be more detailed than just a list of the entities:
> more like a discovery feed, in fact. In the longer term, that's something
> that would be more likely dealt with using content negotiation than a
> protocol evolution.

That’s good news, and I could pretty much use that for my purposes.

>> * To be able to query the MDX server for a list of all the entities
>> *modified since* a specific date. This would allow us to query the server
>> later only for those entities that have been modified since the last
>> request.
>
> I don't think that's likely to be a workable addition to the protocol. One
> of the things I've tried to preserve is the possibility of an entirely
> static implementation of MDQ, and what you're describing here would require
> a dynamic web service.
>
> Anything which requires real-time (relative to a query) dynamic assembly of
> an aggregate would fall into the same category. Or put another way, if
> samlbits can't cache it, it's problematic.

Well, that’s a huge handicap then, though I can certainly see the reasons why
you would like a static implementation of MDQ. It makes me all the more eager
to have something like the list of entities we were talking about, as you
could still have the “If-Modified-Since” capabilities while keeping the MDQ
implementation static and lightweight.

>> I understand the first one could be easily disregarded by using the MDX as
>> a standard metadata feed, that is, fetching the whole metadata set it
>> serves, processing, caching, and then proceeding onwards by leveraging the
>> second one. However, I see benefits on being able to iteratively retrieve
>> entities instead of a huge feed, like better performance and availability
>> of entities. In any case, both features would be interesting to make the
> Metadata Query Protocol even more useful for big deployments, I think.
>
> One option you might like to consider would be caching individual results
> you get from MDQ and then re-querying for individual entities before their
> cacheDuration / validUntil have expired, using conditional GET based on the
> ETag. If you do this on a background thread for entities which have been
> used since last fetched, you can hide the query latency for all entities
> that are in frequent use (or even occasional use, depending on
> cacheDuration).
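In code, the scheme quoted above might look roughly like this; the cache-entry shape and the `fetch` signature are hypothetical, not part of any spec:

```python
def revalidate(entry, fetch):
    """Revalidate one cached MDQ result with a conditional GET.

    `entry` is a dict: {"url": ..., "body": ..., "etag": ...}.
    `fetch(url, headers)` stands in for the real HTTP client and
    returns (status, body, etag).
    """
    # If we hold an ETag, send it back so the server can answer
    # 304 Not Modified instead of re-sending the whole document.
    headers = {"If-None-Match": entry["etag"]} if entry.get("etag") else {}
    status, body, etag = fetch(entry["url"], headers)
    if status == 304:
        return entry  # cached copy still valid; the 304 response is tiny
    return {"url": entry["url"], "body": body, "etag": etag}
```

Run on a background thread before cacheDuration/validUntil expire, this keeps the cache warm without adding latency to the requests that actually use the metadata.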

Given the outline of what I’d like to achieve that I gave in a previous mail,
it should be clear that this doesn’t fit at all. First, because I don’t want
to rely on metadata elements that service providers are not using correctly,
if they are using them at all. Secondly, because even if they were using them
correctly, they sometimes update their metadata *before* the deadline set by
cacheDuration or validUntil, and I want to be able to refresh metadata much
faster than just waiting for it to expire. So this approach would be
unsuitable for what I’d like to do.

> [Full disclosure: my implementation doesn't do conditional GET correctly
> yet, but it's on the list:
> https://github.com/iay/mdq-server/issues/7
> ]

Great! Maybe at some point I can even offer help :-)

> So, for example, if you fetch an entity's metadata and it has 6 hours to
> live, you might hourly check for (a) that entity having been used since the
> last fetch and (b) less than 3 hours before cacheDuration expiry.
>
> Of course you don't have to do anything quite that complex, as
> cacheDuration expiry does not mean that the metadata is invalid. So another
> alternative would be to re-query once cacheDuration has expired, but do
> that on a background thread unless validUntil has also expired and make use
> of the previously fetched results meanwhile. You'd still hide the latency
> most of the time in this case.
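Spelled out, the hourly check in that example reduces to two conditions (the 3-hour window is just the number from the example, and the names are hypothetical):

```python
from datetime import datetime, timedelta


def should_refresh(now, last_fetched, last_used, cache_expires,
                   refresh_window=timedelta(hours=3)):
    """The hourly check sketched above: re-query an entity only if it has
    (a) been used since we last fetched it, and
    (b) less than `refresh_window` left before cacheDuration expiry."""
    used_since_fetch = last_used is not None and last_used > last_fetched
    near_expiry = cache_expires - now < refresh_window
    return used_since_fetch and near_expiry
```

Entities nobody has used since the last fetch simply age out, while anything in active use is refreshed in the background before its metadata goes stale.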

Actually I was thinking of something at least equivalent to what we have
today in Feide, and that’s every 5 minutes… :-)

Thanks for your comments!

--
Jaime Pérez
UNINETT / Feide
mail:

xmpp:


"Two roads diverged in a wood, and I, I took the one less traveled by, and
that has made all the difference."
- Robert Frost




