2015-12-08

Does GND define authoritative headings?

I already wrote about authority files. In that post I said the following about the Integrated Authority File (GND), which is "operated cooperatively by the German National Library, all German-speaking library networks, the German Union Catalogue of Serials (ZDB) and numerous other institutions" (source):

In the Integrated Authority File a numeric ID (GND ID) is used to identify an authority record. Likewise, each bibliographic record that references this authority record uses the GND ID.

Furthermore, I would even say that this ID is kind of the authoritative "heading" that enables searching for resources about Princess Diana in every database that uses GND. But a heading in the sense of an authoritative string that all GND users use to refer to Princess Diana does not exist.

On the verge of an edit war ;-)

In December 2013 I had already edited the Wikipedia entry on "Authority Control" to reflect this practice (Edit 1).

Wikipedia user Gymel (Thomas Berger) doesn't agree as you can see by his reverts of my changes (Edit 2, Edit 4 after I put the GND ID back in Edit 3). As this topic can hardly be discussed in Wikipedia commit messages I am writing this post to provide some evidence for my thinking in the hope that the Wikipedia entry will be corrected (once again) soon.

What are "headings", anyway?

Wikipedia says:

In library science, authority control is a process that organizes library catalog and bibliographic information by using a single, distinct name for each topic. The word authority in authority control derives from the idea that the names of people, places, things, and concepts are authorized, i.e., they are established in one particular form. These one-of-a-kind headings are applied consistently throughout the catalog, and work with other organizing data such as linkages and cross references.

In short, headings are authorized names that

  1. are applied consistently throughout the catalog and
  2. are used for linkages and cross references.

With regard to GND (and many other authority files), one would have to adjust point 1: As there is no single catalog that GND is maintained for, it should rather read "applied consistently throughout catalogs". Accordingly, below I will have a look at several catalogs from different GND users to see whether there is consensus on one authoritative heading across these different catalogs.

With regard to German-speaking cataloging practice, I argue that

  1. no authoritative name strings exist that are applied across the catalogs of all GND users; authoritative strings exist at most within a single catalog
  2. linkages and cross references are exclusively managed by using the GND ID and not a string.

Evidence

Below I am providing some evidence for the two points I made above.

Usage of different headings for presentation purposes

Taking our example Princess Diana (GND ID 118525123), I looked at several data sources to see which heading they use. Interestingly, you just have to look at different representations of the authority record from the German National Library (DNB) alone and will already find different headings in use:

The DNB-OPAC uses Diana, Wales, Prinzessin. We might think that this is the authoritative string. The GND RDF provided by DNB using the GND ontology defines both a "preferred name entity" and a "preferred name". We can already find a difference here – at least in punctuation:


<http://d-nb.info/gnd/118525123>
        gndo:preferredNameEntityForThePerson [
            gndo:epithetGenericNameTitleOrTerritory "Wales, Prinzessin"^^<http://www.w3.org/2001/XMLSchema#string> ;
            gndo:personalName "Diana"^^<http://www.w3.org/2001/XMLSchema#string>
        ] ;
    gndo:preferredNameForThePerson "Diana <Wales, Prinzessin>"^^<http://www.w3.org/2001/XMLSchema#string> .

Taking a look at the DNB-MARC, we can see that birth and death date sneak into the heading (I may be wrong here as I am not very familiar with MARC, really):


        <datafield tag="100" ind1="0" ind2=" ">
            <subfield code="a">Diana</subfield>
            <subfield code="c">Wales, Prinzessin</subfield>
            <subfield code="d">1961-1997</subfield>
        </datafield>

We will also find headings for Princess Diana (GND ID 118525123) that include birth and death dates in databases of other GND maintainers:

hbz union catalogue: Diana, Wales, Prinzessin, 1961-1997

GBV union catalogue: Diana <Wales, Prinzessin> *1961-1997*

SWB union catalogue actually has two different headings: Diana, Wales, Prinzessin [1961-1997] and recorded as "Ansetzung Landesarchiv BW": Wales, Diana; Prinzessin; 1961 - 1997 | 118525123

Kalliope (for example this record): Diana <Wales, Princess> (1961-1997)

Finally, we have two entries that follow the two different punctuation variants used by the DNB.

HeBIS (e.g. this record): Diana, Wales, Prinzessin

BVB: Diana <Wales, Prinzessin>
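As a rough illustration of how several of the variants above can arise from the same data, the following Python sketch parses the MARC 100 field shown earlier and renders it in three ways. The function names and the rendering rules are assumptions made for this example; they are not taken from any of the catalogs mentioned and only mimic the punctuation seen in the examples.

```python
import xml.etree.ElementTree as ET

# MARC 100 field for GND ID 118525123, copied from the DNB-MARC snippet above
MARC_100 = """
<datafield tag="100" ind1="0" ind2=" ">
    <subfield code="a">Diana</subfield>
    <subfield code="c">Wales, Prinzessin</subfield>
    <subfield code="d">1961-1997</subfield>
</datafield>
"""


def subfields(xml_text):
    """Return a dict mapping subfield codes to their text content."""
    field = ET.fromstring(xml_text.strip())
    return {sf.get("code"): sf.text for sf in field.findall("subfield")}


def render_comma(sf, with_dates=False):
    """Hypothetical rule: join name and qualifier (and optionally dates) with commas."""
    parts = [sf["a"], sf["c"]]
    if with_dates:
        parts.append(sf["d"])
    return ", ".join(parts)


def render_brackets(sf):
    """Hypothetical rule: put the qualifier in angle brackets and drop the dates."""
    return f'{sf["a"]} <{sf["c"]}>'


sf = subfields(MARC_100)
print(render_comma(sf))                   # Diana, Wales, Prinzessin
print(render_comma(sf, with_dates=True))  # Diana, Wales, Prinzessin, 1961-1997
print(render_brackets(sf))                # Diana <Wales, Prinzessin>
```

The three outputs correspond to the strings found in the DNB-OPAC and HeBIS, in the hbz union catalogue, and in the GND RDF and BVB, respectively. Each is a purely mechanical transformation of the same three subfields.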

Usage of GND IDs for linking

It should be clear to everyone that name strings aren't used for linking to GND entries. As an example, I only point to the widespread practice of creating beacon files that point to one's catalogued resources on the basis of GND IDs: https://de.wikipedia.org/wiki/Wikipedia:BEACON.
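To make the difference concrete, here is a minimal Python sketch that groups records from several catalogs first by heading string and then by GND ID. The headings are the ones quoted in the previous section; the tuple layout and the `group_by` helper are assumptions made for this example and do not reflect any real catalog's data model.

```python
from collections import defaultdict

GND_ID = "118525123"

# (catalog, heading as displayed, GND ID), taken from the examples in this post
records = [
    ("hbz", "Diana, Wales, Prinzessin, 1961-1997", GND_ID),
    ("GBV", "Diana <Wales, Prinzessin> *1961-1997*", GND_ID),
    ("SWB", "Diana, Wales, Prinzessin [1961-1997]", GND_ID),
    ("Kalliope", "Diana <Wales, Princess> (1961-1997)", GND_ID),
    ("HeBIS", "Diana, Wales, Prinzessin", GND_ID),
    ("BVB", "Diana <Wales, Prinzessin>", GND_ID),
]


def group_by(recs, key_index):
    """Group catalog names by the tuple element at key_index."""
    groups = defaultdict(list)
    for rec in recs:
        groups[rec[key_index]].append(rec[0])
    return dict(groups)


by_heading = group_by(records, 1)  # linking on the name string
by_id = group_by(records, 2)       # linking on the GND ID

print(len(by_heading))  # 6
print(len(by_id))       # 1
```

Matching on the string leaves six unconnected groups, one per catalog, while matching on the GND ID brings all six records together under a single key.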

tl;dr

The Integrated Authority File (GND), operated cooperatively by a large group of libraries and library service centers in German-speaking countries, does not define authoritative name strings (= headings) to be used across the different catalogs of GND users.

Comments:

Thomas Berger said…

Thank you Adrian for writing up the facts I totally agree with: Different serialisations of GND records advertise different forms of "preferred Names", furthermore the MARC exchange format deliberately includes birth and death dates in MARC 100$d which by common practice in D-A-CH-Land are stripped by applications. Consumers in different contexts also have different strategies for adding punctuation (MARC Authority does not provide for recording of "punctuation omitted" in LDR#18 yet I don't think that GND insinuates "Diana Wales, Prinzessin 1961-1997" as a form anyone should use at all) and support or acknowledgement of Non-Sort characters may grossly differ (e.g. VIAF is still struggling with these, the RDF representation of GND records deliberately omits them for the sake of non-library users, ...).

Some of the examples from the different regional union catalogues coincide modulo punctuation so one might speculate whether the "heading" of the authority file is to be understood as some "abstract heading" where everyone is free to introduce punctuation as it suits them.

MARC formats stress the fact that they are agnostic of the concrete cataloging code, but of course one individual record is tied to one specific set of rules (although headings according to different rules may be coded in fields 5XX). Actually, records of the former PND had to accommodate two cataloging codes (RAK-WB and RSWK) which e.g. for medieval persons grossly differed (Hildegardis <Bingensis> vs. Hildegard (von Bingen)). Introduction of the GND brought a unified form of name to that picture.

I do not want to make a point of the fact that MARC 1XX is used for something; an authority record without 1XX would probably be rejected by most. And "Diana <Wales, Prinzessin>" as preferredNameForThePerson may be a glitch in the vocabulary or its actual use: Certainly "Diana" is the name of the person and the qualifier is typical for extending the name to a heading or access point.

Now RDA emphasizes that *data* has to be collected and recorded to achieve identification without doubt and leaves the construction of "preferred access points" (the former "authorized headings" of AACR2) as an option. The D-A-CH profile mandates the use of that option, i.e. before RDA we did fare with a cataloging code which prescribed the construction rules for the forms of names (and those who did use the PND or later GND could use the data stored there for identification or disambiguation purposes), now we have one which mandates to add clever extracts from the data collected anyway to construct differentiating headings or access points. So RDA accommodated the former, quite modern, practice here and the D-A-CH rules introduced the AACR2 legacy...

So one almost tautological answer is: GND records in alignment with the D-A-CH RDA profile *must* contain an authoritative heading since the cataloging code demands it.

Thomas Berger said…

(continued from above due to length restrictions)
We could weaken the anglo-american approach to "Authority Control" back to RAK or PND practice, i.e. don't mandate uniqueness of a heading or access point but still ask for preferred forms of the *name*: The Hildegard example and the design of MARC show that an authority record is able to transport different preferred forms for the same entity to make them usable in different contexts or under different codes. But again: To make a record usable under different codes, each of them employing the concept of preferred headings (and providing rules for their construction), one needs to record that.

If one views an authority file as just a means for providing "authority control" for different applications IMHO the whole question becomes moot: One can easily imagine two authority records (for two different entities) of which one is equipped with an authorized heading for the cataloging code X and the other one does not (yet) carry data for X but only for Y: So the records in the authority file are not completely prepared for either X or Y. Or one could interpret Tp6 records in the GND as carrying no /authorized/ headings.

IMHO "Authority control" is a concept which cannot be abstracted away from the bibliographic (or otherwise "cataloging") application and just confined to the authority file. So any given catalogue /can/ take headings from a suitable cataloguing code backed by the authority files *and* apply mechanical transformations (stripping dates, adding brackets and/or commas, ...) to achieve its specific form of authority control which may or may not be in accordance with the cataloging code the data is supplied for.

And with respect to the Wikipedia edits: The root of the problem seems to be simply the error in subsuming "Integrated authority file" under "Library catalogue"...

Adrian Pohl said…

I tried to adjust the "Authority Control" entry a bit to reflect that sometimes only the ID is what is actually used for authority control and not a name string: https://en.wikipedia.org/w/index.php?title=Authority_control&type=revision&diff=694755234&oldid=694365259

I considered Thomas' pointer that it is problematic to list GND under "Library" (catalog). It's definitely not perfect, but I hope it helps move the entry in the right direction...
