Which came first, XML or HTML?

Module 11: Appendix 4: History of HTML and the Web - What is XHTML? - or use XML straight away?

Invention of the Internet: The Internet, i.e. the linking of different, remote computers via a common protocol, namely TCP/IP, began experimentally as early as the 1970s. Who invented it? The US Department of Defense! In its early history, however, this network was by no means worldwide and was used only to exchange data via telnet, ftp or service protocols such as Gopher that are rarely used any more.

Origin of HTML and the WWW: The breakthrough came in 1990 when Tim Berners-Lee from the European Laboratory for Particle Physics CERN (Geneva) developed the page description language HTML (Hypertext Markup Language). Version HTML 1.0 was born. Not only could text documents from remote computers be displayed in a reasonably formatted way on one's own computer, but images could also be integrated and other documents could be requested via a link. Headings and paragraphs, simple text formatting, lists and embedded graphics were possible. HTML was linked to the new service protocol HTTP (Hypertext Transfer Protocol) and the WorldWideWeb was born.

However, proprietary developments followed very quickly, such as the invention of frames by Netscape. The next standard, HTML 2.0, did not arrive until 1995. It introduced forms, among other things. In the meantime, the non-standardized extensions (including JavaScript from Netscape) had gained the upper hand.

(My own web story: incidentally, I, the course instructor, made my first web attempts at the end of 1994; the first two books I bought on the subject are listed below and are in part the basis for this essay. The first experimental web server I set up was a small Apple LC with the free MacHTTP web server, the forerunner of the Webstar server, which was online for only a few minutes at a time via the telephone line. This proto-server ran at the Institute for Geology and Paleontology at the University of Munich. The first web offering from 1995 ran on a Unix Apache web server of the data center RUS, before a Netscape web server under Unix was set up at the newly networked institute - probably one of the first in Germany. From my early days on the web, the Jurassic Reef Park project (1995, updated 1996) still exists; it is still very well visited and can be reached today at http://www.palaeo.de/edu/JRP.)

Further versions were HTML 3.0 (a draft from 1995 that was never adopted as a standard) and then HTML 3.2 (W3C Recommendation, January 1997). To the best of my knowledge, frames only entered the standard with HTML 4.0, which was published as a W3C Recommendation in December 1997 and revised in 1998. HTML 4.0 is supposed to be the last version of HTML.

Standardization begins with the publication of so-called RFCs (Requests for Comments), before the standard is then issued. The responsible body was formerly the IETF (Internet Engineering Task Force); since 1994 it has been the W3C, the World Wide Web Consortium. The W3C was founded at MIT (Massachusetts Institute of Technology, Cambridge) together with CERN and was then joined by INRIA (Institut National de Recherche en Informatique et en Automatique, Le Chesnay, France) as well as by various member companies (e.g. Sun Microsystems, Netscape, AT&T, IBM). However, the research institutions are in charge. More and up-to-date information on the W3C can be found at http://www.w3.org.

In the meantime, however, the non-standardized developments had progressed very far. Think of Netscape developments such as frames, JavaScript or the embed and blink tags. The problem with DHTML is even more complex; it too arose with the introduction of the layer tag by Netscape (see the DHTML appendix). The reaction and counter-development of Microsoft's Internet Explorer was not long in coming: iframes, the differently interpreted JavaScript dialect JScript, ActiveX, a different definition of DHTML, and tags for background sound and the like all testify to this. Quite a few browsers are now trying to return to the standards and at least partially ignore these extensions, but this development will probably come too late.

The plugin extensions add even more confusion. Almost everything on the web is feasible with them. Java should actually have taken on this task, but it has not really caught on. The plugin problem: there are different versions of the plugins, and some are not available for all browsers or all platforms. And with database connections, which are becoming increasingly important for the on-the-fly generation of websites, for example for virtual department stores, we are also far from accepted standards.

Did you know that there were already many independent developments before the standardization of HTML 3? Some examples of Netscape additions:

  • bgcolor: brought color to the backgrounds of previously gray websites for the first time.
  • font size: for the first time font sizes could be changed.
  • hr size width align noshade: this allowed horizontal rules to be formatted more individually.
  • nobr: this way, line breaks can be prevented.
  • ul and ol attributes: put an end to boring bullet points in lists; circle and square can now be used in addition to disc. In ordered lists, letters and Roman numerals can be used in addition to numbers.

The list of examples could be extended a lot.

Conversely, many of the options offered by HTML 3 never really caught on. I have never or hardly ever used them and do not even know whether they still work in today's browsers; at least they are not implemented in any of the current web editors. A few examples should illustrate this:

Overlaid images can be created with a dedicated tag. Here one figure can be superimposed on another. In principle, this was the (unnoticed) birth of DHTML (see the corresponding appendix).

Footnotes: these are referenced with a link to an anchor, namely to the footnote. However, no ordinary anchor is created; instead the footnote is defined in the same document: The footnote text is located here.
Tried it out: to the footnote
It all works, but it is hardly impressive: the footnote is displayed permanently. It would be interesting if it only appeared when you clicked. No wonder it didn't catch on. The same thing can be achieved with a normal anchor link.

Math formulas: There are unbelievable possibilities here, even with html, for displaying complex mathematical formulas, but who knows them? Today, a complex formula is usually represented as a gif image. In times of low transfer rates, the html formula display was not unimportant. The following is an example (from the book by R. Tolksdorf, among others):

results in:

Do you still have a browser that displays this correctly? I did not find one. It should look like this (now as a GIF image):

So nobody will want to take on the uncertainties of generating formulas with HTML.

It is becoming increasingly clear that standardization is becoming important again, but that it can no longer be confined to HTML alone. The development of WML (Wireless Markup Language) as an HTML adaptation for cell phones (which also requires another protocol, namely WAP, the Wireless Application Protocol) shows that HTML was overwhelmed here.

XML - the future standard? XML is also supposed to become the future standard for the web (cf. course module 9). XML was first introduced in 1998 and stands for Extensible Markup Language. XML alone cannot display a website at all; put simply, it only describes the desired basic structure of a page - the presentation must then come through HTML derivatives, style sheets and scripting languages. Sounds pretty complicated, and it is; XML is supposed to become a jack-of-all-trades and also replace database connections, graphics and other file formats and much more.

XML is not tied to a fixed set of tags like HTML; you can, or rather must, define your own tags (elements) and attributes in XML. An XML file only describes the structure of its content with the help of these elements and attributes and their nesting or grouping. Which elements are allowed and how they may be nested is laid down in a so-called DTD (Document Type Definition), which can be internal or external; how the page is actually to be displayed must be specified separately, for example in a style sheet. Imagine, for example, a customer sheet that defines a surname, first name, address, and order area. Put simply, you can use XML to create tags with any name yourself (the tags are called XML elements, and they are all grouped inside one outermost element, the so-called root). This makes it clear what the structure of the file looks like. If it is to be displayed as HTML, it must be defined how the root element and each sub-element is to be formatted (you already know that you can also create arbitrary names for CSS formatting, which are then assigned corresponding definitions). Another possible solution would be a so-called island solution, in which HTML tags are used directly within the XML file.
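The customer sheet idea can be made concrete with a small sketch. It is written in Python only because its standard library can parse XML. The element names customers, customer, surname, firstname, address and order, as well as the sample data, are invented for illustration and do not come from the course material.

```python
import xml.etree.ElementTree as ET

# Hypothetical customer sheet: every element name here is self-chosen,
# which is exactly what XML permits (and requires).
CUSTOMER_XML = """<?xml version="1.0"?>
<customers>
  <customer>
    <surname>Muster</surname>
    <firstname>Erika</firstname>
    <address>Example Street 1, Munich</address>
    <order>2 fossil posters</order>
  </customer>
</customers>"""


def read_customers(xml_text):
    """Return one dict per <customer>, keyed by the child element names."""
    root = ET.fromstring(xml_text)  # the single outermost (root) element
    return [
        {child.tag: child.text for child in customer}
        for customer in root.findall("customer")
    ]


customers = read_customers(CUSTOMER_XML)
for customer in customers:
    print(customer["surname"], customer["firstname"], "-", customer["order"])
```

Note that the file itself says nothing about fonts, colors or line breaks; a program reading it only learns which piece of data is the surname, which the address, and so on.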

For the same xml file, however, another DTD could be used to specify that data is imported into the root elements, so that it would have become a real database file.

Depending on the DTD used, a program then recognizes whether the XML file is suitable for display or further processing or not. An XML file could be opened at some point by a word processor, a spreadsheet, a database, an image processing program or even by a web browser, although the extension is always xml. Going even further: an XML database file could possibly be opened at the same time with an image processing program, with the corresponding imported images or the layout of the database being processed accordingly. Of course, all of this is still a long way off, and the way to an official, compatible XML standard is still long. So far, XML has relied on a wide variety of markup and programming languages for presentation, e.g. HTML, Scalable Vector Graphics, WML, CSS and many more.

Really a dream of the future? There is already a standard that can do similar things, even if it has nothing to do with XML: Apple's QuickTime technology. A single file format (with only one extension, .mov, and possibly only one MIME type, namely video/quicktime) can hold images, sounds and videos in a wide variety of formats, as well as text and interactive elements (sprites, tweens, Flash, etc.), and can be provided with scripts that even enable database connections. QuickTime .mov files can be opened, processed and saved again by a wide variety of programs. Even if XML insiders will certainly not be satisfied with this comparison (in QuickTime the different formats sit on different tracks and are not defined by respective DTDs), the principle seems to me to be somewhat similar to that of the XML approach.

A real XML introduction is beyond the scope of this course, especially since XML is not yet an established standard on the web. I recommend that those who are professionally involved in website production and multimedia follow the development and familiarize themselves with the basic principles. As for the web, I expect XML-capable web editors as soon as XML becomes the standard. The current 5th and 6th generation web browsers understand XML to a certain extent, as long as it is linked to the correct DTDs.

For further reading I refer to the corresponding chapters in the book by U. Hess & G. Karl (see reference below) as well as to Stefan Münz's corresponding XML explanations in his SELFHTML course.

For those who want to know more right away, at the end of this text (as an appendix to this appendix) I give you an adapted example for the design of XML code from the Hess & Karl book with explanations. Before you inspect this, however, you should read the following explanations on xhtml, especially on the DTDs of XHTML.

XHTML - what is that?

Whether XML will really prevail is still open (though it is well on its way), but a transitional development is already available: XHTML (Extensible Hypertext Markup Language). I find it a bit difficult to assess the significance of XHTML, since so far only version XHTML 1.0 is available, which was published as a W3C Recommendation in January 2000; future versions are expected to look significantly different. The following seems important to me:

XHTML is almost identical to HTML 4.0; however, it regulates the use and marking of tags much more strictly. So in a way it is an "educational measure" for web producers, who are supposed to get used to the even stricter rules to be expected with XML (see examples below).

These are the essential innovations:

1. Documents must be "well formed".
This simply means that you have to follow the rules of XHTML, which are:

  • All tags must be closed (this was not always necessary in HTML):
    e.g. <p>text</p> (in HTML, <p>text was enough)

  • The so-called empty tags must also be closed:
    e.g. <br /> (in HTML: <br>)
    or <hr /> instead of <hr>

  • All tags must be written in lower case:
    e.g. <body> instead of <BODY>

  • All attribute values must be in quotes, e.g.
    <td width="100">. In HTML only text strings had to be quoted as values; numbers and defined expressions (e.g. white for the color white) could also be used without quotes.

  • Tags must be nested correctly, e.g.
    <b><i>text</i></b> and not <b><i>text</b></i>. So far this caused no error message or incorrect display in HTML; that is over with XHTML.

  • If internal scripts and style sheets use the characters < or &, they must be shielded from XHTML misinterpretation according to the following pattern:
    <![CDATA[ ... ]]>
    (all information based on the book by Hess & Karl, which also succeeded in making XML understandable for me)
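The rules above can be tried out with any XML parser. The following is a minimal sketch in Python using the standard library; the helper name is_well_formed and the sample snippets are my own illustrations, not taken from the Hess & Karl book.

```python
import xml.etree.ElementTree as ET


def is_well_formed(markup):
    """True if an XML parser accepts the markup, False otherwise."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False


# Illustrative snippets: the first and last follow the rules,
# the others each break exactly one of them.
SAMPLES = {
    "closed and nested": '<p>Some <b><i>text</i></b><br /></p>',
    "unclosed p": '<div><p>first<p>second</div>',
    "unclosed empty tag": '<p>line<br>next</p>',
    "unquoted attribute": '<td width=100>cell</td>',
    "wrong nesting": '<p><b><i>text</b></i></p>',
    "script in CDATA": (
        '<script type="text/javascript">'
        '<![CDATA[ if (a < b && b < c) { go(); } ]]>'
        '</script>'
    ),
}

for label, markup in SAMPLES.items():
    verdict = "well formed" if is_well_formed(markup) else "NOT well formed"
    print(f"{label}: {verdict}")
```

One caveat: upper-case tags such as <P>text</P> would pass this check, because a plain parser only tests well-formedness. The lower-case requirement comes from the XHTML DTD, which defines the element names in lower case, so it only shows up when a document is validated against that DTD.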

2. XHTML otherwise consists of plans and suggestions for future versions:
XHTML should, for example, open the way to a modular structure of the website, similar to what XML will offer. For example, the body, head, meta and title tags are defined as a structure module; there is also a basic text module with tags such as br, div, h1-6, p etc., a hypertext module with the a and area tags, and further list, form and other modules. DTDs then define which tags each module contains and how they may be used. More detailed implementation rules are still missing, and the module option is (according to the book by U. Hess & G. Karl, see below) only to be integrated from version 1.1. I understand this to mean that the respective modules can then also receive global module settings, e.g. through a style sheet. So far, this can also be solved via style sheets (by formatting the corresponding elements via CSS), but there will then be more options.

XHTML is already understood by modern browsers, so it can be used as soon as you get used to it. However, it is important that the correct version is specified in the header. I strongly recommend the transitional version so that there are no display errors. So far, you still have to make the adjustments manually in the source code (or go deep into reprogramming your web editor). However, I am sure that future web editors will also be able to automatically generate xhtml code, provided that it prevails.

3. Specification of a public DTD

Not only XML, but also XHTML and even HTML use DTDs, i.e. document type definitions. While in XML you can create these yourself (type SYSTEM), in which case they apply only to the assigned documents, XHTML and HTML use so-called public DTDs (type PUBLIC), which are nothing other than the standard definition of the markup language used. These DTDs are generally known (public) and therefore do not have to be supplied (in contrast to self-defined system DTDs in XML); the reference is sufficient. The following is a DTD example from HTML 3.0. You will always find such lines at the very top of your HTML document (before the html tag is opened). If you write HTML code by hand, you should include them, otherwise the browser may have display difficulties (but this is rather rare; it then falls back on the flexible, so-called transitional DTDs):

From HTML 3.0 to 3.2 the change of the responsible "supervisory authority" took place: from IETF to W3C.

This means that HTML 4.0 must be strictly adhered to, all tags not defined there are to be ignored.

This transitional DTD says that the document corresponds in principle to the html 4.0 standard, but also allows the use of older tags, which should be implemented (rendered).

Incidentally, this website uses the following DTD (according to the default settings of the GoLive5 web editor used, which received a corresponding update):

The new standard XHTML 1.0 is called like this:

Reference is made immediately to the location of the corresponding DTD on the web. If the browser does not know the DTD, it can look there.

If you already want to use XHTML, however, I recommend a combination of transitional HTML 4 and XHTML 1 so that there are no display errors:

When using framesets, a different xhtml-DTD should be specified (should be standardized in the future):

If you want, you can also specify a so-called "namespace" (xmlns, meaning XML namespace). This brings you very close to XML, since HTML formatting within XML documents has to take place in such a namespace. Below is a corresponding example of a complete XHTML document of this kind from the book by Hess & Karl:
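As a stand-in illustration (not the Hess & Karl listing), here is a minimal sketch in Python of a complete XHTML 1.0 document with a public DTD and a namespace. The DOCTYPE identifiers and the namespace URI are the ones published by the W3C for XHTML 1.0. The page content and the helper names build_page and read_doctype are my own assumptions.

```python
import xml.etree.ElementTree as ET
from xml.parsers import expat

XHTML_NS = "http://www.w3.org/1999/xhtml"

# (public identifier, system identifier) of the three XHTML 1.0 DTDs
XHTML10_DOCTYPES = {
    "strict": ("-//W3C//DTD XHTML 1.0 Strict//EN",
               "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd"),
    "transitional": ("-//W3C//DTD XHTML 1.0 Transitional//EN",
                     "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd"),
    "frameset": ("-//W3C//DTD XHTML 1.0 Frameset//EN",
                 "http://www.w3.org/TR/xhtml1/DTD/xhtml1-frameset.dtd"),
}


def build_page(variant):
    """Return a small hypothetical page that declares the chosen DTD."""
    public_id, system_id = XHTML10_DOCTYPES[variant]
    return (
        '<?xml version="1.0"?>\n'
        f'<!DOCTYPE html PUBLIC "{public_id}"\n  "{system_id}">\n'
        f'<html xmlns="{XHTML_NS}" xml:lang="en" lang="en">\n'
        '  <head><title>Minimal XHTML page</title></head>\n'
        '  <body><p>Hello<br />world</p></body>\n'
        '</html>'
    )


def read_doctype(text):
    """Return (root name, public identifier, system identifier)."""
    found = {}

    def on_doctype(name, system_id, public_id, has_internal_subset):
        found["doctype"] = (name, public_id, system_id)

    parser = expat.ParserCreate()
    parser.StartDoctypeDeclHandler = on_doctype
    parser.Parse(text, True)  # the external DTD is only referenced, not fetched
    return found.get("doctype")


page = build_page("transitional")
doctype = read_doctype(page)
root = ET.fromstring(page)
title = root.find("x:head/x:title", {"x": XHTML_NS}).text

print(doctype)
print(root.tag)  # the element name qualified by the namespace
print(title)
```

The parser reports the root element as html inside the XHTML namespace, which is exactly the information the xmlns attribute adds. A frameset page would declare the frameset DTD and use a frameset element instead of body.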

phew ... (no comment)

Example of the XML syntax

Now the promised appendix to the appendix: an example, explained in part, based on templates from the Hess & Karl book.

XML example:

Above is a complete xml document, which should explain that xml alone only controls the structuring of documents.

It starts with the specification of the xml version, the syntax with the ? must be observed.

The DTD used is then defined (in red). Here it is created internally; it could also have been stored externally and linked. It is important that it is a DTD created by us and not a public DTD. We call it contact, which is thereby also defined as the so-called root element. We then create a sub-element entry within the root element with its contents name, telephone and place of residence. The contents of the sub-element entry should all be of the type text; this is specified by #PCDATA (parsed character data). However, this does not yet say anything about how the text should be presented (see below). You can also create your own content types or use others (you can see that this is where the secret lies: where and how, for example, the content of a sub-element is defined as text, graphics, a film or whatever).

Since we want the sub-element to be usable multiple times within the root element, we put a corresponding instruction (in purple) above its definition, which specifies that this element may occur multiple times.

This ends the DTD definition and we can write the actual xml document. As you can see, it can only use the elements defined in the DTD as "tags" (now elements).

It should be emphasized again: the DTD can be much more complex, it can also be stored as a separate file, and several DTDs can be imported into one XML document; this is all very similar to external style sheets or external scripts in HTML documents.
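The structure described above can be sketched as follows. This is a reconstruction under assumptions, not the listing from the Hess & Karl book: the English element names telephone and residence, the sample data, and the choice of (entry+) to allow several entries are mine. Python is used only to show that the document parses.

```python
import xml.etree.ElementTree as ET

# Hypothetical contact document with an internal (self-written) DTD.
CONTACT_XML = """<?xml version="1.0"?>
<!DOCTYPE contact [
  <!ELEMENT contact (entry+)>
  <!ELEMENT entry (name, telephone, residence)>
  <!ELEMENT name (#PCDATA)>
  <!ELEMENT telephone (#PCDATA)>
  <!ELEMENT residence (#PCDATA)>
]>
<contact>
  <entry>
    <name>Erika Muster</name>
    <telephone>089 000000</telephone>
    <residence>Munich</residence>
  </entry>
  <entry>
    <name>Max Beispiel</name>
    <telephone>0711 000000</telephone>
    <residence>Stuttgart</residence>
  </entry>
</contact>"""

root = ET.fromstring(CONTACT_XML)

# One tuple (name, telephone, residence) per <entry>, in document order.
entries = [
    tuple(child.text for child in entry)
    for entry in root.findall("entry")
]

print(root.tag)
for name, telephone, residence in entries:
    print(name, telephone, residence, sep=" | ")
```

Python's standard library parser reads the internal DTD but does not validate against it, so this only demonstrates that the document is well formed and how its structure can be read; checking the (entry+) rule itself would need a validating parser.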

What happens now if you type in the above, save it with the extension .xml and open it in XML-capable browsers? The result is quite disappointing, but understandable. Let's take a look at it before we explain it:

Representation in Netscape Navigator 6:

Navigator has recognized the XML document as such (try faulty code and you will be confronted with a corresponding error message) and has even interpreted the elements marked with PCDATA as text. However, the text is completely unformatted and without line breaks.

Representation in Internet Explorer 5.0 for Mac:

We can't do anything with the Internet Explorer result. It only displayed the code, although structured with colors and indentation. It didn't even show the DTD. Of course, that's not what we want to achieve.

Why is that? Well, here it becomes very clear that XML is just a structuring language. We now have to use other means to display our XML file at all. We choose Cascading Style Sheets and link to an external style sheet file. So our XML page code has to look like this:

The reference to the external stylesheet file is made with the green line above.

Our style sheet has the following code (created in a simplified way with GoLive) and was saved under the name. It is in the same directory as our xml file.
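The link between an XML file and an external style sheet is made with an xml-stylesheet processing instruction placed before the root element. The following is a minimal sketch under assumptions: the file name contact.css, the CSS rules and the sample entry are hypothetical. Python is used only to show how a program can read the link.

```python
from xml.parsers import expat

# Hypothetical XML file that points to an external CSS file.
STYLED_XML = """<?xml version="1.0"?>
<?xml-stylesheet type="text/css" href="contact.css"?>
<contact>
  <entry><name>Erika Muster</name></entry>
</contact>"""

# Hypothetical content of contact.css: the selectors are simply
# the names of the XML elements.
CONTACT_CSS = """
name      { font-weight: bold; color: navy; }
telephone { font-style: italic; }
"""


def stylesheet_links(xml_text):
    """Return the data part of every xml-stylesheet processing instruction."""
    links = []

    def on_pi(target, data):
        if target == "xml-stylesheet":
            links.append(data)

    parser = expat.ParserCreate()
    parser.ProcessingInstructionHandler = on_pi
    parser.Parse(xml_text, True)
    return links


links = stylesheet_links(STYLED_XML)
print(links)
```

A browser that supports CSS for XML follows the href given there and applies the rules to the elements of the same name.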

The Representation in NN6 then looks like this:

The principle should have become clear to you. As you can see, the elements to be formatted in the style sheet have the same names as the XML elements. However, we would like the entries to be displayed one below the other. The best way to do this is with HTML tags. This should also be possible in connection with XML, by creating HTML name islands. Compare the corresponding explanation above for XHTML.

In the code below I have included a corresponding HTML name island. It starts with and ends with The HTML tags must then be prefixed with html:, e.g. <html:p>. Empty tags must be written like this: <html:br />.

At the bottom, a paragraph should be inserted between the two contact entries in an html name island. The corresponding code is highlighted in blue. So that the paragraph is also clearly visible, I have added another.
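How such a name island is seen by an XML parser can be sketched as follows. This is an illustration under assumptions: the namespace URI bound to the html prefix, the sample entries and the placement of the island are my own choices and not the original listing.

```python
import xml.etree.ElementTree as ET

# Assumed namespace URI for the html prefix (a convention of early
# XML-capable browsers); any URI declared this way is handled alike.
HTML_NS = "http://www.w3.org/TR/REC-html40"

ISLAND_XML = """<?xml version="1.0"?>
<contact xmlns:html="http://www.w3.org/TR/REC-html40">
  <entry><name>Erika Muster</name></entry>
  <html:p>hello</html:p>
  <html:br />
  <entry><name>Max Beispiel</name></entry>
</contact>"""

root = ET.fromstring(ISLAND_XML)

# The parser replaces the html: prefix with the namespace URI in braces,
# which is how a program tells island tags apart from our own elements.
tags = [child.tag for child in root]
island = [child for child in root if child.tag.startswith("{" + HTML_NS + "}")]

print(tags)
print([element.text for element in island])
```

Whether a browser then actually renders the island elements as HTML is a separate matter.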

Our complete document code now looks like this:

- Beginning of the document ---


- end of document ---

But now there is some frustration. Our html name island is not displayed correctly with most current browsers:

This is how it looks in Netscape Navigator 6.0 for Mac, NN 6.2 for PC and Opera 5.0 for Mac:

The break is not carried out, but at least the word hello is displayed between the blocks without an error message. However, this also works without any HTML name island, simply by inserting the word hello.

And how does it look in Internet Explorer 5.0?

Frustration with IE 5.0 for Mac: Nothing at all is recognized, not even the style sheet, only the entire code (with the exception of the DTD) is displayed.

I was just about to give up in frustration, but then I tried Internet Explorer 5.0 for PC as well and, lo and behold, everything is displayed as we wanted it to be:

But you can see from the examples (and that's why I have listed them all) that there is still a long way to go to correctly implement xml in current browsers. Even if all browsers should be able to do this with the next generation, all older browsers are left out again. XML for websites will probably only be used for intranet solutions (there you know which browsers are used) and for special projects (with very complex JavaScript browser switches upstream).

You can call up the corresponding XML example with name island here; maybe you already have a browser that shows it correctly. But be careful: browsers that cannot handle XML are in great danger of crashing when following this link (e.g. IE 5.5 for MacOSX always crashes; NN 7 for MacOSX has no problems).

Literature on the subject:

The story about the web and HTML is based, among other things, on my first two web books, which are still dear to me:

Bob LeVitus & Jeff Evans (1995): Webmaster (Macintosh): How to build your own WorldWideWeb Server without really trying.- AP-Professional, Boston etc.

Robert Tolksdorf (1995): The language of the web: HTML 3.- dpunkt-Verlag. (still exists, in an updated version)

The explanations for XML and XHTML are based in particular on the corresponding sections in the following book:

Uwe Hess & Günter Karl (2000): XHTML 1.0.- bhv-Taschenbuch, € 15.29

There are also other books on XML, e.g. from the inexpensive bhv series, which I have not worked through.

The corresponding bhv book comes from the author of the highly acclaimed JavaScript book in the same series, so I assume that he will also give you a very clear introduction to XML: M. Seeboerger-Weichselbaum: Das Einsteigerseminar XML, 3rd edition, 432 pages, bhv-Verlag, € 9.95.

I also refer again to the XML chapter in Stefan Münz's SELFHTML course: http://selfhtml.teamone.de/xml/intro.htm