Double click Xemit.exe in the download folder to run it.
Click on the Options menu. Here you can choose whether you want files to be saved as Ansi or Utf-8.
The encoding used in a Gutenberg file should be specified somewhere in the preface to the text. For English-language files this will normally be iso-8859-1, which is understood by just about all browsers. In addition to letters, numbers and punctuation it contains other common characters (for example accented letters such as é or è, and symbols such as ÷). Use Ansi if the source file encoding is iso-8859-1.
If the file contains characters outside the set (you may come across the odd Greek quotation in an 18th or 19th century novel), you can still choose Ansi and replace the characters outside the set with entities.
Some characters are part of the Ansi character set but do not appear in iso-8859-1, and these MUST be replaced with entities! For a list of them, have a look at this page. The one you are most likely to encounter is the long dash (em dash), which is covered later.
Utf-8 is becoming the international standard for web pages, since it contains just about any character or symbol used in just about any language. However, some older browsers have problems with it, so it may be safer to stick with Ansi.
When Xemit opens a file, any tab characters are converted to either two or four spaces (four if the Double indent option on the Options menu is checked, otherwise two). Pressing the tab key allows you to indent blocks of text by the same number of spaces.
Click on File > Load tags. Browse to the download folder and open tags.txt. The contents of this file will appear in the right-hand window.
These are the tags I find most useful when converting text files to html. However, you can add or remove tags by editing the tags.txt file. The w3c website describes the different types of html, and the markup which can be used with each.
Next time you open Xemit the file containing your tags will load automatically.
Open example.txt in the download folder. This is a fairly typical Gutenberg text, though I've edited it slightly to make it more interesting. It consists of the preface and the first four chapters of Thomas Hardy's novel "Far From The Madding Crowd".
Before you do anything else, save it as FFMC.html. Then, if you inadvertently click on Save, you won't overwrite the source file. It'll still be there if you need it.
Apart from the tags themselves, html files must not contain < or > characters. It's also best to avoid &. These characters have a special meaning for a browser, and should be replaced by the entities &#60;, &#62; and &#38;. Click on Clean > Replace characters to change them.
If no text is selected, most actions (including Replace characters) process the whole file. They can be undone by pressing f2. Pressing f2 again will restore the changes.
There are also descriptive versions of some entities: for example &amp; instead of &#38;. While these are easier to understand when viewing the source file, avoid using them. You may need to carry out case-sensitive searches later, and the lower case characters in descriptive entities could cause problems (a heading containing &amp; would no longer match a search for upper case text).
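Xemit's source isn't shown here, so the following is only a minimal Python sketch of what Clean > Replace characters does, assuming it simply swaps each special character for its numeric entity. The name `replace_characters` is made up for the example:

```python
# Numeric entities for the three characters browsers treat specially.
ENTITIES = {"&": "&#38;", "<": "&#60;", ">": "&#62;"}

def replace_characters(text):
    """Return text with &, < and > replaced by numeric entities."""
    # Each character is looked up once, so the "&" at the start of a
    # newly inserted entity is never escaped a second time.
    return "".join(ENTITIES.get(ch, ch) for ch in text)

print(replace_characters("Tom & Jerry <1940>"))
# Tom &#38; Jerry &#60;1940&#62;
```

If you did the same job with three ordinary find-and-replace passes, the & pass would have to come first: replacing < first and then & would turn &#60; into &#38;#60;.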
Gutenberg e-texts use leading and trailing _underscores_ to indicate text which should be emphasized. Click on Clean > Replace underscores to convert these to <em> tags. If all are replaced, but there are unequal numbers of opening and closing <em> tags, a warning will appear. You will then have to go through the file and decide what has gone wrong (if you see twenty paragraphs of emphasized text, that's normally a good place to start!)
If any underscores can't be replaced, the first of them will be highlighted. Fix the problem, then use f3 to find the next one. In this case there is only one; just replace it with <em>.
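The text doesn't say exactly how Xemit decides whether an underscore opens or closes emphasis. The Python sketch below assumes one plausible rule: an underscore at the start of a word becomes <em>, one at the end of a word becomes </em>, and anything else is left in place and its position reported, much like the leftovers Xemit highlights. The function name is hypothetical:

```python
import string

def replace_underscores(text):
    """Convert _emphasis_ to <em> tags (hypothetical sketch).

    Returns (new_text, unresolved, balanced): unresolved lists the
    positions of underscores left unchanged, and balanced is True when
    the numbers of opening and closing tags agree.
    """
    out, unresolved = [], []
    opened = closed = 0
    for i, ch in enumerate(text):
        if ch != "_":
            out.append(ch)
            continue
        before = text[i - 1] if i > 0 else " "
        after = text[i + 1] if i + 1 < len(text) else " "
        if (before.isspace() or before in "(\"'") and not after.isspace():
            out.append("<em>")        # underscore starts a word
            opened += 1
        elif not before.isspace() and (after.isspace() or after in string.punctuation):
            out.append("</em>")       # underscore ends a word
            closed += 1
        else:
            out.append("_")           # ambiguous: leave it for the user
            unresolved.append(i)
    return "".join(out), unresolved, opened == closed
```

For instance, "It was _very_ odd." becomes "It was <em>very</em> odd.", while the underscore in "snake_case" is left alone and reported.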
Gutenberg uses -- to signify a long dash (an em dash). To change -- to an em dash (&#8212;), click Clean > Replace double hyphens.
Paragraphs in Gutenberg text files consist of a lot of short lines, which are supposed to be easier to read. In addition, lines such as titles may contain a number of leading spaces to center the text.
Browsers render double spaces between words as single spaces, while leading spaces are ignored. So there's not much point in including these extra spaces in your html file. To remove them from the whole file, make sure nothing is highlighted then click Clean > Remove spaces.
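As a rough illustration, Remove spaces behaves something like this Python function, assuming it trims spaces from both ends of each line and collapses any run of spaces to one (the name is hypothetical). Tabs aren't handled, since Xemit has already converted them to spaces when the file was opened:

```python
import re

def remove_spaces(text):
    """Trim each line and collapse runs of spaces (sketch)."""
    cleaned = []
    for line in text.split("\n"):
        # Drop leading/trailing spaces, then squeeze internal runs to one.
        cleaned.append(re.sub(r" {2,}", " ", line.strip(" ")))
    return "\n".join(cleaned)
```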
You may however wish to keep the alternating indentation of lines of poetry. For example:
There are some heights in Wessex, shaped as if by a kindly hand
For thinking, dreaming, dying on, and at crises when I stand,
Say, on Ingpen Beacon eastward, or on Wylls-Neck westwardly,
I seem where I was before my birth, and after death may be.
To preserve indentation, highlight the block of text (make sure you include all leading spaces, including those at the start of the first line) and press ctrl-shift spacebar. This will convert the leading spaces to &#160; (non-breaking space) entities. Once this has been done you can click on Clean > Remove spaces to process the rest of the file.
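A minimal Python sketch of the ctrl-shift spacebar step, assuming it replaces only the leading spaces of each highlighted line with &#160; (non-breaking space) entities. Because an entity isn't a space character, a later Remove spaces pass leaves this indentation alone. The function name is invented for the example:

```python
import re

def preserve_indentation(block):
    """Replace every leading space on each line with &#160; (sketch)."""
    # ^ +  with MULTILINE matches the run of spaces at the start of each line;
    # the lambda emits one entity per space in that run.
    return re.sub(r"^ +", lambda m: "&#160;" * len(m.group(0)),
                  block, flags=re.MULTILINE)
```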
If you want to center text, use CSS (for example, the text-align: center property).
Browsers also ignore line breaks. The following:
<p>In reprinting this story for a new edition I am reminded that it was
in the chapters of "Far from the Madding Crowd," as they appeared
month by month in a popular magazine, that I first ventured to adopt
the word "Wessex" from the pages of early English history, and give
it a fictitious significance as the existing name of the district
once included in that extinct kingdom.</p>
will display as:
In reprinting this story for a new edition I am reminded that it was in the chapters of "Far from the Madding Crowd," as they appeared month by month in a popular magazine, that I first ventured to adopt the word "Wessex" from the pages of early English history, and give it a fictitious significance as the existing name of the district once included in that extinct kingdom.
The Gutenberg people suggest that when converting text files to html the line structure is retained (so that if an error is found in a particular line in the text file it will be easy to find and correct in the html version). Although keeping all those extra line breaks won't actually stop the browser displaying paragraphs as one long line of wrapped text, I find them ugly and remove them when I am creating html files for my own use. More on how to do this later.
If you are happy to retain Gutenberg's extra line breaks you don't need to do anything else to the file structure. If you want to get rid of extra blank lines between paragraphs, while leaving the breaks at the end of lines in place, click on Clean > Compress lines (try it, then press f2 to reverse the changes).
Whether you keep the Gutenberg line structure or not, you must go through the file and look for any lines which require a line break. For example:
London
July 19th 18—
Dear Sir,
If you don't want this to appear as:
London July 19th 18— Dear Sir,
you should add <br> tags as follows:
London<br>
July 19th 18—<br>
Dear Sir,
Click Search > Search for lines to look for the first line requiring a break. It is:
Full of sound and fury
and the caret should now be positioned at the end. To insert a <br> tag, double click <br> in the right-hand window.
The search function works by looking for lines that are less than 50 characters long and are not followed by an empty line. Although it's fairly reliable, it may miss one or two.
Press f4 to search for more lines. As it happens there are none, and you should see a message to that effect.
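The search rule described above can be sketched in Python as follows. The 50-character limit comes from the text; the function name and the list-of-lines input are assumptions:

```python
def find_short_lines(lines, limit=50):
    """Return indices of lines that may need a <br> tag (heuristic sketch).

    A candidate is a non-empty line shorter than `limit` characters
    whose next line is not blank.
    """
    hits = []
    for i in range(len(lines) - 1):
        line, next_line = lines[i], lines[i + 1]
        if line.strip() and len(line) < limit and next_line.strip():
            hits.append(i)
    return hits
```

Because it only looks at line length and at the line that follows, a break needed after a long line will never be suggested.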
The <br> tag you just added should still be highlighted. Click anywhere in the file to remove the highlighting. Now click Clean > Format lines to format the file. Formatting removes extra blank lines, as well as line breaks within a block of text. Breaks at the end of lines ending in > are untouched, so the two lines of verse you marked up will remain unchanged.
Press f2 to remove the formatting.
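Format lines, as described above, can be approximated with the following Python sketch. It's an interpretation of those rules rather than Xemit's actual code, and the function name is hypothetical:

```python
def format_lines(text):
    """Join wrapped lines and drop extra blank lines (sketch).

    A line break is kept after any line ending in ">" (e.g. "<br>");
    other breaks inside a block become single spaces. Blocks end up
    separated by exactly one blank line.
    """
    blocks = []      # finished paragraphs
    current = ""     # paragraph being assembled
    for raw in text.split("\n"):
        line = raw.strip()
        if not line:                      # a blank line ends the block
            if current:
                blocks.append(current)
                current = ""
            continue
        if not current:
            current = line
        elif current.endswith(">"):       # keep the break after e.g. <br>
            current += "\n" + line
        else:                             # join wrapped lines with a space
            current += " " + line
    if current:
        blocks.append(current)
    return "\n\n".join(blocks)
```

Under this rule, any line ending in > keeps its break, including a line that happens to end with a closing </em> tag.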
Highlight the items in the CONTENTS list, from Preface to The Mistake, and click Clean > Collapse lines. This will convert the text to a list. Click on Clean > Remove numbers to get rid of the Roman numerals at the start of each line.
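Remove numbers might look something like this in Python. The pattern (a Roman numeral or ordinary number at the start of a line, an optional full stop, then spaces) is an assumption based on the example file, and the names are hypothetical:

```python
import re

# A Roman numeral or ordinary number at the start of a line, an optional
# full stop, then one or more spaces or tabs.
NUMBER_PREFIX = re.compile(r"^(?:[IVXLCDM]+|[0-9]+)\.?[ \t]+", re.MULTILINE)

def remove_numbers(text):
    """Strip a leading number from every line of text (sketch)."""
    return NUMBER_PREFIX.sub("", text)
```

This is one reason to highlight the list first: run over the whole file, a pattern like this would also strip the "I" from a line such as "I seem where I was before my birth".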
Tags can be added to a list, a group or a block.
A list consists of lines of text, one below the other. Tags are added to every line, including blank ones.
Highlight from Preface to The Mistake, then press the control and shift keys and double click <li></li> in the right-hand window. This adds tags to the list. Now press the tab key to indent the highlighted text, then press the control key and double click <ol></ol>. This adds tags above and below the highlighted block of text.
A group consists of a line or lines followed by an empty line. Tags are added at the beginning and end of the group. Highlight the four paragraphs of text in the PREFACE and double click <p></p> (don't press control or shift). All four paragraphs should now be enclosed in <p> tags.
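To make the difference between lists, groups and blocks concrete, here's a minimal Python sketch working on a list of lines. The function names are hypothetical, and it ignores the indentation you add with the tab key:

```python
def tag_list(lines, tag):
    """List: wrap every line, blank ones included."""
    return [f"<{tag}>{line}</{tag}>" for line in lines]

def tag_block(lines, tag):
    """Block: opening tag above the selection, closing tag below it."""
    return [f"<{tag}>"] + list(lines) + [f"</{tag}>"]

def tag_groups(lines, tag):
    """Group: tag the start and end of each run of non-blank lines."""
    out, i = [], 0
    while i < len(lines):
        if not lines[i].strip():          # blank lines pass through untouched
            out.append(lines[i])
            i += 1
            continue
        j = i
        while j < len(lines) and lines[j].strip():
            j += 1                        # j is now just past the group
        group = lines[i:j]
        group[0] = f"<{tag}>" + group[0]
        group[-1] = group[-1] + f"</{tag}>"
        out.extend(group)
        i = j
    return out
```

The contents list, for example, would be built with tag_block(tag_list(items, "li"), "ol").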
You can also add tags to a single line. Highlight FAR FROM THE MADDING CROWD and double click <h2></h2>. And you can insert an empty tag pair: click at the end of the line, then double click <h2></h2> once more. Press f2 to remove the tag pair you just inserted.
Highlight the second and third lines of text ('by' and 'THOMAS HARDY') and double click <h3></h3>. The tags are added to both lines.
Suppose you now decide you want the line "by" to be a bit smaller - <h4> instead of <h3>. You could highlight it and press ctrl-shift r to remove the tags, then double click <h4></h4>. An easier way is to highlight the line, press control and alt, and double click <h4></h4>. This will replace the <h3> tags.
Highlight 'CONTENTS' and 'PREFACE' and add <h3> tags. Then scroll down and add <h3> tags to 'T.H.' and <h4> tags to 'February 1895'.
Most novels are laid out in a fairly regular pattern. In this case the chapter heading is followed by a subtitle and a number of paragraphs of text.
Highlight 'CHAPTER I' and add <h3> tags. Then highlight 'DESCRIPTION OF FARMER OAK--AN INCIDENT' and add <h4> tags (if you've converted the double hyphens you will see it as 'DESCRIPTION OF FARMER OAK—AN INCIDENT'). Now click at the beginning of the first paragraph of text. You could press the shift key, then scroll down looking for the end of the chapter. Or you could let Xemit find it for you.
Click on Search > Define search term. Each of the chapter headings in Far From The Madding Crowd consists of the word 'CHAPTER', followed by a single space and an upper case Roman numeral. In the Search string field, type in {CHAPTER \s \C} and click Close. (For an explanation of what those symbols actually mean, see below).
If you haven't already done so, click at the beginning of the first paragraph of chapter 1. Press f5 and all text to the end of the chapter should be highlighted. Double click <p></p> to add <p> tags.
It's possible to go through the novel, adding tags chapter by chapter. However, there's a much quicker way to do this.
Click at the beginning of the words 'CHAPTER I' and press ctrl-shift end to highlight to the end of the file. Press ctrl-shift r to remove the tags you've just added. Leave the text highlighted.
The expression {CHAPTER \s \C} allows Xemit to identify lines conforming to a particular pattern. The { and } characters signify start of line and end of line, while \s and \C stand for space (one or more) and upper-case letter (one or more). It instructs the program to look for a line beginning with the word 'CHAPTER', followed by a space or spaces, ending with one or more upper-case letters.
You don't necessarily have to describe the whole line. '{CHAPTER' - that is, any line beginning with the upper case letters CHAPTER - would work in this case. However, the more information you can provide, the less chance of Xemit finding the wrong line.
The text from 'CHAPTER I' to the end of the file should still be highlighted, so click on Html > Add tags. Enter the search term {CHAPTER \s \C}, click once on <h3></h3> in the right-hand window, then click Add. This should add tags to each line matching the search term: in other words, all chapter headings.
You can check that tags have been added to the right lines by clicking Html > Verify tags. Click on <h3></h3> again, then click Verify. A list of all lines marked up using <h3> tags should appear. 'Verify tags' checks the text selected or, if no text is highlighted, the whole file.
If any lines are included which you don't think should be there, highlight them in the Verify window and click Find. This will minimize the window and display the lines in the main text. You can then decide from the context what to do with them.
Minimize the Verify window. The next thing to do is to add <h4> tags to each of the chapter sub-headings. The first of these is 'DESCRIPTION OF FARMER OAK—AN INCIDENT'.
So how to describe this line unambiguously? Well, it consists of a mixture of upper case text, numbers and punctuation characters. It does not end in a punctuation character, and the lines before and after are blank. Type in:
{\T /p}
and check the Prev. blank and Next blank boxes. (If you check the Prev. blank box, and the first line of your selection is a match in all other respects, the program will find it. Same with the Next blank box and the last line of the selection.)
\T means anything apart from lower case characters - in other words, upper case characters, numbers, symbols and spaces. The /p before the closing } bracket means that the last character should not be a punctuation character.
When adding tags to paragraphs, you need a search term that will find just about anything. {\t} will match upper and lower case characters, numbers, symbols and spaces. The lines you have already marked up will not be included in the search, so you don't need to worry about accidentally adding more tags to them.
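Roughly speaking, Html > Add tags can be pictured as the Python sketch below. It's an approximation, not Xemit's code: hand-written regular expressions stand in for the Xemit search terms, a line starting with < counts as already marked up, and the first and last lines of the selection count as having a blank neighbour. The Add to group option isn't covered, and all names are hypothetical:

```python
import re

def add_tags(lines, pattern, tag, prev_blank=False, next_blank=False):
    """Wrap matching, not-yet-tagged lines in <tag></tag> (sketch)."""
    out = list(lines)
    for i, line in enumerate(lines):
        if not line.strip() or line.lstrip().startswith("<"):
            continue                      # blank, or already marked up
        if not re.match(pattern, line):
            continue
        # Outside the selection counts as blank, as with the Prev./Next boxes.
        before = lines[i - 1] if i > 0 else ""
        after = lines[i + 1] if i + 1 < len(lines) else ""
        if prev_blank and before.strip():
            continue
        if next_blank and after.strip():
            continue
        out[i] = f"<{tag}>{line}</{tag}>"
    return out

CHAPTER = r"^CHAPTER +[A-Z]+$"           # stands in for {CHAPTER \s \C}
SUBTITLE = r"^[^a-z]+[A-Za-z0-9 ]$"      # stands in for {\T /p}
PARAGRAPH = r"^.+$"                      # stands in for {\t}
```

Run with SUBTITLE after CHAPTER, the already-tagged chapter headings are skipped even though 'CHAPTER I' on its own would also match {\T /p}.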
If you undid the formatting, the first paragraph of Chapter 1 should look like this:
When Farmer Oak smiled, the corners of his mouth spread till they
were within an unimportant distance of his ears, his eyes were
reduced to chinks, and diverging wrinkles appeared round them,
extending upon his countenance like the rays in a rudimentary sketch
of the rising sun.
To add <p> tags to a paragraph which is split over several lines, check the Add to group box. This tells the program to treat consecutive lines as a group. You certainly don't want the following:
<p>When Farmer Oak smiled, the corners of his mouth spread till they</p>
<p>were within an unimportant distance of his ears, his eyes were</p>
<p>reduced to chinks, and diverging wrinkles appeared round them,</p>
<p>extending upon his countenance like the rays in a rudimentary sketch</p>
<p>of the rising sun.</p>
If tags are added to a group, the markup will look like this:
<p>When Farmer Oak smiled, the corners of his mouth spread till they
were within an unimportant distance of his ears, his eyes were
reduced to chinks, and diverging wrinkles appeared round them,
extending upon his countenance like the rays in a rudimentary sketch
of the rising sun.</p>
You should also check the Add to group box if you have added <br> tags to short lines, as in:
London<br>
July 19th 18—<br>
Dear Sir,
If you tell the program to treat consecutive lines as a group, you don't need to check the Prev. blank or Next blank boxes. By definition, groups of lines are separated from other text by blank lines.
If you choose to format the file by clicking on Clean > Format lines, and there are no short lines with <br> tags, you can uncheck the Add to group box, as Xemit will see each paragraph of wrapped text as one long line.
A word of warning here: don't try to verify <p> tags, as the search will take some time and will return almost the entire file you are working on! Instead, once you have finished adding tags, open the Verify window, make sure the tag field is blank and click on Verify. This will search for any lines that have not been marked up. If there are none, assume that all is well.
That's quite a long explanation, but it's important, since these expressions can save a huge amount of time and effort when converting text to html. A full list of the symbols used to search for lines is given in the table below.
It's always nice to be able to select a chapter heading from an index and go straight to it. There's a semi-automated process that allows you to add links to the file.
Scroll up to the top of the file, highlight from '<li>Preface' to 'The Mistake</li>' and remove all tags (ctrl-shift r). Leaving the list highlighted, click on Html > Add links.
The five items in the list ('Preface' plus the four chapter headings) will automatically be wrapped in <a> tags. The program then searches the file for the first occurrence of the first item in the list. If this is not the one you want to link to, click Next to search further.
When you find the right one, click just before the > that ends the opening tag and click Create. The target to link to will be inserted at the caret: <h3>PREFACE</h3> will become <h3 id="link1">PREFACE</h3>. Once the link has been created the program will search for the next one.
Now the text 'Description of Farmer Oak—An Incident' is highlighted. This is actually a subheading, and you probably want to link to the chapter heading instead, so click just before the > that ends the opening tag of <h3>CHAPTER I</h3> and click Create. Do the same for each of the chapter headings. Finally, add <li> tags to each line in the list of contents (press ctrl-shift and double click <li></li>).
If the program can't find a match it will add the words 'not found' in the Link to: field ('Description of Farmer Oak—An Incident not found'). If the search phrase is a long one, try deleting most of it, leaving four or five words (e.g. 'Description of Farmer Oak'). This generally solves the problem, but if not you may need to scroll through the file until you find the right place, position the caret appropriately and click on Create.
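The linking step can be sketched in Python like this. It's a simplification that assumes the first already-tagged line containing the item's text (ignoring case) is the right target; as the CHAPTER I example shows, that isn't always so, which is why Xemit lets you choose the target yourself. The function name is hypothetical, and the linkN ids simply follow the id="link1" example:

```python
def add_links(items, lines):
    """Link contents items to matching tagged lines (hypothetical sketch).

    The first line starting with "<" that contains an item's text gets
    id="linkN" inserted into its opening tag (lines is modified in place).
    Returns (links, missing): the contents entries wrapped in <a> tags,
    and the items for which no target was found.
    """
    links, missing = [], []
    for n, item in enumerate(items, start=1):
        target = f"link{n}"
        for i, line in enumerate(lines):
            if line.startswith("<") and item.lower() in line.lower():
                close = line.index(">")   # end of the opening tag
                lines[i] = line[:close] + f' id="{target}"' + line[close:]
                links.append(f'<a href="#{target}">{item}</a>')
                break
        else:                             # loop ended without a match
            missing.append(item)
            links.append(item)
    return links, missing
```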
You can add autolinks to a list of headings separated by a blank line. The program ignores the blanks.
Once you are happy with the markup, click on Html > Add header and select one of the three doctypes. This will add opening and closing tags to your page. I always use html 4.01 strict, the latest version of html at the time of writing (see the w3c site for more on this). In the head section, type in a title between the <title></title> tags. This is normally the title of the work, plus the name of the author.
In order to comply with w3c standards I've included the character declaration:
<meta http-equiv="Content-Type" content="text/html;charset=iso-8859-1">
If you selected utf-8 rather than Ansi in the Options menu it will look like this:
<meta http-equiv="Content-Type" content="text/html;charset=utf-8">
In the head section, click between the <style type="text/css"></style> tags, then click on Html > Add stylesheet. Browse to the download folder and open styles.css. The contents of this file will be inserted into the page you are working on.
Save your work, open it in a browser and start reading!
Instead of adding opening and closing tags and styles to each page, you might want to create a template containing this information. You could then copy your markup and paste it in, add a title and save it using your own file name. There's an example in the download folder (template.html) if you want to have a look at it.
| Symbol | Meaning |
| --- | --- |
| { | Start of line |
| } | End of line |
| \t | A mixture of letters, numbers, symbols and spaces |
| \T | A mixture of upper case letters, numbers, symbols and spaces |
| \c | One or more upper or lower case letters |
| \C | One or more upper case letters |
| \p | One or more punctuation characters (i.e. any character that is not a letter, number or space) |
| \d | One or more decimal numbers |
| \s | One or more spaces |
| /C | Not a single upper case letter |
| /c | Not a single upper or lower case letter |
| /p | Not a single punctuation character |
| /d | Not a single decimal number |
| /s | Not a single space |
When the search term is processed, any spaces are removed. So {CHAPTER \s \C} is identical to {CHAPTER\s\C}.
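For readers who know regular expressions, the table translates roughly into the Python sketch below. It's an interpretation of the descriptions above, assuming plain ASCII letters and digits, and it doesn't model the Prev. blank, Next blank or Add to group options. The function name is hypothetical:

```python
import re

# Xemit symbol -> regular-expression fragment (ASCII letters and digits assumed).
SYMBOLS = {
    "{": "^",                  # start of line
    "}": "$",                  # end of line
    r"\t": ".+",               # letters, numbers, symbols and spaces
    r"\T": "[^a-z]+",          # anything but lower case letters
    r"\c": "[A-Za-z]+",        # one or more letters
    r"\C": "[A-Z]+",           # one or more upper case letters
    r"\p": "[^A-Za-z0-9 ]+",   # one or more punctuation characters
    r"\d": "[0-9]+",           # one or more decimal numbers
    r"\s": " +",               # one or more spaces
    "/C": "[^A-Z]",            # a single character that is not upper case
    "/c": "[^A-Za-z]",         # a single character that is not a letter
    "/p": "[A-Za-z0-9 ]",      # a single character that is not punctuation
    "/d": "[^0-9]",            # a single character that is not a digit
    "/s": "[^ ]",              # a single character that is not a space
}

def xemit_to_regex(term):
    """Translate an Xemit search term into a compiled regular expression."""
    term = term.replace(" ", "")          # spaces in search terms are ignored
    parts, i = [], 0
    while i < len(term):
        pair = term[i:i + 2]
        if pair in SYMBOLS:               # two-character symbol, e.g. \C or /p
            parts.append(SYMBOLS[pair])
            i += 2
        elif term[i] in SYMBOLS:          # { or }
            parts.append(SYMBOLS[term[i]])
            i += 1
        else:                             # any other character is literal
            parts.append(re.escape(term[i]))
            i += 1
    return re.compile("".join(parts))
```

For example, xemit_to_regex(r"{\T /p}") produces the pattern ^[^a-z]+[A-Za-z0-9 ]$, which matches 'DESCRIPTION OF FARMER OAK' but not 'T.H.'.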