UCSD Pascal In Depth: Text Files

Part 3 – File Formats (text)

Previously: Intro, Running Apple Pascal on a modern Mac, The Editor, The Filesystem.

What kind of system has a “file format” for text?

If you’ve only worked on modern systems, the idea that a “text file” would be anything other than a bunch of unorganized bytes is probably pretty foreign.

But the 1970s were a very different time. So let’s talk about the text file format for the UCSD p-System. This isn’t just something that applies to the text editor, incidentally: if you declare a file of type “text” in Pascal, the same formatting is applied. The formatting is transparently stripped when you send the file to the PRINTER: or CONSOLE: device.

As far as I was able to determine, this format isn’t fully documented anywhere. The Apple Pascal manual and other reference works that I found treat the header as an implementation detail of the editor, rather than as documented OS functionality.

Text file header

There is a two-block “header” at the start of every text file. Which is…odd, right? A full kilobyte of overhead for every text file? Especially given the space compression that’s applied in the text blocks (see below), it just seems out of place. And most of it is wasted, anyway…

All of the editor “environment” settings are stored in this header. This includes the indent behavior, margins, the “command” character, and the markers.

Here’s a Rust declaration for the first half of the Text File header:

#[repr(C)]
struct TextFileHeader {
    maybe_version: u16,           // I don't know what this is; it always seems to be set to 1?
    marker_count: u16,            // maximum of 10 markers
    marker_labels: [[u8; 8]; 10], // 10 labels of 8 characters each, space-padded
    unknown: [u8; 10],            // all zeroes, unused?
    marker_positions: [u16; 10],  // one file position (character offset) per marker
    auto_indent: u16,             // auto-indent enabled
    fill: u16,                    // fill enabled
    token: u16,                   // token search on/off
    left_margin: u16,             // left margin
    right_margin: u16,            // right margin
    para_margin: u16,             // paragraph margin
    command_char: u16,            // command character
    date_created: u16,            // date created
    date_last_used: u16,          // date last used
    filler: [u8; 380],            // reserved, all zeroes
}

// Compile-time check: the fields above add up to exactly one 512-byte disk block.
const _: () = assert!(std::mem::size_of::<TextFileHeader>() == 512);

The second 512 bytes of the header are all zeroes. For details on the date format used, see the Filesystem post.
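As a sketch of how the header might be read, here’s a Rust function that pulls the editor environment out of the first 512-byte header block. It assumes the 16-bit fields are stored little-endian (as on the Apple II) and that only the first `marker_count` label/position slots are in use. `EditorEnvironment`, `parse_header`, and `read_u16` are hypothetical names, and the byte offsets are simply derived from the `TextFileHeader` declaration above.

```rust
/// Hypothetical decoded view of the editor settings in header block 0.
#[derive(Debug)]
struct EditorEnvironment {
    markers: Vec<(String, u16)>, // (label with trailing spaces trimmed, character offset)
    auto_indent: bool,
    fill: bool,
    token: bool,
    left_margin: u16,
    right_margin: u16,
    para_margin: u16,
    command_char: char,
}

/// Read a 16-bit value at `offset`, assuming little-endian storage (Apple II).
fn read_u16(buf: &[u8], offset: usize) -> u16 {
    u16::from_le_bytes([buf[offset], buf[offset + 1]])
}

fn parse_header(block: &[u8; 512]) -> EditorEnvironment {
    // Assumption: the first `marker_count` slots hold the live markers.
    let count = (read_u16(block, 2) as usize).min(10);
    let markers = (0..count)
        .map(|i| {
            // Labels start at byte 4, 8 bytes each; positions start at byte 94.
            let label = &block[4 + i * 8..4 + (i + 1) * 8];
            let name = String::from_utf8_lossy(label).trim_end().to_string();
            (name, read_u16(block, 94 + i * 2))
        })
        .collect();
    EditorEnvironment {
        markers,
        auto_indent: read_u16(block, 114) != 0,
        fill: read_u16(block, 116) != 0,
        token: read_u16(block, 118) != 0,
        left_margin: read_u16(block, 120),
        right_margin: read_u16(block, 122),
        para_margin: read_u16(block, 124),
        // The character sits in the low byte of its 16-bit slot.
        command_char: (read_u16(block, 126) as u8) as char,
    }
}

fn main() {
    // Synthetic header block: one marker named "START" at offset 42.
    let mut block = [0u8; 512];
    block[0] = 1; // maybe_version
    block[2] = 1; // marker_count
    block[4..12].copy_from_slice(b"START   ");
    block[94] = 42; // position of marker 0
    block[122] = 79; // right margin
    block[126] = b'^'; // command character
    println!("{:?}", parse_header(&block));
}
```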

Text file blocks

Text is stored in 1024-byte text blocks (two 512-byte disk blocks each). Lines are terminated with the ASCII CR (0x0d) character, and a line cannot cross a text block boundary. When a line won’t fit in the space remaining in a block, the rest of the block (after the last CR) is filled with NUL bytes, and the line goes at the start of the next block. A completely empty text file takes up 4 disk blocks: two for the header, and two for the first (empty) text block.
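To make the block rules concrete, here’s a sketch of a writer that packs lines into 1024-byte text blocks, padding each block with NULs when the next line won’t fit. It produces only the text blocks, not the header. Assumptions: a line is allowed to exactly fill a block (the real editor may leave some slack), and a single line longer than a block is rejected. `pack_pages` is a hypothetical name.

```rust
const PAGE_SIZE: usize = 1024; // one text block = two 512-byte disk blocks
const CR: u8 = 0x0d;

/// Pack lines (given without terminators) into NUL-padded 1024-byte text blocks.
fn pack_pages(lines: &[&[u8]]) -> Result<Vec<u8>, String> {
    let mut out = Vec::new();
    let mut page: Vec<u8> = Vec::with_capacity(PAGE_SIZE);
    for line in lines {
        let needed = line.len() + 1; // line text plus its CR
        if needed > PAGE_SIZE {
            return Err(format!("a {}-byte line cannot fit in one text block", line.len()));
        }
        if page.len() + needed > PAGE_SIZE {
            // Lines never cross a block boundary: NUL-fill and start a new block.
            page.resize(PAGE_SIZE, 0);
            out.extend_from_slice(&page);
            page.clear();
        }
        page.extend_from_slice(line);
        page.push(CR);
    }
    // Always emit the final block, so an empty file still gets its first (empty) block.
    page.resize(PAGE_SIZE, 0);
    out.extend_from_slice(&page);
    Ok(out)
}

fn main() {
    let empty = pack_pages(&[]).unwrap();
    // Header (1024 bytes) plus one text block, counted in 512-byte disk blocks.
    println!("empty file: {} disk blocks", (1024 + empty.len()) / 512);

    let lines: [&[u8]; 3] = [b"PROGRAM HELLO;", b"BEGIN", b"END."];
    let text = pack_pages(&lines).unwrap();
    println!("3 short lines: {} bytes of text blocks", text.len());
}
```

With no lines at all, this still yields one all-NUL block, which together with the two header blocks gives the 4-block empty file described above.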

Initial space compression

When a line in a text file starts with repeated space characters, the leading spaces may be stored as a run length: an ASCII DLE (0x10) character followed by a single character whose value is 32 + the space count. For example, at the start of a non-indented line you might see 0x10 0x20: DLE followed by a space character. I suspect this rather odd encoding was chosen so that if you “forgot” to strip the formatting when dumping a file to a printer, it would still come out okay: all you’d get is an invisible, generally harmless control character followed by a printable ASCII character.

I couldn’t find any documentation on when this is or isn’t done, but in a test, it didn’t seem to happen for runs of fewer than 12 consecutive spaces when just typing in text. If you use the Adjust feature in the editor, it does go back and compress every line that’s changed. I suspect the Margin tool would, as well.
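Here’s a minimal sketch of the leading-space encoding described above, as a pair of hypothetical helpers, `expand_indent` and `compress_indent`, that work on a single line without its CR. As a simplifying assumption, `compress_indent` always emits the DLE pair (even for zero spaces, the 0x10 0x20 case) and does not try to model the editor’s apparent threshold for when it compresses. Runs longer than one byte can express (223 spaces) are capped, with the remainder left as literal spaces.

```rust
const DLE: u8 = 0x10;

/// Expand a leading DLE + (32 + n) pair into n spaces; other lines pass through unchanged.
fn expand_indent(line: &[u8]) -> Vec<u8> {
    if line.len() >= 2 && line[0] == DLE {
        let count = line[1].saturating_sub(32) as usize;
        let mut out = vec![b' '; count];
        out.extend_from_slice(&line[2..]);
        out
    } else {
        line.to_vec()
    }
}

/// Replace a line's leading spaces with DLE + (32 + n).
/// Assumption: always emits the pair, even when n == 0.
fn compress_indent(line: &[u8]) -> Vec<u8> {
    let spaces = line.iter().take_while(|&&b| b == b' ').count();
    let n = spaces.min(255 - 32); // the largest count a single byte can encode
    let mut out = vec![DLE, 32 + n as u8];
    out.extend_from_slice(&line[n..]); // any spaces beyond the cap stay literal
    out
}

fn main() {
    let packed = compress_indent(b"    WRITELN('HI');");
    println!("{:?}", &packed[..2]);
    println!("{}", String::from_utf8_lossy(&expand_indent(&packed)));
}
```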

Text files written from a Pascal program don’t get this treatment; any spaces in the text you write are preserved. But again, all of this is transparent from the Pascal program’s perspective. You just read and write lines, and the system takes care of the formatting.

Converting “UCSD text” to a “normal” text file

This is pretty straightforward, if you want to maintain the strict 80-column hard-wrapped format that the Editor enforces. All you have to do is skip the initial two-block header, convert the DLE+count character sequences to spaces, and replace the CR line endings with your preferred line ending characters. Oh, and drop all of those embedded NUL bytes that appear in the middle of the text.
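The recipe above can be sketched as a single pass over the file’s bytes. This is an illustrative Rust version, not the actual p-filer code: `ucsd_to_plain` is a hypothetical name, it emits `\n` line endings, and it assumes DLE is only special as the first byte of a line (NUL padding between blocks doesn’t count as line content).

```rust
const HEADER_LEN: usize = 1024; // the two 512-byte header blocks
const NUL: u8 = 0x00;
const DLE: u8 = 0x10;
const CR: u8 = 0x0d;

/// Convert the raw bytes of a UCSD text file to "\n"-terminated plain text,
/// keeping the editor's hard line wraps.
fn ucsd_to_plain(file: &[u8]) -> Vec<u8> {
    let body = file.get(HEADER_LEN..).unwrap_or(&[]); // skip the header
    let mut out = Vec::with_capacity(body.len());
    let mut at_line_start = true;
    let mut i = 0;
    while i < body.len() {
        match body[i] {
            NUL => {} // block padding: drop it, and stay "at line start"
            CR => {
                out.push(b'\n');
                at_line_start = true;
            }
            DLE if at_line_start && i + 1 < body.len() => {
                // DLE + (32 + n) expands to n spaces.
                let count = body[i + 1].saturating_sub(32) as usize;
                out.resize(out.len() + count, b' ');
                i += 1; // also consume the count byte
                at_line_start = false;
            }
            other => {
                out.push(other);
                at_line_start = false;
            }
        }
        i += 1;
    }
    out
}

fn main() {
    // A tiny synthetic file: zeroed header plus one NUL-padded text block.
    let mut file = vec![NUL; HEADER_LEN];
    file.extend_from_slice(b"\x10\x20PROGRAM HELLO;\r\x10\x22WRITELN('HI')\r");
    file.resize(HEADER_LEN + 1024, NUL);
    print!("{}", String::from_utf8_lossy(&ucsd_to_plain(&file)));
}
```

Swapping `b'\n'` for a `\r\n` pair would give DOS-style line endings instead.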

I’ve implemented that in my p-filer code, so I can now transfer text from my Apple Pascal disk images to a native MacOS text file. If I get ambitious, maybe I’ll implement a mode where it tries to combine wrapped lines that occur within a paragraph.

What’s next?

Now that I can transfer files to and from the Apple Pascal system, the next step is to build some code with the compiler, and analyze it. Luckily, the Apple Pascal Operating System Reference manual has a lot of detail about the object file format, so I shouldn’t have to reverse-engineer much of it.
