Because there's little distinction between the different models in a software sense, I'll generally just refer to the "AP-101" rather than specifying "AP-101S" or "AP-101B". Realize, though, that there were also models of the IBM AP-101 computer other than the AP-101B or AP-101S, for uses other than the Shuttle, and there's no particular reason to believe that ASM101S could assemble their source code without some updates.
Aside: That's not to say that the differences were insignificant, but merely that the differences are transparent to the assembler. Here's a summary of some of the known differences:
Feature                                   AP-101B                               AP-101S
Power                                     780 W                                 560 W
Weight                                    117 pounds                            64 pounds
Memory                                    104K words (416K bytes)               256K words (1024K bytes)
Memory protection (per 16-bit half-word)  1 parity bit and 1 store-protect bit  6 ECC bits and 3 store-protect bits
Speed                                     420K operations/second                >1000K operations/second
Battery backup                            n/a                                   Rechargeable NiCd
Built-in test equipment                   n/a                                   Temperature; charger; battery; soft error counter
MTBF                                      5K hours                              24K hours
The POO tells us that
"The Master Sequence Controller (MSC) is a micro programmed computer specifically tailored for I/O Management within the Space Shuttle General Purpose Computer (GPC). As such, it has extensive and programmable capabilities for monitoring and controlling the basic I/O operations performed by upwards to 24 Bus Control Elements (BCE's) which are implemented in the baseline GPC. These capabilities include setting up, scheduling, and initiating BCE programs, monitoring the status of BCD operations, and communicating overall completion of these operations to the CPU."
MSC instructions can be distinguished from CPU instructions in that they begin with the character "@".
Similarly, the POO tells us that
"The Bus Control Element (BCE) is a microprogrammed controller specifically tailored for management of I/O traffic on one of the Space Shuttle system busses. Within each IOP [Input/Output Processor] there is one BCE for each system bus, for a total of 24 BCE's. Each of these BCE's is capable of independent program execution, data buffering to and from memory, and communication with the MSC."
In other words, besides the MSC discussed in the preceding section, there are 24 additional processors within the AP-101. They have yet another distinct instruction set of their own, they once again share memory, and their instructions are once again intermixed in the assembly-language source code with CPU instructions and MSC instructions.
BCE instructions can be distinguished from CPU and MSC
instructions in that they begin with the character "#".
See Appendix
III of the AP-101S POO or the seemingly-identical Part
III of the IOP POO for more information.
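As a small illustration of this prefix rule, here is a hypothetical Python sketch (not taken from ASM101S) that decides which processor a statement's operation mnemonic targets by looking only at its first character. The mnemonics @XYZ and #XYZ in the usage example are placeholders, not real MSC or BCE instructions.

```python
def processor_for(mnemonic: str) -> str:
    """Classify an operation-field mnemonic by its leading character.

    '@' marks an MSC instruction and '#' a BCE instruction; anything else
    is treated as a CPU instruction (or a pseudo-op or macro invocation).
    """
    if mnemonic.startswith("@"):
        return "MSC"
    if mnemonic.startswith("#"):
        return "BCE"
    # Only the first character matters: RS-type CPU mnemonics such as
    # SCAL@# may legitimately *end* with '@' or '#'.
    return "CPU"


if __name__ == "__main__":
    for m in ("AR", "SCAL@#", "@XYZ", "#XYZ"):
        print(f"{m:8} -> {processor_for(m)}")
```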
(To be clear, ASM101S is not entirely
functional. I am simply documenting it as I proceed
with development. Therefore, some of what's described
might not be available yet, though of course where I show
specific examples of output from the assembler, it is functional
enough to produce the results shown.)
The command line for ASM101S has the general form
ASM101S.py [OPTIONS] SOURCE1.asm SOURCE2.asm ...
Some of the more-significant available OPTIONS are:
- --library=LIBRARY specifies a path to a folder containing files of macro definitions needed for processing the source-code files. You'll see more on this topic later on. Though "optional", there are very few (if any) legacy assembly-language programs that you'll be able to assemble without it.
- --sysparm=BFS tells the assembler that a portion of the Space Shuttle's "backup flight software" is being assembled. By default, the assembler assumes instead that you're assembling a portion of the primary flight software. If you're assembling non-Shuttle software, it doesn't really matter which is which. Note that the "BFS" here is literal.
- --object=OBJECT.obj specifies the name of the object-code file generated by the assembler. If this option isn't present, OBJECT will match the name of the last source-code file specified on the command line.
- --compare=FILENAME specifies the name of an existing assembly-listing file against which you wish to compare the results of the current assembly. A byte-by-byte comparison of the generated object code is performed, and mismatches are noted directly in the assembly listing produced by the assembler. This feature is most useful when contemporary assembly listings are available, as is the case (for example) for all of the modules in the AP-101S runtime library of the HAL/S compiler (HAL/S-FC).
You can see all of the available OPTIONS by using the command
ASM101S.py --help
The action of the assembler is roughly the following:
- Load the macro definitions from the LIBRARY folder. (More on this is in the next section.)
- Assemble the SOURCEx.asm files specified on the command line, in the order that they are specified. Macro definitions should precede any source-code files using those macros, but the ordering of macro files among themselves is not significant.
- Output an assembly listing (on stdout) and, if there were no fatal errors, an AP-101 object-code file that contains the results of the assembly process.
A regression test for ASM101S can be run with the command
regressionASM101S.sh
At this writing, 205 source-code files are included in the regression test, comprising around 17K lines of source code. Yes, that's a relatively small test, but I'm not responsible for the lack of surviving test material. Unfortunately, I cannot claim that ASM101S is a fast program, so the test takes longer than I might hope. Nor are any status messages displayed as long as the tests are successful, so you may find yourself confused that nothing appears to be happening. If you run the test instead as
regressionASM101S.sh -v
it will at least display the names of the files it is checking.
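The byte-by-byte check performed by --compare can be pictured with the following hypothetical Python sketch. It is not ASM101S's implementation. It assumes that both the legacy listing and the new assembly have already been reduced to a mapping from a starting byte offset to the bytes generated there, and the sample bytes are arbitrary, with one byte deliberately altered.

```python
from typing import Dict, List, Tuple


def flatten(chunks: Dict[int, bytes]) -> Dict[int, int]:
    """Expand {start_offset: b'...'} into {offset: byte_value}."""
    out: Dict[int, int] = {}
    for start, data in chunks.items():
        for i, value in enumerate(data):
            out[start + i] = value
    return out


def compare_object_code(expected: Dict[int, bytes],
                        actual: Dict[int, bytes]) -> List[Tuple[int, int, int]]:
    """Return (offset, expected_byte, actual_byte) for every mismatch.

    -1 stands for "no byte generated at this offset" on that side.
    """
    exp, act = flatten(expected), flatten(actual)
    mismatches = []
    for offset in sorted(exp.keys() | act.keys()):
        e, a = exp.get(offset, -1), act.get(offset, -1)
        if e != a:
            mismatches.append((offset, e, a))
    return mismatches


if __name__ == "__main__":
    legacy = {0: bytes.fromhex("B0E5FFFE")}
    ours = {0: bytes.fromhex("B0E5FFFD")}
    print(compare_object_code(legacy, ours))  # [(3, 254, 253)]
```

The real assembler reports mismatches inline in the assembly listing. Returning a list here just keeps the sketch small.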
As a more-concrete example than the abstract description in the
preceding section, consider assembly of the "ACOS" module from the
AP-101S runtime library of the HAL/S compiler. We'll learn
more details about the runtime library and what's required for
assembling it in the sections that follow. For our purposes
at the moment, I'll content myself with saying that if you follow
the installation instructions (given later) you can perform this
assembly with these steps:
cd "virtualagc/yaShuttle/Source Code/PASS.REL32V0/RUNASM"
ASM101S.py --library=../RUNMAC ACOS.asm
(Or at least, that's how you'd do it in Linux or Mac OS. In
Windows you'd have to use '\' in place of '/'.)
Here's what the AP-101S assembly-language source code for the
ACOS module looks like prior to assembly:
TITLE 'ACOS -- SINGLE PRECISION INVERSE SINE-COSINE FUNCTION' 00000100
Simple, right? Well ... perhaps not so much! In fact,
it's more complex than it appears at first sight. You'll
notice that strewn throughout the source code are various
"instructions" that I've highlighted in red
to make them stand out. These are actually invocations of
"macros" (more on this below), each of which may be expanded into
multiple lines of source code at assembly time. Or perhaps
not: The INPUT macro in this example, for instance, turns out
to "expand" to nothing at all. In contrast, other macros may
themselves invoke other macros, which may in turn invoke still
other macros, and so on. For example, the ACLOSE
macro appearing at the bottom of the listing above invokes the
macro ERRPARMS.
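The nesting described above can be pictured with a toy model in which a macro is just a named list of lines and each invocation is replaced, recursively, by its body. This hypothetical Python sketch ignores parameters, conditional assembly, and labels, and its macro bodies are placeholders rather than the real RUNMAC definitions.

```python
from typing import Dict, List

# Placeholder macro bodies, loosely named after the macros mentioned above.
MACROS: Dict[str, List[str]] = {
    "INPUT": [],                                # expands to nothing at all
    "ACLOSE": ["* (ACLOSE BODY)", "ERRPARMS"],  # invokes another macro
    "ERRPARMS": ["* (ERRPARMS BODY)"],
}


def expand(lines: List[str], macros: Dict[str, List[str]], depth: int = 0) -> List[str]:
    """Recursively replace each macro invocation with its expanded body.

    For simplicity, the first blank-delimited token of a line is taken to be
    its operation field; real source can also carry a label in column 1.
    """
    if depth > 50:
        raise RecursionError("macro nesting too deep")
    out: List[str] = []
    for line in lines:
        fields = line.split()
        op = fields[0] if fields else ""
        if op in macros:
            out.extend(expand(macros[op], macros, depth + 1))
        else:
            out.append(line)
    return out


if __name__ == "__main__":
    source = ["INPUT F0", "AR R3,R7", "ACLOSE"]
    print(expand(source, MACROS))
    # ['AR R3,R7', '* (ACLOSE BODY)', '* (ERRPARMS BODY)']
```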
The macros themselves (in this example) provide the machinery
needed to interface assembly-language subroutines to HAL/S code
calling those subroutines, but macros can serve many other
purposes in other source-code files.
Not that we're in a position to understand it fully without a lot
of study, but here's what the source code for that very same macro
ACLOSE looks like:
MACRO 00000100
I've highlighted this entirely in red
because even though many macro definitions include actual AP-101S
instructions and pseudo-ops, this particular macro definition
consists entirely of statements in the macro language. About
the only thing here that's immediately understandable — and only
because I mentioned it earlier! — is that ERRPARMS
is the invocation of another macro.
Now that we recognize that the assembly listing of ACOS may not
look too much like the original source code, the following is an
excerpt from the assembly report produced when ASM101S
assembles ACOS. In the interest of saving a little space,
I've removed such assembler-generated items as the symbol
table. At the same time, I've highlighted the macro
expansions in red:
ACOS -- SINGLE PRECISION INVERSE SINE-COSINE FUNCTION PAGE 2
As I hinted earlier, one nice feature (for us!) of the HAL/S
AP-101S runtime library is that we have contemporary
assembly-listings created by the original assembler, which is of
great help in verifying that ASM101S produces correct
results. ASM101S tries to mimic the assembly
listings produced by that original assembler, but not obsessively
so. If you're interested — and if you have a really big
display or multiple monitors! — clicking this link hopefully
opens up the original assembly listing in a new window or tab
(depending on your browser's configuration), so that you can
visually compare the contemporary and new listings side by side.
When assembling an assembly-language file whose code depends on
macros, there are three different ways in which the definitions of
those macros may be made available to the code using them:
COPY. (Note: COPY'd files supposedly cannot contain macro definitions in System/360, but can do so in AP-101.)
--library=LIBRARY. LIBRARY is just a path to a macro-library folder. Macro definitions in any libraries specified in this manner are loaded by the assembler along with the specific source-code file(s) being assembled, thus automatically making all of the macro definitions in that library available during the assembly process. For example, you might use --library=../RUNMAC or --library=../MLIB80, assuming the current working directory was the one storing the source-code files being assembled.
Files intended for inclusion via COPY pseudo-ops are also located within the macro libraries, intermixed with the files intended to contain only macros. But we do not want any of the code from these COPY'able files (even if there are some macro definitions within them) to be made available automatically during assembly. Rather, we want their code to be made available only when they're COPY'd! Or to put it differently, categories #2 and #3 of files containing macro definitions, as discussed above, must be mutually exclusive: files whose macro definitions are loaded automatically from a library (category #2) vs files whose contents are made available only via COPY (category #3).
ASM101S does not attempt to determine these distinctions
for itself. Rather, the files in the macro library (or
libraries) must have been preprocessed in such a manner as to
determine which of the two categories each file in the library
falls into. Each macro library is assumed to contain a file
called MACROFILES.txt containing this information, and ASM101S
simply uses the categorization provided by MACROFILES.txt.
The format of MACROFILES.txt is that it lists the names of all of the
macro-definition files, one per line. Full-line comments
(having a semicolon in column 1) are also allowed.
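Reading a MACROFILES.txt file in this format takes only a few lines of code. The following hypothetical Python sketch is not the code ASM101S uses. It also skips blank lines, which the format description doesn't mention, and the file names in the usage example are placeholders.

```python
from pathlib import Path
from typing import List


def parse_macrofiles(text: str) -> List[str]:
    """Return the file names listed in the text of a MACROFILES.txt file.

    Lines with ';' in column 1 are full-line comments. Blank lines are also
    skipped here (an assumption).
    """
    names: List[str] = []
    for line in text.splitlines():
        if line.startswith(";") or not line.strip():
            continue
        names.append(line.strip())
    return names


def read_macrofiles(library: str) -> List[str]:
    """Read and parse LIBRARY/MACROFILES.txt."""
    return parse_macrofiles((Path(library) / "MACROFILES.txt").read_text())


if __name__ == "__main__":
    sample = "; Macro files for a hypothetical library\nAMAIN.asm\nACLOSE.asm\n"
    print(parse_macrofiles(sample))  # ['AMAIN.asm', 'ACLOSE.asm']
```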
Aside: A utility program (makeMACROFILES.py) is provided to create MACROFILES.txt. Admittedly, insofar as legacy code related to Shuttle flight software is concerned, this is probably of little interest to you, the end user, since all such preprocessing is likely to have been performed prior to you seeing any of the assembly-language source-code files anyway. But if you do happen to acquire flight software or other AP-101 software from sources other than Virtual AGC — send it to me! — then I suppose you might need to do the preprocessing yourself.
ASM101S can be used on Linux, Mac OS, or Windows. If the HAL/S compiler (HALSFC)
has been installed per the instructions, then ASM101S
will automatically be available as well.
If for some inexplicable reason you want to have ASM101S
just for itself, without the HAL/S compiler (or any of the AP-101
source-code files) provided by the normal installation, you could
instead just download the file ASM101S.py.
You simply need Python 3 to run it.
Aside: If you choose the latter installation method, I can only assume that you already have some AP-101 source-code files that you want to assemble. You might consider sending them to me.
AP-101 assembly-language source-code files can also be obtained
somewhat indirectly by the trick of compiling HAL/S source-code
files, and then extracting assembly language from the reports
produced by the HAL/S compiler's code-generation pass. The page covering
the HAL/S compiler shows you some examples of how to run the
HAL/S compiler, but to summarize it briefly, suppose you have a
HAL/S source-code file called SOURCE.hal in the current
working directory. To compile it, you might use the command
HALSFC SOURCE.hal "" "LIST"
This operation creates a new folder containing the results of the
compilation, as well as all of the intermediate files HALSFC
creates during the compilation process. This folder will be
the newest one with a name of the form "*.results", such as
"HALSFC Wed Aug 21 07:14:36 AM CDT 2024.results". The file
that's of interest to us in that folder will be the one called
"pass2.rpt".
The pass2.rpt file cannot be directly assembled by ASM101S,
because it contains a lot of stuff other than just
assembly-language source code. But a script called
"extractAP101S.py" has been provided that can extract just the
AP-101S assembly language from pass2.rpt into a file that can
indeed be directly assembled:
extractAP101S.py <pass2.rpt >SOURCE.asm
ASM101S.py SOURCE.asm
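If you'd rather script these steps, the newest "*.results" folder (the one holding pass2.rpt) can be located by modification time. This is a hypothetical Python helper, not part of HALSFC. It assumes that "newest" means most recently modified, which should coincide with the timestamp embedded in the folder name.

```python
from pathlib import Path
from typing import Optional


def newest_results_folder(directory: str = ".") -> Optional[Path]:
    """Return the most recently modified '*.results' folder in DIRECTORY, or None."""
    candidates = [p for p in Path(directory).glob("*.results") if p.is_dir()]
    if not candidates:
        return None
    return max(candidates, key=lambda p: p.stat().st_mtime)


if __name__ == "__main__":
    folder = newest_results_folder()
    if folder is None:
        print("no *.results folder found")
    else:
        print(folder / "pass2.rpt")
```

The printed path is the file you would redirect into extractAP101S.py.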
It happens that pass2.rpt itself is structured much like the
assembly listings produced by ASM101S, in that it includes
(among other things) not only the source code but also the binary
form of the object code and some tables. This similarity can
be exploited as an additional validity test for ASM101S,
if you're as inclined to doubt its validity as I am.
Given that the connection between the AP-101 assembly language
and the System/360 assembly language is undocumented (in surviving
documentation) and is based only upon my own inferences, it's not
surprising that there are some discrepancies between theory and
practice, or between what I've implemented in ASM101S vs
what's documented for IBM 360 assembly language. I'll
explain those differences in the subsections below.
By an "assembly listing", I mean a printout from the assembler
itself, typically showing how each line of source code has been
transformed into binary codes, and providing useful extra
information such as symbol tables and other cross
references. You've seen an example (for the ACOS module)
earlier.
Unfortunately, in spite of the claims to the contrary I've made
in earlier sections, there are no surviving assembly listings
produced by the AP-101S original assembler that I'm aware of, or
even substantial fragments of such listings. (If you notice
any, be sure to call my attention to them!) Therefore,
without any of the original assembly listings to mimic, assembly
listings as produced by ASM101S are unlikely to match
those of the original assembler with exactitude .... though of
course I expect the same binary codes to be produced at the
same addresses, since if not, then the entire exercise of creating
ASM101S in the first place would be pointless. But
even if I had such original assembly listings, one wouldn't expect
them to be any guide as to the wording or format of warning or
error messages produced by the assembler, since any Space Shuttle
flight software source code available for assembly presumably
would be error-free, at least to the point that no warning or
error messages are likely to appear in any assembly listings.
With that said, there is some assembly-listing-like material
available, and that is what I have referred to in
earlier sections. Among the files presently publicly
visible, I refer to the
folder called RUNLST in our source-code repository, whose contents
naively appear to be assembly listings generated by assembling
the files in the
repository's RUNASM folder. RUNASM contains the
AP-101S assembly language source code, in conjunction with the
macro library folder RUNMAC, and assisted by the
interface-file folder ZCONASM, for the runtime library used
with AP-101S object code created by the HAL/S compiler, HAL/S-FC.
Upon close inspection, however, the contents of RUNLST cannot
actually have been produced directly by the original AP-101S
assembler. And similarly for materials not presently
publicly visible. I assume, rather, that listings produced
by the original assembler were stored somehow, probably in a
so-called partitioned data set (PDS), and that the listings in
RUNLST were produced by running some kind of report generator on
those stored listings. Here's a fragment of the listing
RUNLST/ACOS that we've seen earlier:
.
.
.
28 ACOS AMAIN ACALL=YES 00002200
29+***********************************************************************
30+*
31+* PRIMARY ENTRY POINT
32+*
33+***********************************************************************
00000 34+ACOS CSECT 01-AMAIN
00000 35+STACK DSECT 01-AMAIN
36+* DS 18H STANDARD STACK AREA DEFINITION
00000 37+ DS F PSW (LEFT HALF) 01-AMAIN
00002 38+ DS 2F R0,R1 01-AMAIN
00006 39+ARG2 DS F R2 01-AMAIN
00008 40+ DS F R3 01-AMAIN
0000A 41+ARG4 DS F R4 01-AMAIN
0000C 42+ARG5 DS F R5 01-AMAIN
0000E 43+ARG6 DS F R6 01-AMAIN
00010 44+ARG7 DS F R7 01-AMAIN
45+* END OF STANDARD STACK AREA
00012 46+SAVE6 DS D TO SAVE REGISTERS F6,F7 02-00025
00016 47+SWITCH DS F TO SAVE R4 ACROSS INTRINSIC CALL 02-00026
00018 48+STACKEND DS 0F END OF COMBINED STACK AREA 01-AMAIN
00000 49+ACOS CSECT 01-AMAIN
0000000 50+ USING STACK,0 ADDRESS STACK AREA 01-AMAIN
00000 E0FB 0018 0018 51+ IAL 0,STACKEND-STACK SET STACK SIZE 01-AMAIN
00002 B624 0000 0009 0000 52+ NIST 9(0),0 CLEAR ON ERROR INFO (LCL DATA PTR) 01-AMAIN
54 *COMPUTES ARC-COSINE(X) OF SINGLE PRECISION SCALAR 00002300
55 INPUT F0 SCALAR SP 00002400
0000000 56+F0 EQU 0 01-INPUT
58 OUTPUT F0 SCALAR SP RADIANS 00002500
.
.
.
To anybody who is familiar with assembly language, this certainly
looks like an assembly listing produced by an assembler, so
why do I say that it's not? The first clue is the line
numbering: There's a line 52
and a line 54, but no line 53.
And there are lines 56 and 58, but no line 57. Admittedly,
it's not 100% certain why that is, but having tried to track it
down, it appears to me that both of those gaps correspond to uses
of the SPACE
pseudo-op
appearing in expansions of the AMAIN
and INPUT
macros
respectively. According
to the assembly-language manual, "The SPACE instruction is
used to insert one or more blank lines in the listing." And
if so, where are the blank lines that should have been inserted?
Another clue, not apparent from the fragment above, is in the
number of lines per page of the printout. Originally, an
assembly listing would have been output to a line printer having
(nominally) ~55 lines per page, whereas the files in RUNLST
have about 80 lines per page. Nor are there any embedded
form-feed characters or other means to advance to the top of the
next page before a page heading is printed. So I would again
infer that these are not the original assembly listings. (On
the other hand, I wasn't there, and I don't really know what
printers were available to the developers, so arguing merely from
the number of lines on the page isn't as conclusive as I might
like.)
Regardless, the files of RUNLST are the best guide available as
to the format of assembly listings, and hence ASM101S
mimics that format to the extent feasible (i.e., to the extent not
too pathetically obsessive), plus the addition of convenient
indications of page breaks.
So when I refer throughout this webpage to
existing/surviving/legacy/contemporary "assembly listings", keep
in mind that I'm referring to legacy reports such as those in
RUNLST and not literally to assembly listings produced by the
original assembler.
The AP-101 character set does not match that of the System/360
assembler. The latter is the EBCDIC character set, or rather
the variation of EBCDIC listed in Appendix A of the System/360
assembly-language manual.
On the other hand, as far as I can tell, the AP-101
assembly-language character set is not defined anywhere, and can
only be inferred indirectly.
Examining character strings appearing in object files output by
the HAL/S compiler HAL/S-FC, I find that quoted strings in
HAL/S are in fact encoded in the ASCII-like character set of the
Space Shuttle's Display Electronics Unit (DEU), whereas all
other text in HAL/S source code, such as symbol names, is encoded
in EBCDIC. There are no corresponding surviving object files
produced by assembly (rather than compilation), but my tentative
operating principle is that all text in object files produced by
the assembler is encoded in EBCDIC. For more explanation, as
well as a listing of the DEU character set and its encoding, see the
discussion of character encoding on the HALLINK101S page.
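Under that operating principle, a symbol name such as ACOS would appear in an object file as EBCDIC bytes rather than ASCII. The sketch below uses Python's built-in cp037 codec (an EBCDIC code page) as a stand-in. That is an assumption: cp037 is not necessarily identical to the System/360 variant in Appendix A, although the common EBCDIC variants agree on upper-case letters and digits.

```python
def to_ebcdic(text: str) -> bytes:
    """Encode TEXT as EBCDIC using Python's cp037 code page (a stand-in)."""
    return text.encode("cp037")


if __name__ == "__main__":
    print(to_ebcdic("ACOS").hex().upper())  # C1C3D6E2
```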
In IBM 360 Basic Assembly Language (BAL), various aliases exist
for the branch instructions BCR and BC. These are described
in Figure 4-1 of the assembler-language manual. While it is
tempting to say that Figure 4-1 should be accepted as-is for
AP-101S assembly language, that's unfortunately impossible:
Conditional-branch instructions encode a "mask" to be applied to
the CPU's condition codes, but the mask is 4 bits wide for
System/360 and only 3 bits wide for AP-101S.
Something has to give! But Figure 4-1 does serve as a
starting point for reverse-engineering AP-101 aliases for
conditional-branch instructions. Here's my own list of
AP-101S mnemonics for these branch instructions, grouped by
condition-code mask. Except where otherwise indicated,
they're all aliased to the BC instruction; but where marked in
parentheses, they're instead aliased to BCR or BVC.
- NOP, NOPR(BCR): No Operation.
- BH, BO, BP: Branch on High, Branch Over, Branch on Plus.
- BL, BM, BN: Branch on Low, Branch on Minus, Branch on Negative.
- BNE, BNZ: Branch on Not Equal, Branch on Not Zero.
- BE, BZ: Branch on Equal, Branch on Zero.
- BNL, BNM, BHE, BNN: Branch on Not Low, Branch on Not Less Than, Branch on Higher or Equal, Branch on Not Minus.
- BNH, BNP, BLE, BNO, BNC (BVC): Branch on Not High, Branch on Not Plus, Branch on Less-or-Equal, Branch Not Over, Branch on No Carry.
- B, BR(BCR): Unconditional Branch.
Note: While the mnemonics and condition masks in the list above are accurate (I hope!), the textual descriptions are less certain and should be taken with a grain of salt.
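Because every mnemonic in a group shares a single condition-code mask, an assembler mostly needs to know which group a mnemonic belongs to. The following hypothetical Python sketch transcribes the groups from the list (the mask values themselves aren't listed, so they're omitted) and checks whether two mnemonics are synonyms.

```python
from typing import Dict, List, Set

# Synonym groups transcribed from the list above; all mnemonics in a group
# share one condition-code mask (mask values are not modeled here).
SYNONYM_GROUPS: List[Set[str]] = [
    {"NOP", "NOPR"},
    {"BH", "BO", "BP"},
    {"BL", "BM", "BN"},
    {"BNE", "BNZ"},
    {"BE", "BZ"},
    {"BNL", "BNM", "BHE", "BNN"},
    {"BNH", "BNP", "BLE", "BNO", "BNC"},
    {"B", "BR"},
]

# Index: mnemonic -> position of its group in SYNONYM_GROUPS.
GROUP_OF: Dict[str, int] = {
    mnemonic: index
    for index, group in enumerate(SYNONYM_GROUPS)
    for mnemonic in group
}


def same_mask(a: str, b: str) -> bool:
    """True if both mnemonics are in the table and share a condition-code mask."""
    return a in GROUP_OF and b in GROUP_OF and GROUP_OF[a] == GROUP_OF[b]


if __name__ == "__main__":
    print(same_mask("BZ", "BE"), same_mask("BZ", "BNZ"))  # True False
```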
Aside: While I call this "aliasing to BC", in analogy to System/360, that's not exactly what's going on in AP-101S. In fact, in AP-101S each of these branching instructions (except NOPR and BR) is encoded as one of three different instructions (BC, BCB, or BCF), with the particular instruction chosen depending on the direction and distance of the branch being attempted. Similarly, a BCT instruction (branch on count) may instead generate the shorter machine code BCTB (branch on count backward) for short enough loops.
LHI: Besides the branch-instruction aliases, Shuttle flight-software
code uses the operator LHI, but without any AP-101 instruction or
any macro definition corresponding to it. There is such an
instruction in later IBM mainframe assembly language (ESA/390 and
its successors), though not in the original System/360 instruction
set. The AP-101S POO notes in its discussion of the LA instruction
that there is a particular configuration of operands for which LA
will be "functionally equivalent to a LOAD HALFWORD IMMEDIATE
instruction". My guess is that the original assembler therefore
accepted the mnemonic LHI but silently transformed it into the
appropriate LA instruction. ASM101S treats it in that manner as
well.
SHI: Similarly, flight software uses the nonexistent SHI
instruction. The program comments at those points clearly
indicate that this is a kind of subtract-immediate instruction,
presumably Subtract Halfword Immediate. Unlike the case of LHI,
there is no corresponding SHI instruction in IBM mainframe
assembly language. Nevertheless, we might suppose that the case
is still similar, in that this could be an alias for (perhaps) a
particular configuration of operands for some other AP-101
instruction. Fortunately, we have plenty of examples of assembly
listings for code using SHI.
Consider this example:
B0E5 FFFE SHI R5,2
The halfword 0xFFFE, interpreted as a signed 16-bit value, is -2,
which leads us to suspect that this is actually an addition. There
is indeed an Add Halfword Immediate instruction (AHI), and
"AHI R5,-2" would assemble exactly as shown.
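Under that interpretation, an assembler can treat SHI as AHI with the immediate operand negated. The hypothetical helper below performs only that rewrite and shows the resulting 16-bit two's-complement immediate. It handles only plain decimal operands, and it doesn't produce the B0E5 opcode halfword.

```python
from typing import Tuple


def shi_as_ahi(operands: str) -> Tuple[str, int]:
    """Rewrite the operands of 'SHI Rn,k' as an AHI with a negated immediate.

    Returns the equivalent AHI statement and the 16-bit two's-complement
    halfword that would hold the immediate.
    """
    reg, imm = (part.strip() for part in operands.split(","))
    value = -int(imm)
    if not -0x8000 <= value <= 0x7FFF:
        raise ValueError("negated immediate does not fit in a signed halfword")
    return f"AHI {reg},{value}", value & 0xFFFF


if __name__ == "__main__":
    statement, halfword = shi_as_ahi("R5,2")
    print(statement, f"{halfword:04X}")  # AHI R5,-2 FFFE
```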
LACR: There is no corresponding System/360 instruction to guide
our thinking. However, there are lots of examples in AP-101
assembly listings, such as those for the CTOI.txt file of the
HAL/S-FC runtime library. LACR is seen to be a
register-to-register operation. For (say) general registers n
and m, it assembles to the bit pattern 11101nnn 11101mmm, where
nnn and mmm are the 3-bit register numbers. This is the same
pattern that the LOAD ARITHMETIC COMPLEMENT (LCR) instruction
assembles to. Therefore, LACR is nothing more than a synonym for
LCR.
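The bit pattern is easy to check in code. This hypothetical sketch maps LACR onto LCR and then builds the halfword 11101nnn 11101mmm. For LCR R3,R7 (or LACR R3,R7) it yields 0xEBEF.

```python
SYNONYMS = {"LACR": "LCR"}  # undocumented mnemonic -> documented one


def encode_lcr(n: int, m: int) -> int:
    """Return the 16-bit halfword 11101nnn 11101mmm for LCR Rn,Rm."""
    if not (0 <= n <= 7 and 0 <= m <= 7):
        raise ValueError("AP-101 general registers are numbered 0 through 7")
    high = (0b11101 << 3) | n  # first byte: 11101nnn
    low = (0b11101 << 3) | m   # second byte: 11101mmm
    return (high << 8) | low


def assemble_rr(mnemonic: str, n: int, m: int) -> int:
    """Tiny illustration: LACR is mapped to LCR before encoding."""
    if SYNONYMS.get(mnemonic, mnemonic) != "LCR":
        raise NotImplementedError("only LCR/LACR are modeled in this sketch")
    return encode_lcr(n, m)


if __name__ == "__main__":
    print(f"{assemble_rr('LACR', 3, 7):04X}")  # EBEF
```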
PC: Similarly, this undocumented instruction is found from
available assembly listings to assemble as a synonym for MVH
(move halfword). There's no rationale obvious to me for the
specific mnemonic "PC" for this operation.
Not all pseudo-ops described in the System/360 assembler manual
appear in surviving AP-101 assembly-language source code.
I've chosen to believe that rather than the omissions being
coincidental, those pseudo-ops are instead specific to System/360
and thus had been entirely omitted from AP-101
assembly-language. Admittedly, that inference is probably
wrong in the case of certain of the pseudo-ops.
Nevertheless, they have not been implemented in ASM101S.
The omitted pseudo-ops are:
Obviously, this list is subject to change, if legacy AP-101 assembly-language source code using any of these pseudo-ops is discovered.
The SPOFF and SPON pseudo-ops (if they are pseudo-ops) seem
typically to be used in pairs: SPOFF is used to disable
something unknown, then an instruction or two later, SPON is
used to re-enable whatever it was that SPOFF disabled. They are
not pseudo-ops in IBM 360 assembly language, and hence must be
specific to AP-101S.
Fortunately, we have a few contemporary assembly listings in
which these pseudo-ops appear in the source code, and thus their
effect can be observed somewhat. They do not generate any
binary, hence they are definitely not instructions of any
kind. Furthermore, they do not affect whether or not the
source code they enclose is assembled, nor whether that source
code appears in the assembly listing.
I would tentatively conclude that at least for the moment they
can simply be ignored, and that's what ASM101S does with
them for now.
COPY'd Files
The System/360 assembler manual tells us that assembly-language
files included in other assembly-language files via the COPY
pseudo-op cannot contain various other pseudo-ops, two of which
are MACRO and MEND. That implies that a COPY'd file cannot
contain any macro definitions. Nevertheless, Space Shuttle flight
software has file inclusions that violate this restriction.
Specifically, the files MLIB80/MACSMITH.asm and MLIB80/MACROS.asm
do contain macro definitions, and yet are themselves COPY'd into
other assembly-language files. Consequently, this restriction (at
least insofar as MACRO and MEND are concerned) does not apply in
AP-101 assembly language.
The assembler manual tells us that
"The macro instruction prototype statement (hereafter called the prototype statement) specifies the mnemonic operation code and the format of all macro instructions that refer to the macro definition. It must be the second statement of every macro definition."
For example, in a macro definition such as
MACRO
MYMACRO &ARG1,&ARG2
.
.
.
MEND
no other statements may appear between the first two lines shown here. However, AP-101 macro definitions may instead look like this:
MACRO
.* THIS IS A COMMENT
.* THIS IS ANOTHER COMMENT
.
.
.
.* THERE WERE A WHOLE LOT OF COMMENTS, SEE?
MYMACRO &ARG1,&ARG2
.
.
.
MEND
I guess we'd infer from this, and very reasonably, that comments are not "statements", but more importantly, that the macro prototype is not necessarily the second line in a macro definition.
Aside: I don't know if anybody will read these words, ever, but my sixth sense tells me that some folks who do might be smugly saying to themselves right now that "of course full-line comments are not 'statements' in any language, so what's this fool on about?" As it happens, on p. 69 of the assembler manual, we find a section actually entitled "Comments Statements", which proceeds to define the term comments statement as being precisely the thing we're discussing right now.
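A parser following the behavior inferred above would locate the prototype by skipping comment statements rather than assuming it's the second line. Here's a hypothetical Python sketch. It recognizes only ".*" comments (as in the example) and also skips blank lines, which is an assumption.

```python
from typing import List, Optional


def find_prototype(definition: List[str]) -> Optional[str]:
    """Return the prototype statement of a macro definition, or None.

    definition[0] must be the MACRO header. Comment statements beginning
    with '.*' (and blank lines) are skipped, rather than insisting that
    the prototype be the very next line.
    """
    if not definition or definition[0].split()[:1] != ["MACRO"]:
        raise ValueError("a macro definition must begin with MACRO")
    for line in definition[1:]:
        text = line.strip()
        if not text or text.startswith(".*"):
            continue
        return line
    return None


if __name__ == "__main__":
    example = [
        "MACRO",
        ".* THIS IS A COMMENT",
        ".* THIS IS ANOTHER COMMENT",
        "MYMACRO &ARG1,&ARG2",
        "MEND",
    ]
    print(find_prototype(example))  # MYMACRO &ARG1,&ARG2
```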
Aside: AP-101 CPU instructions fall into 5 categories, depending on the pattern of operands they accept. These 5 categories are designated RR, RS, SRS, SI, and RI. The differences between these relate to the number of operands and the means of addressing them, but the specifics aren't important for our discussion here.
All AP-101 CPU instructions of type RS can optionally have
suffixes "@", "#", or "@#" added to their mnemonics. For
example, just as there is an SCAL instruction of type RS, there
are also SCAL@, SCAL#, and SCAL@# instructions of type RS.
To be picky about it, this usage is indeed documented, but it
took me so long to figure out that I thought I should take
explicit notice of it here anyway:
"... [@] [#] indicates that the use of indirect addressing and/or autoindexing is optional. For example, [instruction mnemonic] M specifies direct addressing without autoindexing, while M# specifies direct addressing with autoindexing."
And in case it's not obvious to you what the POO means by "indirect addressing" and/or "autoindexing", there is much greater detail in the POO's explanation of the general characteristics of RS-type instructions, though you won't be any wiser about the origin or rationale of the term "autoindexing" after reading the explanation than you were beforehand.
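Recognizing these suffixes amounts to peeling them off the end of the mnemonic. Here's a hypothetical Python sketch. It assumes the caller has already determined that the base mnemonic (e.g., SCAL) is an RS-type instruction, and the flag names are mine, taken from the POO's wording quoted above.

```python
from typing import NamedTuple


class RSMnemonic(NamedTuple):
    base: str        # e.g. "SCAL"
    indirect: bool   # '@' suffix: indirect addressing requested
    autoindex: bool  # '#' suffix: autoindexing requested


def parse_rs_mnemonic(mnemonic: str) -> RSMnemonic:
    """Split an RS-type mnemonic such as 'SCAL@#' into its base and flags."""
    # Check the two-character suffix first, so that 'SCAL@#' is not
    # mistaken for a base of 'SCAL@' with a '#' suffix.
    for suffix, indirect, autoindex in (("@#", True, True),
                                        ("@", True, False),
                                        ("#", False, True)):
        if mnemonic.endswith(suffix) and len(mnemonic) > len(suffix):
            return RSMnemonic(mnemonic[: -len(suffix)], indirect, autoindex)
    return RSMnemonic(mnemonic, False, False)


if __name__ == "__main__":
    for m in ("SCAL", "SCAL@", "SCAL#", "SCAL@#"):
        print(parse_rs_mnemonic(m))
```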
Aside: As usual, I suppose, the implications of this are more complex than they might seem at first glance. Not to mention probably being more than you want to know! Technically, if the @ suffix is present, a bit-field known as "IA" in the encoded machine instruction is set to 1 (vs 0 in the absence of the suffix). Similarly, a bit-field known as "I" in the encoded instruction is set to 1 in the presence of the # suffix. The "more-complex" implication is that the "I" field may sometimes be set to 1 even in the absence of the # suffix. This happens, for example, in a branch backward to an earlier address (vs a branch forward to a later address). In such a case, the assembler sets the "I" bit on its own, without the suffix #, because one of the special addressing modes I alluded to earlier is the case where you have an RS-type instruction
OPCODE R1,D2(X2,B2)
in which
- The index register (X2) being used is general register 0 (or absent from the operand)
- And the base register (B2) being used is general register 3 (or absent from the operand)
- And the "IA" bit-field is 0 (i.e., OPCODE has no suffix @)
- And the "I" bit-field is 1
If so, then at execution time the displacement D2 is subtracted from, rather than added to, the updated instruction counter. In particular, this allows backward branches from the current location.
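The special case just described boils down to a predicate plus one subtraction. The sketch below is hypothetical and models only this backward case. The "updated instruction counter" is simply passed in, since exactly how far the counter has advanced isn't spelled out here, and None stands for an omitted X2 or B2.

```python
from typing import Optional


def is_backward_relative(x2: Optional[int], b2: Optional[int], ia: int, i: int) -> bool:
    """True for the special RS case: X2 is 0 or absent, B2 is 3 or absent,
    IA is 0 (no '@' suffix), and I is 1."""
    return x2 in (None, 0) and b2 in (None, 3) and ia == 0 and i == 1


def target_address(updated_ic: int, d2: int, x2: Optional[int], b2: Optional[int],
                   ia: int, i: int) -> int:
    """Address for the backward case only: D2 is subtracted from the
    updated instruction counter. Other addressing modes are not modeled."""
    if not is_backward_relative(x2, b2, ia, i):
        raise NotImplementedError("only the backward IC-relative case is modeled")
    return updated_ic - d2


if __name__ == "__main__":
    # Hypothetical numbers: updated IC 0x120, displacement 0x20.
    print(hex(target_address(0x120, 0x20, None, None, ia=0, i=1)))  # 0x100
```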
LB is used in the same manner as the instruction LA (load address), but it is a mystery what it signifies. I presently have no examples of assembly listings containing it from which I can deduce anything. One possibility is that it's simply an error never detected during Shuttle software development, because the only known usage is in the definition of a macro (LD) which in fact is never invoked by any of the other source code. In other words, perhaps ASM101S shouldn't have been getting uppity by trying to parse that macro in the first place!
Aside: "BNF", of course, stands for Backus-Naur form. Technically, the grammars are written in the modified EBNF (Extended Backus-Naur form) supported by the TatSu parser module for the Python language. See the Python source-code file fieldParser.py for the grammars themselves.
Nevertheless, even having added this level of complexity to the parser, it's not necessarily the case that the syntax parsed by ASM101S matches that parsed by the original assembler. For example, arithmetic expressions as specified by the System/360 assembly-language manual are constrained in various ways (e.g., they cannot begin with '+' or '-', cannot have more than 16 terms, and cannot have more than 5 levels of parentheses), but have not been endowed with the same constraints in ASM101S. On the other hand, I haven't necessarily bothered to implement theoretically-possible syntax that isn't present in actual flight software. Consequently, it's likely that ASM101S accepts a more-complex syntax in some contexts than did the original assembler, and vice versa. Of course, ASM101S can be upgraded as needed to support such missing syntax, if it turns out to be desirable, whereas the original assembler cannot.
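For comparison, the System/360 limits on expressions mentioned above could be checked with something like the following hypothetical Python sketch. ASM101S does not perform these checks. The sketch uses a simple regular-expression tokenizer rather than TatSu, and it deliberately ignores "*" as the location counter, quoted self-defining terms such as C'...' and X'...', and other real operand syntax.

```python
import re
from typing import List

# Simplified tokens: symbols, decimal numbers, operators, parentheses.
TOKEN = re.compile(r"\s*(?:(?P<term>[A-Za-z$#@][A-Za-z0-9$#@]*|\d+)|(?P<op>[-+*/()]))")


def s360_expression_problems(expr: str) -> List[str]:
    """List the System/360 expression limits that EXPR would violate."""
    expr = expr.strip()
    problems: List[str] = []
    if expr[:1] in ("+", "-"):
        problems.append("begins with + or -")
    terms = depth = max_depth = 0
    pos = 0
    while pos < len(expr):
        m = TOKEN.match(expr, pos)
        if m is None:
            raise ValueError(f"unsupported syntax at {expr[pos:]!r}")
        if m.group("term"):
            terms += 1
        elif m.group("op") == "(":
            depth += 1
            max_depth = max(max_depth, depth)
        elif m.group("op") == ")":
            depth -= 1
        pos = m.end()
    if terms > 16:
        problems.append("more than 16 terms")
    if max_depth > 5:
        problems.append("more than 5 levels of parentheses")
    if depth != 0:
        problems.append("unbalanced parentheses")
    return problems


if __name__ == "__main__":
    print(s360_expression_problems("STACKEND-STACK"))  # []
    print(s360_expression_problems("-2"))              # ['begins with + or -']
```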
The AP-101 CPU has 8 general registers, typically referred to symbolically in assembly language as R0 through R7, as well as 8 floating-point registers, typically referred to as F0 through F7. This is similar to the situation in System/360 assembly language, except that System/360 has 16 general registers and only 4 floating-point registers. For example, an assembly-language instruction that performs an integer addition from register R7 to register R3 would look like this in either of the two assembly languages:
AR R3,R7

But there's a catch. The assembly-language manual explains that

"All symbols that specify register numbers ... must be assumed to be equated elsewhere to absolute values."

In other words, the register-name symbols R3 and R7 in this example are not tokens or syntactical elements of the assembly language, and the pure syntax for the instruction example shown above should actually be this:

AR 3,7

The only reason that the former instruction would be accepted by the assembler, the manual is explaining, is that the full example should have read something like this:

R3 EQU 3
R7 EQU 7
.
.
.
AR R3,R7

In turn, this means that in the macro libraries loaded by the assembler, we should find various EQUates similar to the ones above, for the general registers and floating-point registers. And indeed, for the macro libraries used for the Space Shuttle primary flight software (PASS) and backup flight software (BFS), we find exactly such declarations in the PASS module MLIB80/MACSMITH or the BFS module MLIB80/EQU, along with numerous other EQUates of a similar nature:

.
.
.
F0 EQU 0 FP 0 = FLOATING POINT REGISTER
F1 EQU 1 1
F2 EQU 2 2
F3 EQU 3 3
F4 EQU 4 4
F5 EQU 5 5
F6 EQU 6 6
F7 EQU 7 7
G0 EQU 0 SET 1 GR 0 = GENERAL REGISTER
G1 EQU 1 1
G2 EQU 2 2
G3 EQU 3 3
G4 EQU 4 4
G5 EQU 5 5
G6 EQU 6 6
G7 EQU 7 7
R0 EQU 0 SET 2 GR 0 = GENERAL REGISTER
R1 EQU 1 1
R2 EQU 2 2
R3 EQU 3 3
R4 EQU 4 4
R5 EQU 5 5
R6 EQU 6 6
R7 EQU 7 7
.
.
.

Unfortunately, that's not the full story. Besides the flight software as such, AP-101 assembly-language files also exist in the runtime library provided by HAL/S-FC, the HAL/S compiler. Those assembly-language files reference the CPU general registers and floating-point registers just as any of the flight-software files do, except that there are no EQUates for those registers in any of those source-code files, nor in the macro library used by those files.

One possible reason the EQUates are missing is that our HAL/S-FC runtime-library source code is incomplete. Unfortunately, there is no way to know whether that is correct or not. Another possibility is that the System/360 assembly-language manual is incorrect, and that the assembler does by default recognize the general registers Rn and floating-point registers Fn, and possibly other symbols, without explicit EQUates.

EQUates, if such are encountered.

Applying the type attribute T' to a variable (as in T'&A) returns an assembly-time string consisting of a single character that corresponds to the type of data the variable contains. For example, if &A were a character-string variable as declared via the GBLC or LCLC pseudo-op, then the assembler's preprocessor would replace T'&A by the single character C at assembly time.

It isn't entirely clear to me what # indicates. My current very tentative interpretation is this:
D' Attribute

AP-101S assembly-language source code uses an attribute operator D', which is not defined in the assembly-language manual. From the way it is used, I infer that when applied to an identifier, it returns "true" (1) if the identifier has been previously defined within the source code being assembled, and "false" (0) if not. A typical usage would be something like

AIF (D'MYSYM).OKAY
EXTRN MYSYM
.OKAY ...

Thus, if the identifier has not been defined, the code can detect that condition and declare the identifier as external.
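To make the inferred meaning of D' concrete, here is a minimal Python sketch. It assumes that "previously defined" means "appears as a name-field label (starting in column 1) on an earlier source line", and it ignores macro expansion, COPY, and any distinction between labels and EQU symbols. The function name defined_attribute is hypothetical, not part of ASM101S.

```python
def defined_attribute(source_lines, current_index, name):
    """Hypothetical model of D'name: 1 if `name` is the name-field label of a
    line preceding source_lines[current_index], else 0."""
    for line in source_lines[:current_index]:
        # A label starts in column 1; skip blank lines, comments ('*'),
        # and sequence symbols ('.').
        if not line or line[0].isspace() or line[0] in "*.":
            continue
        if line.split()[0] == name:
            return 1
    return 0
```

Under this reading, a symbol that is defined only later in the file still yields 0. In that case the AIF in the example above would fall through to the EXTRN.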
AIF and AGO

The AIF and AGO pseudo-ops provide "goto" functionality (respectively conditionally or unconditionally) at assembly time (rather than at runtime). The System/360 assembly-language manual makes it clear that these "goto" operations can operate only within the same macro depth, and further, if within a macro, only within the same macro. For example, in the pseudo-instruction

AGO .MYSEQ

the locations of the pseudo-instruction itself and of the sequence symbol .MYSEQ could both be outside of any macro, or they could both be within the same macro definition. But it could not be the case (say) that the pseudo-instruction was within a macro definition and the sequence symbol was within a macro invoked by that macro.

COPY pseudo-op. Is it possible for the AGO or AIF pseudo-instruction to be in a file containing a COPY pseudo-op while the target sequence symbol is in the file being COPY'd? Or vice-versa?

COPY, any AGO/AIF pseudo-instruction and its target sequence symbol must reside within the same COPY'd file.

Certain arithmetical quirks are inherent in System/360 assembly language, and I must presume that these peculiarities carry over into AP-101S assembly language as well. Therefore, ASM101S retains these peculiarities rather than eliminating them. The peculiarities I regard as worth noting are these:
- It is not possible to give &A the value -5 via a line of pseudo-code such as "&A SETA -5", because "-5" is neither a legal literal nor a legal arithmetic expression. (Nor would +5 be legal.) The assembly-language manual seems to advise using workarounds such as "&A SETA 0-5".
- The "/" operator is integer division. For example, 5/2 evaluates to 2. The manual does not explain what the result of an operation like (0-5)/2 would be; either -3 or -2 is plausible. Until the correct behavior is somehow determined, ASM101S uses the Python convention (i.e., it uses the Python // operator), which would result in -3 in this case.
- Arithmetic expressions are built from "terms" combined by the operators +, -, *, or /. ("Terms" is quoted here to distinguish the System/360 usage from the normal mathematical usage, in which terms are added to or subtracted from each other, while factors are multiplied or divided by each other.) These operations are performed in left-to-right order, except that multiplications and divisions are performed before additions and subtractions. In particular, in an expression like 3*5/2, division does not have a higher precedence than multiplication, so it is evaluated as (3*5)/2 rather than as 3*(5/2).

On the other hand, ASM101S does remove some of the System/360 restrictions on arithmetic expressions, namely:
Aside: Regarding peculiarities of my own making, as opposed to those of the language itself or the original assembler, I'm obliged to admit that I don't quite understand how to perfectly handle assembly-time evaluation of arithmetic expressions involving program labels: i.e., involving the addresses of symbols rather than the values of constants.
Instead, ASM101S uses an imperfect trick, making use of the facts that the address space of the AP-101S is limited to 24 bits and that the number of allowed control sections in a program (at least in System/360) is limited to 255. The address of a program label (prior to linking) is precisely an ordered pair of the form (control section, offset into control section), but performing arithmetical computations is easiest when these values can somehow be converted to single numbers rather than ordered pairs. The trick is to assign each control section a unique but randomized 64-bit value whose least-significant 24 bits are all 0, and to convert the address of each symbol to the sum of its control section's 64-bit value and its 24-bit offset into that control section. (I don't mean that the codes for the control sections are actually random, but rather that they are selected in a way that makes it unlikely to produce their values by common types of calculations.) In this way, calculations like SYMBOL+OFFSET or SYMBOL1-SYMBOL2 (for symbols in the same section) produce the expected results, and indeed produce correct results for all correct expressions. Unfortunately, it remains possible to combine symbols from two different control sections in an incorrect manner and get a result that appears to be in yet a third control section, which is incorrect. This potential is part of the reason for using 64-bit pseudo-addresses (and distributing the unique numerical codes for the control sections throughout a 40-bit space) rather than 32-bit pseudo-addresses (and distributing the unique numerical codes in an 8-bit space): it reduces the probability of producing "fake" control sections in calculations to a very low level.
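The pseudo-address trick just described can be sketched in a few lines of Python. This is an illustration of the idea under the stated assumptions (24-bit offsets, a nonzero randomized 40-bit code per control section shifted left 24 bits). The class and method names are hypothetical and are not taken from the ASM101S source.

```python
import random

OFFSET_BITS = 24                       # AP-101S addresses fit in 24 bits
OFFSET_MASK = (1 << OFFSET_BITS) - 1   # low 24 bits hold the offset

class PseudoAddresses:
    """Hypothetical sketch of (control section, offset) -> single integer."""

    def __init__(self, seed=0):
        self._rng = random.Random(seed)   # fixed seed: reproducible codes
        self._code_of = {}                # control-section name -> code
        self._name_of = {}                # code -> control-section name

    def section_code(self, name):
        """Assign once: a nonzero random 40-bit number shifted left 24 bits."""
        if name not in self._code_of:
            code = 0
            while code == 0 or code in self._name_of:
                code = self._rng.getrandbits(40) << OFFSET_BITS
            self._code_of[name] = code
            self._name_of[code] = name
        return self._code_of[name]

    def address(self, name, offset):
        """Pseudo-address of a label at `offset` within control section `name`."""
        if not 0 <= offset <= OFFSET_MASK:
            raise ValueError("offset does not fit in 24 bits")
        return self.section_code(name) + offset

    def decode(self, value):
        """Classify an expression's result as absolute or (section, offset)."""
        if -OFFSET_MASK <= value <= OFFSET_MASK:
            return (None, value)                    # e.g. SYMBOL1-SYMBOL2
        code, offset = value & ~OFFSET_MASK, value & OFFSET_MASK
        if code in self._name_of:
            return (self._name_of[code], offset)    # e.g. SYMBOL+OFFSET
        raise ValueError("result lies in no known control section")
```

Consider two labels in the same section. Adding a constant to one of them keeps its section code intact, and subtracting one from the other cancels the code to leave a small absolute number. Adding labels from two different sections produces a code that matches neither section and is rejected. That rejection is exactly the "fake control section" case that the wide code space makes unlikely to slip through.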
According to the System/360 assembly-language manual, although EXTRN symbols can appear in expressions, they cannot be paired. This implies, I think, that they can be handled consistently with the scheme described in the preceding paragraph, by using unique but randomized 64-bit values with the lower 24 bits all 0 in place of those symbols.
I thought at first that the same trick could be used to handle calculations involving other not-yet-defined symbols. Unfortunately, such an attempt would be guaranteed to produce incorrect results in calculations like KNOWN-UNKNOWN, even if KNOWN and UNKNOWN both turned out to be members of the same control section. Therefore, the addresses of all symbols in the current file must be ascertained in a separate pass before computations of expressions involving such symbols are performed.
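The arithmetic conventions described earlier in this section can be illustrated with a small recursive-descent evaluator. The rules it follows are: left-to-right evaluation with * and / binding tighter than + and -; integer division using Python's // (the ASM101S choice, giving (0-5)/2 = -3); and rejection of a leading sign, as in -5. This sketch is not the TatSu grammar used by ASM101S, and ASM101S itself may be more permissive about leading signs. The names evaluate and trunc_div are hypothetical. trunc_div shows the other plausible convention, rounding toward zero.

```python
import operator
import re

def trunc_div(a, b):
    """Alternative convention: quotient rounded toward zero."""
    q = abs(a) // abs(b)
    return q if (a < 0) == (b < 0) else -q

def evaluate(text, divide=operator.floordiv):
    """Evaluate an integer expression with System/360-style precedence."""
    tokens = re.findall(r"\d+|\S", text)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def take():
        nonlocal pos
        token = peek()
        pos += 1
        return token

    def term():
        # A "term" in the System/360 sense: a number or a parenthesized expression.
        token = take()
        if token == "(":
            value = expression()
            if take() != ")":
                raise SyntaxError("missing ')'")
            return value
        if token is not None and token.isdigit():
            return int(token)
        # A leading '+' or '-' (as in -5) lands here and is rejected.
        raise SyntaxError(f"unexpected token {token!r}")

    def product():
        value = term()
        while peek() in ("*", "/"):        # same precedence, left to right
            op = take()
            rhs = term()
            value = value * rhs if op == "*" else divide(value, rhs)
        return value

    def expression():
        value = product()
        while peek() in ("+", "-"):
            op = take()
            rhs = product()
            value = value + rhs if op == "+" else value - rhs
        return value

    result = expression()
    if pos != len(tokens):
        raise SyntaxError(f"unexpected token {peek()!r}")
    return result
```

For example, evaluate("3*5/2") gives 7 because the multiplication is done first. evaluate("(0-5)/2", divide=trunc_div) gives -2 rather than -3.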
Among the types of expressions computed by the assembler at assembly time for use with pseudo-ops such as SETB or AIF are the boolean expressions, of which one sub-type is relational expressions involving string values.

A relational expression determines whether two values (either two numbers or two strings) are equal (EQ), not equal (NE), less than (LT), less than or equal (LE), greater than (GT), or greater than or equal (GE) to each other. For example, the relational expression

3 LT 4

returns the value "true" (which in System/360 assembly language is numerically equivalent to 1), since 3 is less than 4.

'Z' LT 'AA'

returns "true".

Thus in the end we really don't know what collation sequence is appropriate. ASM101S temporarily pretends that the collation sequence is ASCII, since that's the easiest to implement.
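The following Python sketch shows how string relations might be evaluated and where the collation choice enters. It assumes that a shorter string always compares as less than a longer one, and that equal-length strings are compared byte by byte in a chosen encoding. The length rule is consistent with 'Z' LT 'AA' being true: a plain lexicographic comparison would make that false in both ASCII and EBCDIC, since 'Z' sorts after 'A' in each. The EBCDIC case uses Python's standard cp037 codec. The function name string_relation is hypothetical.

```python
def string_relation(left, op, right, encoding="ascii"):
    """Evaluate 'left' OP 'right' for character values, returning 1 or 0.

    Assumption: a shorter string is always less than a longer one; strings of
    equal length are compared byte by byte in the given encoding ("ascii", as
    ASM101S pretends, or e.g. "cp037" for an EBCDIC collation).
    """
    if len(left) != len(right):
        cmp = (len(left) > len(right)) - (len(left) < len(right))
    else:
        a, b = left.encode(encoding), right.encode(encoding)
        cmp = (a > b) - (a < b)
    results = {"EQ": cmp == 0, "NE": cmp != 0, "LT": cmp < 0,
               "LE": cmp <= 0, "GT": cmp > 0, "GE": cmp >= 0}
    return int(results[op])
```

The collation question matters only for equal-length strings. For example, 'A' LT '1' is false under ASCII (digits sort before letters) but true under EBCDIC (letters sort before digits).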
Character expressions consist of text delimited by single quotes, as for example 'HELLO', plus various additional flourishes that you can read about in the System/360 assembly-language manual but which I won't bother to rehash here.

'HELLO'(start,length)

This means that the substring to be extracted begins at character position start (counting from 1) and is length characters in width.

Before describing the specific AP-101S versus System/360 issue associated with the items known as "SET symbols", let me summarize some of what the System/360 assembly-language manual has to say about them.
In System/360 assembly language there is the concept of symbols
relevant only to the assembler in a preprocessing pass, in
distinction to symbols representing addresses in the runtime
memory of the assembled program. These symbols are
distinguished in that their names are prefixed by the character
'&'. Thus MYVAR
might be a variable representing a memory location, whose contents
can be modified by the assembly-language program when it is run,
while &MYVAR
might
represent an assembly-time variable, assigned a value that can be
manipulated during the assembly process, but that is not known or
modifiable by the assembled program.
Here, we're concerned only with the latter category, namely the SET symbols.
SET symbols can be categorized another way, namely by their
datatypes, which cannot be changed once established. The
three types are:
Yet a third way that they can be characterized is as:
SET symbols are declared via one of the pseudo-ops GBLA, GBLB, GBLC, LCLA, LCLB, or LCLC. Any of these instructions also assigns an initial value to the symbol, either 0, False (0), or '' (empty string), depending on the datatype. For example, the instruction "LCLB &BOO" declares a local boolean SET symbol called &BOO and assigns it the default value False (numerically, 0). Subsequent values are assigned via SETA, SETB, or SETC.

In AP-101S source code, however, we find SET symbols assigned values via SETA, SETB, or SETC (or used in other manners) without any declaration via GBLA, GBLB, GBLC, LCLA, LCLB, or LCLC whatsoever (prior or otherwise), which is a possibility denied by the System/360 assembly-language manual.

MACRO 00000100
INPUT &X 00000200
GBLA &ENTCNT 00000300
GBLB &INPUT(20),&LIB 00000400
AIF (N'&SYSLIST EQ 0).EMPTY 00000500
&INPUT(&ENTCNT) SETB 1 00000600
AIF ('&X' EQ 'NONE').SPACE 00000700
&I SETA 1 00000800
&LAST SETA N'&SYSLIST 00000900
.LOOP AIF (K'&SYSLIST(&I) NE 2).BADREG 00001000
&R SETC '&SYSLIST(&I)' 00001100
AIF ('&R'(1,1) NE 'F' AND '&R'(1,1) NE 'R').BADREG 00001200
AIF ('&R' EQ 'R0').BADREG 00001300
AIF (&LIB AND ('&R' EQ 'R1' OR '&R' EQ 'R3')).INVREG1 00001400
AIF (NOT &LIB AND '&R' EQ 'R4').INVREG2 00001500
AIF (D'&R).NEXT 00001600
&N SETC '&R'(2,1) 00001700
&R EQU &N 00001800
.NEXT ANOP 00001900
&I SETA &I+1 00002000
AIF (&I LE &LAST).LOOP 00002100
.SPACE SPACE 00002110
MEXIT 00002200
.BADREG MNOTE 4,' ILLEGAL REGISTER SPECIFICATION - &SYSLIST(&I)' 00002300
AGO .NEXT 00002400
.INVREG1 MNOTE 4,'&R INVALID INPUT FOR PROCEDURE ROUTINE' 00002500
AGO .NEXT 00002600
.INVREG2 MNOTE 4,'R4 INVALID INPUT FOR INTRINSIC' 00002700
AGO .NEXT 00002800
.EMPTY MNOTE 4,'OPERAND REQUIRED' 00002900
MEND 00003000

What are we to make of this? When a variable that has not previously been explicitly declared (by GBLx or LCLx) is the target of a SETx instruction, it is declared automatically by the assembler as if via LCLx.
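The inferred implicit-declaration rule can be modeled with a small symbol table, sketched here in Python. It covers only non-arrayed symbols. It assumes that a SETx targeting an undeclared name behaves exactly like an LCLx followed by the assignment, and that a symbol's type cannot change once established. The class MacroScope and its methods are hypothetical names, not ASM101S internals.

```python
DEFAULTS = {"A": 0, "B": 0, "C": ""}   # initial values for SETA/SETB/SETC symbols

class MacroScope:
    """Hypothetical model of SET-symbol declarations within one macro expansion."""

    def __init__(self, global_table):
        self.global_table = global_table   # shared across expansions: name -> (type, value)
        self.global_names = set()          # names this expansion declared via GBLx
        self.local_table = {}              # name -> (type, value)

    def gbl(self, kind, name):
        """GBLx: attach to (or create) the global symbol."""
        self.global_table.setdefault(name, (kind, DEFAULTS[kind]))
        self.global_names.add(name)

    def lcl(self, kind, name):
        """LCLx: create a fresh local symbol with its default value."""
        self.local_table[name] = (kind, DEFAULTS[kind])

    def _table(self, name):
        return self.global_table if name in self.global_names else self.local_table

    def set(self, kind, name, value):
        """SETx: if `name` is undeclared, declare it implicitly as if via LCLx."""
        table = self._table(name)
        if name not in table:
            self.lcl(kind, name)           # the inferred convenience feature
        declared_kind = table[name][0]
        if declared_kind != kind:
            raise TypeError(f"{name} is a SET{declared_kind} symbol, not SET{kind}")
        table[name] = (kind, value)

    def get(self, name):
        return self._table(name)[name][1]
```

With this model, the macro above works as written. &ENTCNT is declared globally, while &I, &LAST, &R, and &N become locals the first time a SETx touches them.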
Aside: If this inference is correct, it might seem naively that there's no need for the instructions LCLA, LCLB, or LCLC at all, since a SETA, SETB, or SETC could always be used instead. Upon closer inspection that's not true, since LCLx (like GBLx) can additionally be used to declare SET symbols as arrays, which a SETx instruction with this convenience feature could not do. And even in the non-arrayed case, there are certainly instances in existing code in which LCLx is indeed used explicitly even though the described convenience feature would not require it. For example, consider this macro from the AP-101S runtime-library source code, which, unlike the problematic macro listed above, corresponds exactly to the System/360 assembly-language manual's pronouncements:
MACRO 00000100
&NAME AERROR &NUM,&GROUP=4 00000200
GBLA &ERRCNT,&ERRNUMS(10),&ERRGRPS(10) 00000300
LCLA &I 00000400
AIF (&NUM GT 62).BADNUM 00000500
&I SETA &ERRCNT 00000600
.DUPLOOP AIF (&I LE 0).NEWERR 00000700
AIF (&NUM EQ &ERRNUMS(&I) AND &GROUP EQ &ERRGRPS(&I)).DUP 00000800
&I SETA &I-1 00000900
AGO .DUPLOOP 00001000
.NEWERR ANOP 00001100
&ERRCNT SETA &ERRCNT+1 00001200
&I SETA &ERRCNT 00001300
&ERRNUMS(&I) SETA &NUM 00001400
&ERRGRPS(&I) SETA &GROUP 00001500
.DUP ANOP 00001600
*********ISSUE SEND ERROR SVC****************************************** 00001700
&NAME SVC AERROR&I ISSUE SEND ERROR SVC 00001800
*********SEND ERROR SVC RETURNS CONTROL FOR STANDARD FIXUP************* 00001900
MEXIT 00002000
.BADNUM MNOTE 12,'ERROR NUMBER GREATER THAN 62' 00002100
MEND 00002200

As for the origin of such a convenience feature in the first place, I'd note that in addition to being "convenient", it may have been needed: the complexity of some AP-101S macros could make some of those macros very difficult or impossible to implement otherwise. According to System/360 rules, all GBLx and LCLx instructions must appear not merely before the SETx instructions involving the SET symbols they declare, but indeed prior to everything else. For example, GBLx instructions must appear immediately after the prototype line of a macro definition, with nothing intervening except comments, while LCLx instructions in turn must appear immediately after that. Thus if a macro definition depends on the flexibility of allowing a SET symbol to be declared in alternate ways under different circumstances, such as arrayed vs non-arrayed or integer vs character, the rules of the System/360 assembler likely would not allow it, because the alternate declarations could not be selected conditionally in the prescribed location. By contrast, the rules of implicit declaration via SETx instructions basically allow non-arrayed local declarations to appear anywhere. So the convenience feature of implicit declaration, if it truly exists, could have arisen from necessity rather than from a desire for mere convenience. Not that "mere" convenience is to be sneered at. But that's just speculation on my part, with the answer lost in the mists of time past.
DC and DS Pseudo-Op Formats, and "Literals"

The System/360 assembly-language manual describes a quite-complex format for the operands of the DS and DC pseudo-ops used for allocating or initializing data memory. (The description takes about 11 pages, which is over 6% of the manual.) However, I see no point in implementing those features of this format which are not actually used in Space Shuttle flight-software source code. At present, I believe that the following features of the DC/DS format do not need to be supported in ASM101S:
The System/360 manual uses the term literal to refer to an operand (for instructions) in a manner different from the way I have been using it (and different from what I think is current common usage). I have been using the term to describe strings of characters such as 1234 or 'HELLO WORLD', or perhaps X'3F7C'.

System/360 (and presumably AP-101S) assembly languages don't consider these to be "literals". Rather, the following would be considered "literals": =F'1234', =C'HELLO WORLD', or =X'3F7C'. The distinction, aside from the prefixed equals sign and other syntactic elements, is that the former are used directly by DC pseudo-ops or in some cases coded into instructions, whereas the latter are instead assembled into special areas of memory known as "literal pools", and only their addresses are coded into instructions.
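A literal pool can be thought of as a de-duplicating collection that is flushed at each LTORG. The minimal Python sketch below makes that idea concrete. Literals are placed in order of first use, each distinct literal text gets one slot, and sizes are given in the same units as the addresses. The ordering and alignment rules of the original assembler are not modeled, and LiteralPool is a hypothetical name. The sample values echo the MSTR listing later in this section, where three 4-byte literals sit at the consecutive halfword addresses 00024, 00026, and 00028. Their order of first use in that module is not known, however.

```python
class LiteralPool:
    """Hypothetical sketch: collect =literals, then place them at LTORG."""

    def __init__(self):
        self._pending = {}    # literal text -> size; dicts keep first-use order

    def use(self, literal, size):
        """Note a literal referenced by an instruction; repeats share one slot."""
        self._pending.setdefault(literal, size)

    def ltorg(self, origin):
        """Assign consecutive addresses starting at `origin`; empty the pool."""
        addresses, address = {}, origin
        for literal, size in self._pending.items():
            addresses[literal] = address
            address += size
        self._pending.clear()
        return addresses
```

An instruction such as N R7,=XL2'F' would then have the address returned for "=XL2'F'" coded into it, rather than the constant itself.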
=B'...' Binary
=C'...' Character
=D'...' Double-precision floating point
=E'...' Single-precision floating point
=F'...' Fullword fixed-point
=H'...' Halfword fixed-point
=X'...' Hexadecimal
=Y(...) Nearby address
=Z(...) Remote address

Regarding the =F'...' and =H'...' datatypes, see here to understand the significant differences between how the AP-101S and System/360 assemblers treat them.

The "length modifier" ("Ln", where n is a decimal number indicating the number of bytes of memory allocated) is supported. Similarly, for fixed-point datatypes (i.e., =F and =H), the "scale modifier" ("Sn") is supported. If both are present, the scale modifier must follow the length modifier.

Aside: The System/360 manual also describes variations on the allowed modifiers, such as "L(e)" (where e is an arithmetic expression), scale modifiers for floating-point datatypes (=E and =D), an "exponent modifier" for fixed-point and floating-point datatypes, and so on. ASM101S doesn't support any of these variations, since they don't appear in Shuttle flight-software source code.

While ASM101S supports these length and scale modifiers, the length modifier in AP-101S assembly language does not appear to be used as described for System/360. Consider this AP-101S instruction, which appears in the MSTR module of the HAL/S-FC runtime library:
00013 27F7 0011 0026 0011 57 N R7,=XL2'F' 00002200
.
.
.
00024 75+ LTORG 02-ERRPA
00024 FFFF000F 76 =X'FFFF000F'
00026 0000000F 77 =XL2'F'
00028 FFFFFFF0 78 =X'FFFFFFF0'

The length modifier "L2" in the literal =XL2'F' clearly indicates that even though the provided constant value (F) is only one hexadecimal digit, it must assemble to 2 bytes, i.e., to hexadecimal 000F. And yet, in reality, we find that it has assembled to 4 bytes, as hex 0000000F. In point of fact, since an =X literal by default assembles to an even number of bytes (according to the System/360 manual), the length modifier should not even have been necessary in the first place, since it would merely be explicitly expressing the
default.

=FS32'60E6' superficially appears to assemble to 60000000 (decimal), i.e., 0x03938700, but because of the scale modifier it is actually shifted rightward by 32 bit-positions, i.e., divided by 2^32. So the value it assembles to is instead approximately 0.013969839. But that still doesn't mean that everything is exactly as in System/360. In AP-101S, the fixed-point literals have the interesting property (see the link given above about =F and =H datatype differences from System/360) that if they appear to be integers, then they assemble into 2's-complement integers, while if they appear to be fractional values with absolute value less than 1.0, they instead assemble into an alternate format maximizing the significance in that numerical range. That's what happens in this example. The literal assembles to 0x01C9C380, which, amusingly, is just 60000000/2.

The =Z(...)
format in AP-101S is unrelated to the =Z'...' ("zoned decimal format") of System/360, which is not supported by ASM101S and is best forgotten entirely. The =Z(...) of AP-101S has three parameters, thusly:

=Z(ARG1,ARG2,ARG3)

Naturally, as you may expect, =Z(...) is entirely undocumented. It is used in precisely one place, namely in the FCMNINIT module of each available version of Shuttle flight software. In flight software OI-30.17, that usage looks like so in the assembly listing produced by the original assembler:

434 EXTRN FPMXQETB,FPMXQELE,FCMALLFS 026120BQ
.
.
.
000DE EAF3 0000 0000 438 LA R2,FPMXQETB DESTINATION - START OF XQE TABLE 026140BQ
000E0 E2FB 001E 001E 439 IAL R2,FPMXQECT*2 30 HWS TO INIT - (15 ENTRIES) 026145BQ
000E2 1BF7 00EE 01D2 00EE 440 L R3,=Z(,FPMXQETB+2,0) SOURCE ZCON ADDRESS 026150BQ
.
.
.
001D2 734 LTORG 050300BG
001D2 00020000 735 =Z(,FPMXQETB+2,0)

Since the symbol FPMXQETB is declared as an EXTRN, its address and other characteristics are unknown at assembly time. In fact, though, FPMXQETB is the start of something called the "XQE table", aligned at a fullword address, a fact presumably known to the human programmer if not to the assembler program.

=Z(...)
is used to form "ZCON" addresses. The ZCON compiler option for the HAL/S compiler (HAL/S-FC), in the words of the "HAL/S-FC User's Manual", has the effect of "[causing] calls to out-of-line routines (external references) to be performed via long indirect address constants". It appears to me that the term "ZCON" probably stands for "Z constant", as a contrast to "YCON" for constants formed via =Y(...), which are just 16-bit displacements. The distinction is that LOCAL data (in the HAL/S sense) can be accessed by efficient YCON addressing, whereas REMOTE data (again in the HAL/S sense) is accessed via less-efficient ZCON addressing.

=Z(ARG1,ARG2,ARG3) is perhaps better represented as =Z(TBD,BASE,DISPLACEMENT). In the example above, FPMXQETB+2 is the BASE address. The external symbol, however, is simply assembled as having address 0, since its address is unknown to the assembler (later to be fixed up by the linker), causing FPMXQETB+2 to be assembled as just 2. Similarly, the DISPLACEMENT
is 0.

Aside: I don't claim that all of the mysteries of =Z(...) have been solved by the meandering inferences above. They do explain, more or less, how the assembler should turn =Z(,FPMXQETB+2,0) into 0002 0000. But what happens after the object code leaves the assembler and is processed by the linker? By process of elimination, the assembly-listing excerpts above came from flight software version OI-30.17, because those are the only contemporary assembly listings available to me. But the associated source code is identical in flight software version OI-34.06. For that version I have no contemporary assembly listing, but I do have a contemporary disassembly of the linked object code, which gives us some clues about how the linker treats these =Z(...) literals. What we find in the linked OI-34.06 is that FPMXQETB ends up at address 008B6A, and that the =Z(,FPMXQETB+2,0) constant itself becomes 8B6C 0001. The upper halfword is precisely what we would have expected! But what about the lower halfword? Why has the linker turned 0000 into 0001? That's a mystery to me. In case you're interested, here's what the relevant portions of that contemporary disassembly of the FCMNINIT module in OI-34.06 look like:
008B6A FCMCBLKS+0A50 FPMXQETB DS 0F
.
.
.
018A52-018A53 FCMNINIT+00DE EAF3 8B6A 008B6A LA R2,X'8B6A' FPMXQETB
018A54-018A55 FCMNINIT+00E0 E2FB 001E IAL R2,X'001E'
018A56-018A57 FCMNINIT+00E2 1BF7 00F6 018B4E L R3,X'00F6' =Z''
.
.
.
018B4E-018B4F FCMNINIT+01DA 8B6C 0001 DC Z
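Returning briefly to the fixed-point literal =FS32'60E6' discussed earlier in this section, its arithmetic can be checked with the Python sketch below. The sketch rests on two assumptions, neither documented: that the integer-versus-fraction decision is made after applying the scale modifier, and that the "alternate format" for fractions of magnitude below 1.0 is value × 2^31 stored as a 32-bit word. The second assumption was chosen because it reproduces the observed 0x01C9C380 (= 60000000/2). The function name fixed_point_literal is hypothetical.

```python
from fractions import Fraction

def fixed_point_literal(text, scale=0):
    """Hypothetical model of an AP-101S fullword =F literal, e.g. =FS32'60E6'."""
    value = Fraction(text) / (1 << scale)    # scale modifier Sn: divide by 2**n
    if value.denominator == 1:
        word = int(value)                    # looks like an integer: 2's complement
        if not -(1 << 31) <= word < (1 << 31):
            raise OverflowError("value does not fit in a fullword")
    elif abs(value) < 1:
        # Assumed fraction format: value * 2**31, saturated just below +1.0.
        word = min(round(value * (1 << 31)), (1 << 31) - 1)
    else:
        raise ValueError("non-integer with magnitude >= 1 is not modeled")
    return word & 0xFFFFFFFF                 # 32-bit pattern as an unsigned int
```

Under these assumptions, fixed_point_literal("60E6", 32) yields 0x01C9C380, matching the original assembler, while the unscaled fixed_point_literal("60E6") yields 0x03938700.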
The "=E" (single-precision floating-point) and "=D" (double-precision floating-point) datatypes also require a few additional words of explanation. While literals (or constants) of these types are provided in AP-101S assembly-language source code in the usual decimal notation, they are encoded into object code by the assembler in "IBM hexadecimal floating-point" format ("IBM hex"). Regarding this conversion of decimal to IBM hex, the System/360 assembly-language manual has this to say (emphasis mine):

"The number is converted to a binary number, and scaling is performed if specified. The binary number is then rounded and assembled into the proper field, according to the specified or implied length. The resulting number will not differ from the exact value by more than one in the last place."

In other words, the conversion algorithm used by the original assembler was not necessarily exact to the apparent full precision. Consequently, floating-point constants in object code produced by ASM101S cannot be guaranteed to match, byte for byte, the floating-point constants generated by the original assembler. This potential for inexact conversions is exacerbated by the fact that the native floating-point precision of Python 3, the language in which ASM101S is written, does not precisely match the native floating-point precision of System/360. Python 3 is more accurate than System/360 in some cases, and less accurate in others, I believe. ASM101S works around this latter problem by performing the floating-point operations for such conversions at a higher precision than normal Python floating-point operations, namely 20 significant digits rather than 16. Empirically, the conversions by ASM101S match the conversions provided by the original assembler in all known cases, but there's no guarantee for presently-unknown cases. Nor is the range of the two floating-point formats (in terms of powers-of-ten exponentiation) identical; ASM101S makes no attempt to work around this range mismatch, because there are no offending examples in existing legacy source code.
Aside: Expressed in different terms, ASM101S (presumably!) does not use exactly the same algorithm as the original assembler to perform the conversion from decimal strings to IBM hex. That's because I don't presently know what that original algorithm was. Thus ASM101S simply uses a conversion algorithm of my own concoction. Perhaps at some point the original algorithm may be deduced or recovered somehow. If that happens, then it can be incorporated into ASM101S, in which case none of the concerns I've been describing will apply.

We might be tempted to disregard some potential precision error in the 32nd or 64th bit as being far too small to matter. It matters because it affects validation of the assembler in the following way: ASM101S has the capability of performing byte-for-byte comparisons of new assemblies vs legacy assemblies (where available), and thus discrepancies in even the least-significant bit would be flagged as errors. Our (or at least my) criterion for accepting ASM101S as valid is that these automatic comparisons detect no errors. Even if we are prepared to accept such discrepancies as being all right, they nevertheless defeat automated validation.
I'd also note that while we do not have the source code of the original AP-101S assembler, the System/360 assembler presumably used the same conversion algorithm, and probably even the same source code to implement the conversion algorithm. I'm told that the source code for one version of the System/360 assembler is online, though I don't provide it in the Virtual AGC library due (in what's probably an excess of paranoia) to copyright concerns. Someone sufficiently immersed in System/360 assembly language might be able to abstract the conversion algorithm from examination of that source code. I looked at the System/360 source code myself and concluded, alas, that I am not such a person. If you want to try it yourself and give me the algorithm in easily-understood pseudo-code (or better yet, in Python 3), the System/360 assembler's source code is the file AS037F1.TXT, supposedly present somewhere in the archive at this hyperlink, though I've been unable to find it there myself. Good luck!
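For comparison, here is a minimal Python sketch of one way to convert a decimal string to IBM hex. It is not the original IBM algorithm, which remains unknown, nor necessarily the one ASM101S uses. Instead of working at 20 significant digits, it converts the decimal string exactly with Python's fractions module and then rounds once. The rounding is half-to-even, via Python's round() on a Fraction, and that choice is an assumption. fraction_bits=24 gives the 32-bit (=E) format and 56 gives the 64-bit (=D) format. The function name to_ibm_hex is hypothetical.

```python
from fractions import Fraction

def to_ibm_hex(text, fraction_bits=24):
    """Convert a decimal string to IBM hexadecimal floating point (as an int)."""
    value = Fraction(text)               # exact value, e.g. Fraction("0.1") == 1/10
    if value == 0:
        return 0
    sign = 1 << (fraction_bits + 7) if value < 0 else 0
    value = abs(value)
    exponent = 0                         # power of 16
    while value >= 1:                    # normalize so that 1/16 <= value < 1
        value /= 16
        exponent += 1
    while value < Fraction(1, 16):
        value *= 16
        exponent -= 1
    fraction = round(value * (1 << fraction_bits))
    if fraction == 1 << fraction_bits:   # rounding carried out of the fraction
        fraction = 1 << (fraction_bits - 4)
        exponent += 1
    biased = exponent + 64               # excess-64 exponent in 7 bits
    if not 0 <= biased <= 127:
        raise OverflowError("exponent outside the IBM hex range")
    return sign | (biased << fraction_bits) | fraction
```

For instance, "0.1" becomes 0x4019999A in single precision. Truncating instead of rounding would give 0x40199999, a difference in exactly the kind of last-place bit that defeats byte-for-byte validation.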
DC pseudo-ops, whereas (of course!) the entire value is available to ASM101S. Consequently, even if ASM101S reports a 100%-successful automated comparison, generated data not printed in the assembly listings has not been checked and may not match.

AP-101S instructions are of 5 basic types, designated (by IBM) as RR, RS, SRS, RI, and SI, based on the syntax patterns of their operands and on the ways they are encoded as machine instructions. Some of these are System/360 patterns, and some are not. I won't bore you with the details, as you can read about them in the AP-101S Principles of Operation. However, there is a certain difficulty with SRS- and RS-type instructions, as well as ambiguities between two different flavors of RS-type instructions, namely "extended" (AM=0) vs "indexed" (AM=1) instructions. These could in principle cause a mismatch between the object code generated by ASM101S and that generated by the original AP-101S assembler, though hopefully not any behavioral difference at runtime other than slight timing discrepancies. This group of instructions includes, among other things, all conditional-branch instructions and their aliases.
The greatest difficulty relates to the fact that certain
instruction mnemonics are used both for SRS-type
instructions and RS-type instructions. Moreover, while some
of the operand patterns for them are accepted for SRS instructions
and not RS instructions, thus allowing the assembler to
distinguish between them, some of the operand patterns
nevertheless overlap. In case of overlap, there is no
syntactic way for the assembler to distinguish between the SRS
instruction and the RS instruction. Overlap occurs for the
following syntactical patterns (where R1
,
D2
, and B2
refer to the names of
fields in the encoded machine instruction):
OPCODE R1,D2
OPCODE R1,D2(B2)
The vulnerable opcode mnemonics are:
A AE AH BC C CH D DE IAL L LA LE LH M ME MH N O S SE SH SHW ST STH TD TH X ZH

While the SRS-type and RS-type instructions are (almost) behaviorally identical, they are encoded differently as machine instructions, and in particular require different amounts of memory. SRS-type instructions are encoded as half-words (2 bytes), while RS-type instructions are encoded as full words (4 bytes). For example, there is no syntactical way to know whether to encode the load instruction "L 4,SWITCH" as 2 bytes or as 4 bytes. So if ASM101S were to encode an instruction as SRS while the original assembler encoded it as RS, or vice-versa, then not only would the binary forms of those particular instructions differ, but all of the code following that instruction in the same control section would be aligned differently.

Aside: I say that SRS-type instructions and RS-type instructions are "almost" behaviorally identical. Figure 2-2 of the AP-101S Principles of Operation tells us about one difference. When an SRS-type instruction specifies a base register (B2) equal to 3, it means to really use general register 3, as one would expect. By contrast, when an RS-type instruction specifies base register 3, it means instead to use "no" base register.
What does it mean to use "no" base register? Several pages later, we're told that "When B2 equals 11, base addressing is not performed. In this case, the displacement is instead used directly as the effective address." (The "11" here is binary, i.e., B2 = 3.) In other words, the displacement is the number of halfwords, within the same CSECT, from the instruction following the RS instruction to the target location.
As I said, there is no syntactic way for the assembler to distinguish between these cases, but there is a non-syntactic way based on the size of the D2 sub-operand. If D2 is in the numerical range 0-55, then the SRS instruction could potentially be used, while if D2 is 56 or greater, the RS instruction must be used.
Unfortunately, determining these displacements between two
locations is quite tricky, because there might be some of these
SRS/RS instructions with ambiguous sizes in between. In
fact, for forward references, there is guaranteed to be at least
one such intervening instruction. In other words, we don't
know the sizes of the displacements until we know the sizes of the
intervening instructions, and we don't know the sizes of the
intervening instructions until we know the sizes of all of the
displacements.
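One conventional way to break this kind of circularity, borrowed from the branch-size "relaxation" that many assemblers perform, is to iterate to a fixed point: start by optimistically assuming that every ambiguous instruction is a 1-halfword SRS, lay out the addresses, promote to a 2-halfword RS any instruction whose D2 falls outside 0-55, and repeat until nothing changes. Because sizes only ever grow, the loop must terminate. The Python sketch below illustrates that idea only; it is not necessarily what ASM101S does, and its input format (dictionaries with hypothetical "label", "target", and "size" keys, with D2 assumed to be the target's halfword offset from the start of the control section) is an assumption made purely for illustration.

```python
# Hypothetical sketch (not ASM101S code): resolve ambiguous SRS/RS sizes
# by iterating to a fixed point. Each item is a dict with:
#   "label":  optional label defined at this item's address
#   "target": present only for ambiguous SRS/RS instructions; the label
#             whose halfword offset from the start of the section is
#             assumed to be the D2 sub-operand
#   "size":   size in halfwords, for all non-ambiguous items

SRS_MAX_D2 = 55  # SRS is possible only for D2 in the range 0-55


def assign_sizes(program):
    """Return {index: size in halfwords} once the layout stops changing."""
    # Optimistic start: every ambiguous instruction is a 1-halfword SRS.
    sizes = {i: 1 if "target" in item else item["size"]
             for i, item in enumerate(program)}
    while True:
        # Lay out halfword addresses using the current size guesses.
        labels, address = {}, 0
        for i, item in enumerate(program):
            if "label" in item:
                labels[item["label"]] = address
            address += sizes[i]
        # Promote any SRS whose D2 no longer fits to a 2-halfword RS.
        changed = False
        for i, item in enumerate(program):
            if "target" in item and sizes[i] == 1:
                d2 = labels[item["target"]]
                if not 0 <= d2 <= SRS_MAX_D2:
                    sizes[i] = 2
                    changed = True
        # Sizes only grow, so this loop is guaranteed to terminate.
        if not changed:
            return sizes


# Two forward references: promoting the one to Y pushes X past 55 too.
program = [{"target": "X"}, {"target": "Y"}, {"size": 53},
           {"label": "X", "size": 1}, {"label": "Y", "size": 1}]
print(assign_sizes(program))  # {0: 2, 1: 2, 2: 53, 3: 1, 4: 1}
```

The example shows the cascade described above: on the first pass only the reference to Y is too far away, but growing that instruction by one halfword moves X from offset 55 to 56, which forces a second promotion.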
As for the ambiguity between RS-type "extended" instructions and "indexed" instructions, this is a lesser problem than the SRS-vs-RS problem, because both the extended and indexed varieties of the instruction assemble to a fullword, and thus the use of the wrong variation of the instruction results only in a mismatch at that exact memory location rather than a general displacement of all the succeeding memory locations. As with the SRS-vs-RS instructions, there are cases in which the variation in syntax does allow distinguishing the two cases. Roughly speaking, the syntax of extended vs indexed RS-type instructions is
OPCODE R1,D2(B2)
OPCODE[@][#] R1,D2(X2,B2)
but since the @, #, and X2 are all optional syntactically, in their absence there's no way to distinguish between extended and indexed instructions other than the numerical range allowed for the displacement D2.
As well, the latter (indexed) of the two cases allows the syntax
OPCODE R1,D2(X2)
for an even greater potential for confusion. Yay! Fortunately, the allowed base registers (B2) for AP-101S are only the CPU general registers 0-3, so if general registers 4-7 appear in such an instruction, it's clear that index register X2 is meant rather than base register B2.
Although unstated in the original documentation, I believe that
the original AP-101S assembler filled uninitialized memory with
the halfword pattern 0xC9FB. This happens to be the first halfword of the AP-101S instruction SVC, which, as the AP-101S POO explains (p. 9-16), "causes an interruption and a program status word switch". (The second halfword is the operand for SVC.)
Whether the consistent use of this particular value is significant
or merely a coincidence, I can't say; perhaps the intention was to
use this to trap unintentional execution from uninitialized
memory.
The reason I suspect this is that in surviving legacy assembly reports, 0xC9FB invariably appears at locations where there's a gap due to forcing alignment of data to particular boundaries, such as aligning fullword constants to fullword addresses. For example, consider this excerpt from the assembly report for the ACOS module:
.
.
.
0005D 58E0 182 RET0 SER F0,F0 00010504
0005E DF4E 004C 0013 183 B EXIT 00010600
184 * 00010700
0005F C9FB
00060 185 DS 0F 00010800
00060 413243F7 186 PI DC X'413243F7' PI 00010900
00062 411921FB 187 PIOV2 DC X'411921FB' PI/2 00011000
.
.
.
Here, the instruction B EXIT at halfword address 0x0005E is immediately followed by the pseudo-op DS, whose purpose is to align to the next fullword address boundary (i.e., the next even halfword address). In the absence of realignment, though, the next halfword address after the branch instruction would have been 0x0005F (which is an odd halfword address). Therefore, the assembler inserts a halfword at 0x0005F, so that DS can appear instead at 0x00060 (which is an even halfword address). The value of that halfword is 0xC9FB.
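The padding behavior inferred above can be mimicked in a few lines of Python. In this sketch, memory is modeled (purely as an illustrative assumption) as a list of halfwords whose index is the halfword address, so the list's length serves as the location counter; the function name align is hypothetical, and 0xC9FB as the fill value is the inference described above rather than documented behavior.

```python
# Hypothetical sketch: pad with the inferred fill halfword 0xC9FB until
# the location counter reaches the requested boundary.

FILL_HALFWORD = 0xC9FB  # first halfword of SVC, per the inference above


def align(halfwords, boundary=2):
    """Pad `halfwords` (index = halfword address) with FILL_HALFWORD until
    its length is a multiple of `boundary` halfwords (2 = fullword).
    Returns the new location counter."""
    while len(halfwords) % boundary:
        halfwords.append(FILL_HALFWORD)
    return len(halfwords)


# Reproduce the ACOS excerpt: after B EXIT at 0x5E the counter is 0x5F.
memory = [0] * 0x5F
print(hex(align(memory)))  # 0x60
print(hex(memory[0x5F]))   # 0xc9fb
```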
AP-101S instructions of type RS (such as LH, STH, etc.) accept an operand syntax of which the most general form is:
OPCODE R1,D2(X2,B2)
where R1 represents a general-purpose register designated as "operand #1", D2 represents a "displacement" (which may take the form of a number or a program label), X2 represents a general-purpose register (designated as the "index register"), and B2 represents yet another general-purpose register (designated as the "base register").
That said, there is a bewildering number of exceptions,
restrictions, and rules for interpreting these different
elements. Additionally, each RS-type instruction can be
assembled into two separate forms of machine instructions, namely
the so-called "extended" and "indexed" forms, as well as sometimes
into a third form, the
so-called SRS form discussed a couple of sections ago.
It's not my intention to explain all of these details (I don't understand quite a few of them myself!), and most of them are explained in the AP-101S POO anyway.
But by saying that most of the syntactical variations are explained by the POO, at least in spirit, I'm also implying that some of them are not ... and I'd like to supply those missing explanations that pertain to the operation of the assembler. In particular, most of the four syntactical elements mentioned above may be omitted under various circumstances.
Before getting to that, another thing you need to know is that many of these RS-type instructions have related instructions in which the characters "@", "#", or "@#" are suffixed to the mnemonic. For example, I mentioned above that LH and STH are RS-type instructions, but so too are LH@, LH#, LH@#, STH@, STH#, and STH@#. The AP-101S POO explains these variations as follows:
OPCODE (without @ or #) specifies "direct addressing without autoindexing".
OPCODE@ specifies "indirect addressing".
OPCODE# specifies direct addressing with autoindexing.
OPCODE@# specifies indirect addressing with autoindexing.
The omissions of syntactical elements which I'm concerned about here are those of the X2 or B2 elements, resulting in operands like
OPCODE R1,D2(B2)
OPCODE@ R1,D2(X2)
Insofar as assembly is concerned, the question that arises is this: If you have an operand syntax like R1,D2(register), is register supposed to be an index register, or is it supposed to be a base register? My inference is that it is supposed to be a base register if there is no @ or # suffix on the mnemonic, but it is supposed to be an index register for the @/#/@# forms of the mnemonic. In the case where register is a base register and no index register is specified, the "extended" form of machine instruction is generated (and it requires no index register); whereas in the case where register is an index register and no base register is specified, the "indexed" form of machine instruction is generated and the base register defaults to general-purpose register 0.
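The inference above, combined with the earlier observation that only general registers 0-3 can serve as base registers, can be summarized in a short Python sketch. The function name classify and its (form, base, index) return convention are hypothetical, registers are assumed to be written as plain decimal numbers, and the rules encoded here are the inferred ones described above, not documented AP-101S behavior.

```python
import re

# Operand of the form R1,D2(reg), e.g. "5,12(2)" or "1,SWITCH(6)".
# Assumes registers are written as decimal numbers.
OPERAND = re.compile(r"^(\d+),([^(),]+)\((\d+)\)$")


def classify(mnemonic, operand):
    """Return (form, base, index) for an RS operand written as R1,D2(reg)."""
    match = OPERAND.match(operand)
    if match is None:
        raise ValueError("not of the form R1,D2(reg): " + operand)
    reg = int(match.group(3))
    if mnemonic.endswith(("@", "#")) or reg > 3:
        # @/#/@# suffix, or a register that can't be a base register:
        # reg is an index register, and the base defaults to register 0.
        return ("indexed", 0, reg)
    # Plain mnemonic with register 0-3: reg is a base register, no index.
    return ("extended", reg, None)


print(classify("LH", "5,12(2)"))       # ('extended', 2, None)
print(classify("LH@", "5,12(2)"))      # ('indexed', 0, 2)
print(classify("STH", "1,SWITCH(6)"))  # ('indexed', 0, 6)
```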
By "fixed-point" data, I'm referring to data specified in
operands of DC pseudo-ops, à la
DC F'12.345' (fullword fixed-point)
DC H'-6.12' (halfword fixed-point)
Here we have a case in which the usage in AP-101S assembly
language, per the AP-101S POO, is entirely at odds with the usage
in System/360 BAL, per the System/360 assembly-language
manual. In System/360, the non-integer portions of these
specifications are discarded (although optional "exponent factors"
and "scale factors" can be used to apply binary or decimal shifts
to the data before doing so, thus preserving as much significance
in the fractional part as may be desired).
For AP-101S, on the other hand, there are two very different
cases. First, if the data does not contain a decimal point
or exponent, then what is generated for it in memory is simply the
4-byte or 2-byte 2's-complement representation of the integer
value.
Second, if the data does contain a decimal point or exponent, then the full value is entirely fractional: i.e., > -1.0 and < 1.0. The constant is assembled to a binary value basically by multiplying by 2^31 (in the case of F'...') or by 2^15 (in the case of H'...'), discarding the fractional portion, and representing the remaining integer in 2's-complement form.
For example,
DC H'0.625'
generates 0.625×2^15 = 20480 = 0x5000.
It's TBD what should happen if there is a non-zero integer portion, but ASM101S simply caps the generated value to the boundaries of the representable range, so any integer portion is silently lost and the result saturates at the largest (or most negative) representable value.
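The two AP-101S cases can be expressed as a short Python sketch. The function name fixed_point_dc is hypothetical; truncating toward zero when discarding the fractional portion of negative values is an assumption (the text above says only that the fractional portion is discarded), and range checking for the plain-integer case is omitted.

```python
import math


def fixed_point_dc(text, bits):
    """Bit pattern for DC F'text' (bits=32) or DC H'text' (bits=16)."""
    scale = 1 << (bits - 1)  # 2**31 for F'...', 2**15 for H'...'
    if "." in text or "E" in text.upper():
        # Fractional case: scale, then discard the fractional portion
        # (truncating toward zero is an assumption).
        value = math.trunc(float(text) * scale)
        # Cap to the representable range, as ASM101S is described as doing.
        value = max(-scale, min(scale - 1, value))
    else:
        # Integer case: plain 2's-complement integer (range check omitted).
        value = int(text)
    return value & ((1 << bits) - 1)  # 2's-complement bit pattern


print(hex(fixed_point_dc("0.625", 16)))   # 0x5000
print(hex(fixed_point_dc("-6", 16)))      # 0xfffa
print(hex(fixed_point_dc("12.345", 32)))  # 0x7fffffff (capped)
```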
The AP-101S assembler performed a kind of partial linking of the object code, prior to any linking of separately-compiled modules by the AP-101 link editor. See the discussion on the HALLINK101S page.
This section won't be of interest to anybody who merely wishes to use ASM101S. But I haven't necessarily been able to provide support in ASM101S for every feature of AP-101 assembly language that might be discovered if additional legacy code becomes available, so ASM101S may need to be maintained in the future. It may therefore be worthwhile to provide at least a few notes on how ASM101S is structured internally, in order to facilitate that possible maintenance.
The Python 3 source code for ASM101S is kept in the
ASM101S/ folder of the Virtual AGC software tree, and the
top-level source-code file is itself called ASM101S.py.
There are also a number of additional Python files in that folder
which are imported as modules into ASM101S.py or into each
other. I should note that there are other Python files in
that folder that I find useful, but that are not used as modules
in ASM101S.
Assembly proceeds in a sequence of "passes", some of which are
designated as passes in the source code, and some of which are
not. Here's a brief run-through of the passes:
source; there is an entry in source for each line of code or macro definition encountered; those entries are Python dictionaries containing various information about the lines, though some information is added to those dictionaries in later passes. The acquisition pass includes reading the entire macro library, handling all file inclusions (i.e., all COPY pseudo-ops), and resolving all assembly-language macro operations. For example, all macro invocations are expanded, all conditional assembly is resolved, and so on. This means that at the end of the acquisition pass, no more symbols of the form &SOMETHING remain, nor any "sequence symbols" (i.e., labels of the form .SOMETHING).
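To make the shape of a source entry more concrete, here's a simplified Python sketch that builds one from a single card. It is not ASM101S's actual code: the function name make_entry is hypothetical, only a handful of the source-entry fields are filled in, and the field splitting is deliberately naive (blanks inside quoted operands, continuation cards, and macro expansion are all ignored). Treating columns 1-71 as the statement area follows the usual System/360 card layout and is an assumption here.

```python
def make_entry(text, file, line_number, n):
    """Return a simplified source entry (a dict) for one source card."""
    statement = text[:71].rstrip()  # assumed statement area: columns 1-71
    entry = {
        "text": text,
        "file": file,
        "lineNumber": line_number,
        "n": n,
        "empty": statement.strip() == "",
        "fullComment": statement.startswith("*"),
        "dotComment": statement.startswith(".*"),
        "errors": [],
        "name": "",
        "operator": "",
        "operand": "",
    }
    if entry["empty"] or entry["fullComment"] or entry["dotComment"]:
        return entry
    # A label must begin in column 1, so a leading blank means "no label".
    if statement[0].isspace():
        fields = [""] + statement.split(None, 1)
    else:
        fields = statement.split(None, 2)
    fields += [""] * (3 - len(fields))
    # As with the "operand" field, the operand keeps any trailing comment.
    entry["name"], entry["operator"], entry["operand"] = fields
    return entry


entry = make_entry("PI       DC    X'413243F7'  PI", "ACOS", 186, 0)
print(entry["name"], entry["operator"], entry["operand"])
```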
In AP-101S (or System/360) assembly language, it is a difficult technical feat to deal with lists of macro parameters (both for macro definitions and macro invocations) that are split across continuation cards; the acquisition pass handles this task using TatSu (see "PASS 0" below), joining all continuations in the "operand" field (see below), so that later passes can basically ignore all continuation cards. The acquisition pass is integrated into the way ASM101S.py parses command-line options, and occurs whenever an option like --library=SOMETHING or the name of a source-code file is encountered on the command line. It is not marked in any way as being a "pass". Some of the important fields in the source entries created in this pass for use by later passes are:
"text": the unmodified text of the original line.
"name": the label field of the line.
"operator": the operator field of the line.
"operand": the operand+comment fields of the line.
"file": the name of the source-code file containing the line.
"lineNumber": the line number (starting at 1) of the line within its source-code file.
"n": the index within source.
"empty": boolean for completely blank lines.
"fullComment": boolean for full-line comments ("*" in column 1).
"dotComment": boolean for dot-comments (".*" in columns 1,2).
"macro": name of the including macro definition, if any.
"continues": boolean for a non-blank continuation field (column 71).
"identification": contents of the "ident" field (columns 72-80).
"errors": array of assembler-generated error/warning messages for the line.
"inMacroDefinition": boolean for the line being part of a macro definition.
"copy": boolean for the line being present due to a COPY pseudo-op.
"printable": boolean for whether to include the line in the output assembly report.
"depth": depth of the line within nested macro expansions. (0 means outside of any expansion.)
"operand" field in the source array. Operand fields can be quite complex, to the point where it is easiest to parse them by defining Backus-Naur Form (BNF) rules for allowable operand formats, with different rules applying for different types of operator fields. PASS 0 has the task of using these BNF rules, on an operator-by-operator basis, for parsing the operand fields and removing any add-on comments. The parsed operands are stored in the source entry in a new field called "ast". There is thus no need for any later passes to perform additional parsing on the operand fields. The custom BNF rules are provided in the Python module fieldParser.py. These custom rules are compiled and made usable by the generally available Python module TatSu, which must be installed for ASM101S to function.
Some relevant Python modules of interest:
parserASM(): Given an operand field as a string and the name of a BNF rule, parses the operand. See also the TatSu documentation.
joinOperand(): Joins operand fields that are split across continuation cards.
error(): Given a source entry and an error message, appends the error message to the list of error messages of the source-code line. (This function really has nothing to do with expressions per se, but it's ubiquitous throughout the assembler.)
evalArithmeticExpression(): Evaluates arithmetic expressions involving symbols (either macro-language symbols or symbols like program labels in the symbol table) and arithmetical operations (like +, -, *, /). Can account for symbols that are in different CSECTs or which are EXTRN.
evalBooleanExpression()
evalCharacterExpression()