Aside: That's not to say that the differences were insignificant, but merely that the differences are transparent to the assembler. Here's a summary of some of the known differences:
Feature                                    AP-101B                                AP-101S
Power                                      780W                                   560W
Weight                                     117 pounds                             64 pounds
Memory                                     104K words (416K bytes)                256K words (1024K bytes)
Memory protection (per 16-bit half-word)   1 parity bit and 1 store-protect bit   6 ECC bits and 3 store-protect bits
Speed                                      420K operations/second                 >1000K operations/second
Battery backup                             n/a                                    Rechargeable NiCAD
Built-in test equipment                    n/a                                    Temperature; charger; battery; soft error counter
MTBF                                       5K hours                               24K hours
Because of this dearth of distinction between the different models in a software sense, I'll generally just refer to the "AP-101" rather than specifying "AP-101S" or "AP-101B". Realize, though, that there were also models of the IBM AP-101 computer other than the AP-101B or AP-101S, for uses other than the Shuttle, and there's no particular reason to believe that ASM101S could assemble their source code without some updates.
"The Master Sequence Controller (MSC) is a micro programmed computer specifically tailored for I/O Management within the Space Shuttle General Purpose Computer (GPC). As such, it has extensive and programmable capabilities for monitoring and controlling the basic I/O operations performed by upwards to 24 Bus Control Elements (BCE's) which are implemented in the baseline GPC. These capabilities include setting up, scheduling, and initiating BCE programs, monitoring the status of BCD operations, and communicating overall completion of these operations to the CPU."
MSC instructions can be distinguished from CPU instructions in that they begin with the character "@".
Similarly, the POO tells us that
"The Bus Control Element (BCE) is a microprogrammed controller specifically tailored for management of I/O traffic on one of the Space Shuttle system busses. Within each IOP [Input/Output Processor] there is one BCE for each system bus, for a total of 24 BCE's. Each of these BCE's is capable of independent program execution, data buffering to and from memory, and communication with the MSC."
Or in other words, besides the MSC discussed in the preceding section, there are 24 additional processors within the AP-101, yet again with their own distinct instruction set, yet again sharing memory and intermixed in the assembly-language source code with CPU instructions and MSC instructions.
BCE instructions can be distinguished from CPU and MSC
instructions in that they begin with the character "#".
See Appendix
III of the AP-101S POO or the seemingly-identical Part
III of the IOP POO for more information.
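Since the three instruction sets are intermixed in the same source files, anything reading that source needs a way to tell them apart. The following is only a minimal sketch of the prefix rule just described, not code taken from ASM101S: the function name `classify_mnemonic` is hypothetical, it assumes the operation field has already been isolated from the source line, and the "@XXX" and "#XXX" mnemonics in the usage lines are placeholders rather than real instructions.

```python
def classify_mnemonic(mnemonic: str) -> str:
    """Guess which processor an operation-field mnemonic is meant for.

    Assumption (from the text): MSC mnemonics begin with "@", BCE
    mnemonics begin with "#", and anything else is a CPU instruction
    (or a pseudo-op or macro, which this sketch does not distinguish).
    """
    if mnemonic.startswith("@"):
        return "MSC"
    if mnemonic.startswith("#"):
        return "BCE"
    return "CPU"


# Placeholder mnemonics, for illustration only.
print(classify_mnemonic("@XXX"))   # MSC
print(classify_mnemonic("#XXX"))   # BCE
print(classify_mnemonic("AR"))     # CPU
# A *suffix* of "@" or "#" does not change the classification.
print(classify_mnemonic("SCAL@#")) # CPU
```

Note that only a leading "@" or "#" matters here; CPU mnemonics may carry those same characters as suffixes.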
(To be clear, ASM101S is not yet functional.
I am simply documenting it as I proceed with development!)
You can see a list of the available command-line options by using the command
    ASM101S.py --help
A typical invocation looks like this:
    ASM101S.py [ --library=LIBRARY ] [ --sysparm=BFS ] --object=OBJECT.obj SOURCE1.asm SOURCE2.asm ...
Invoked in this fashion, the action of the assembler is roughly the following:
- Macro definitions are loaded from the LIBRARY folder. (More on this is in the next section.)
- The SOURCEx.asm files specified on the command line are assembled, in the order that they are specified. Macro definitions should precede any source-code files using those macros, but the ordering of macro files among themselves is not significant.
- An assembly listing is output (to stdout) and, if there were no fatal errors, so is an AP-101 object-code file named OBJECT.obj that contains the results of the assembly process.
When assembling an assembly-language file whose code depends on
macros, there are three different ways in which the definitions of
those macros may be made available to the code using them:
[...] COPY. (Note: COPY'd files cannot contain macro definitions in System/360, but can do so in AP-101.)
[...] --library=LIBRARY. LIBRARY is just a path to a macro-library folder. Macro definitions in any libraries specified in this manner are loaded by the assembler along with the specific source-code file(s) being assembled, thus automatically making all of the macro definitions in that library available during the assembly process.
[...] --library=../RUNMAC or --library=../MLIB80, assuming the current working directory was the one storing the source-code files being assembled.
[...] COPY pseudo-ops are also located within the macro libraries, intermixed with the files intended to contain only macros. But we do not want any of the code from these COPY'able files (even if there are some macro definitions within them) to be automatically made available during assembly. Rather, we want their code to be made available only when they're COPY'd! Or to put it differently, categories #2 and #3 of files containing macro definitions, as discussed above, must be mutually exclusive.
[...] COPY (category #3).
ASM101S does not attempt to determine these distinctions
for itself. Rather, the files in the macro library (or
libraries) must have been preprocessed in such a manner as to
determine which of the two categories each file in the library
falls into. Each macro library is assumed to contain a file
called MACROFILES.txt containing this information, and ASM101S
simply uses the categorization provided by MACROFILES.txt.
The format of MACROFILES.txt is that it lists the names of all of the
macro-definition files, one per line. Full-line comments
(having a semicolon in column 1) are also allowed.
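To make the MACROFILES.txt format concrete, here is a minimal reader, offered as a sketch rather than as the code ASM101S actually uses. The function names are hypothetical, the file names in the usage example are invented for illustration, and skipping blank lines is my own assumption, since the description above doesn't say how they are treated.

```python
from pathlib import Path


def parse_macrofiles(text: str) -> list[str]:
    """Return the macro-definition file names listed in MACROFILES.txt text."""
    names = []
    for line in text.splitlines():
        if line.startswith(";"):   # full-line comment: semicolon in column 1
            continue
        name = line.strip()
        if name:                   # assumption: blank lines are ignored
            names.append(name)
    return names


def load_macrofiles(library: str) -> list[str]:
    """Read MACROFILES.txt from a macro-library folder (path is caller-supplied)."""
    return parse_macrofiles((Path(library) / "MACROFILES.txt").read_text())


# Invented file names, purely for illustration.
sample = "; generated list of macro files\nEXAMPLE1.asm\n\nEXAMPLE2.asm\n"
print(parse_macrofiles(sample))    # ['EXAMPLE1.asm', 'EXAMPLE2.asm']
```

Any file in the library folder that is not named in the returned list would then be treated as available only via COPY.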
Aside: A utility program (makeMACROFILES.py) is provided to create MACROFILES.txt. Admittedly, insofar as legacy code related to Shuttle flight software is concerned, this is probably of little interest to you, the end user, since all such preprocessing is likely to have been performed prior to you seeing any of the assembly-language source-code files anyway. But if you do happen to acquire flight software or other AP-101 software from sources other than Virtual AGC — send it to me! — then I suppose you might need to do the preprocessing yourself.
For Linux, Mac OS, or Windows. If the HAL/S compiler (HALSFC)
has been installed per the instructions, then ASM101S
will automatically be available as well.
If for some inexplicable reason you want to have ASM101S
just for itself, without the HAL/S compiler (or any of the AP-101
source-code files) provided by the normal installation, you could
instead just download the file ASM101S.py.
You simply need Python 3 to run it.
Aside: If you choose the latter installation method, I can only assume that you already have some AP-101 source-code files that you want to assemble. You might consider sending them to me.
Given that the connection between the AP-101 assembly language
and the System/360 assembly language is undocumented (in surviving
documentation) and is based only upon my own inferences, it's not
surprising that there are some discrepancies between theory and
practice, or between what I've implemented in ASM101S vs
what's documented for IBM 360 assembly language. I'll
explain those differences in the subsections below.
By an "assembly listing", I mean a printout from the assembler
itself, typically showing how each line of source code has been
transformed into binary codes, and providing useful extra
information such as symbol tables and other cross
references.
Unfortunately, there are no surviving assembly listings produced
by the AP-101S original assembler that I'm aware of, or even
substantial fragments of such listings. (If you notice any,
be sure to call my attention to them!) Therefore, without
any of the original assembly listings to mimic, assembly listings
as produced by ASM101S are unlikely to match those of the
original assembler with exactitude .... though of course I
expect the same binary codes to be produced at the same addresses,
since if not, then the entire exercise of creating ASM101S
in the first place would be pointless. But even if I had
such original assembly listings, one wouldn't expect them to be
any guide as to the wording or format of warning or error messages
produced by the assembler, since any Space Shuttle flight software
source code available for assembly presumably would be error-free,
at least to the point that no warning or error messages are likely
to appear in any assembly listings.
With that said, there is some assembly-listing-like material
available. Among the files presently publicly visible, I
refer to the
folder called RUNLST in our source-code repository, which
naively appears to be assembly listings generated by assembling
the files in the
repository's RUNASM folder. RUNASM contains the
AP-101S assembly language source code, in conjunction with the
macro library folder RUNMAC, and assisted by the
interface-file folder ZCONASM, for the runtime library used
with AP-101S object code created by the HAL/S compiler, HAL/S-FC.
Upon closer inspection, however, the contents of RUNLST cannot
actually have been produced directly by the original AP-101S
assembler. And similarly for materials not presently
publicly visible. I assume, rather, that listings produced
by the original assembler were stored somehow, probably in a
so-called partitioned data set (PDS), and that the listings in
RUNLST were produced by running some kind of report generator on
those stored listings. Here's a fragment of the listing
RUNLST/ACOS:
.
.
.
28 ACOS AMAIN ACALL=YES 00002200
29+***********************************************************************
30+*
31+* PRIMARY ENTRY POINT
32+*
33+***********************************************************************
00000 34+ACOS CSECT 01-AMAIN
00000 35+STACK DSECT 01-AMAIN
36+* DS 18H STANDARD STACK AREA DEFINITION
00000 37+ DS F PSW (LEFT HALF) 01-AMAIN
00002 38+ DS 2F R0,R1 01-AMAIN
00006 39+ARG2 DS F R2 01-AMAIN
00008 40+ DS F R3 01-AMAIN
0000A 41+ARG4 DS F R4 01-AMAIN
0000C 42+ARG5 DS F R5 01-AMAIN
0000E 43+ARG6 DS F R6 01-AMAIN
00010 44+ARG7 DS F R7 01-AMAIN
45+* END OF STANDARD STACK AREA
00012 46+SAVE6 DS D TO SAVE REGISTERS F6,F7 02-00025
00016 47+SWITCH DS F TO SAVE R4 ACROSS INTRINSIC CALL 02-00026
00018 48+STACKEND DS 0F END OF COMBINED STACK AREA 01-AMAIN
00000 49+ACOS CSECT 01-AMAIN
0000000 50+ USING STACK,0 ADDRESS STACK AREA 01-AMAIN
00000 E0FB 0018 0018 51+ IAL 0,STACKEND-STACK SET STACK SIZE 01-AMAIN
00002 B624 0000 0009 0000 52+ NIST 9(0),0 CLEAR ON ERROR INFO (LCL DATA PTR) 01-AMAIN
54 *COMPUTES ARC-COSINE(X) OF SINGLE PRECISION SCALAR 00002300
55 INPUT F0 SCALAR SP 00002400
0000000 56+F0 EQU 0 01-INPUT
58 OUTPUT F0 SCALAR SP RADIANS 00002500
.
.
.
To anybody who is familiar with assembly language, this certainly
looks like an assembly listing produced by an assembler, so
why do I say that it's not? The first clue is the line
numbering: There's a line 52 and a line 54, but no line
53. And there are lines 56 and 58, but no line 57.
Admittedly, it's not 100% certain why that is, but having tried to
track it down, it appears to me that both of those gaps correspond
to uses of the SPACE
pseudo-op appearing in expansions of the AMAIN
and INPUT
macros
respectively. According
to the assembly-language manual, "The SPACE instruction is
used to insert one or more blank lines in the listing."
Which is clearly not what has happened, and indeed is the
opposite of what has happened.
Another clue, not apparent from the fragment above, is in the
number of lines per page of the printout. Originally, an
assembly listing would have been output to a line printer having
(nominally) ~55 lines per page, whereas the files in RUNLST
have about 80 lines per page. Ergo, RUNLST does not provide
an unchanged copy of the original listing. Nor are there any
embedded form-feed characters or other means to advance to the top
of the next page before a page heading is printed.
Still, the files of RUNLST are the best guide available as to the
format of assembly listings, and hence ASM101S mimics that
format to the extent feasible (i.e., to the extent not too
pathetically obsessive), plus the addition of form-feed characters
to signal page breaks.
When I refer later on to "existing assembly listings", keep in
mind that I'm referring to these files from RUNLST and not to
actual original assembly listings.
The AP-101 character set
does not match that of the System/360 assembler. The latter
is the EBCDIC character set, or rather the variation of EBCDIC
listed in Appendix A, but with only a subset of those used outside
of quoted strings or comments.
As far as I can tell, the AP-101 assembly-language character set
is not defined. There is some confusion regarding how
character data is supposed to be encoded in
object files.
Examining character strings appearing in object files output by
the HAL/S compiler HAL/S-FC, which one would suppose
should be consistent with AP-101S assembly language, you find that
text is encoded per the Space Shuttle's Display Electronics Unit
(DEU).
The DEU character set is depicted in the table to the
right. It is an ASCII-like character set, in the sense that
almost wherever the printable characters or control codes overlap
with printable ASCII characters or control codes, the numerical
encoding matches.
Aside: The overlaps with ASCII are:
- NULL
- BACKSPACE
- CARRIAGE RETURN
- SPACE
- Digits
- Alphabetic letters
- ! ~ # % & ' ( ) * + , - . / : ; < = > ? |
The mismatches with ASCII (i.e., the characters with differing numerical codes) are:
- _ [ ] ~ "
Aside: One character, the ASCII back-tick (`), also appears in the Space Shuttle flight-software assembly-language source code available to us (well, to me anyway), in spite of being absent from the DEU and EBCDIC character sets, though only in the comments of two lines of code. This was apparently done just to spite me! (I jest. It appears to have been done to mark a section of code so that it could be easily found later.)
But wait! In AP-101S assembly language, constant character data (and constant data of any other type as well) is stored in memory via the pseudo-op called DC. For example, to store the character data "PC" into memory at the current assembly location, you might employ the assembly language
    DC C'PC'
While we have assembly listings for only a limited selection of AP-101S source code, we do have some. Here's how a line such as the one above assembles:
    001C9 D7C3 964+ DC C'PC' 01-GENER FCMCBLKS
The hexadecimal value D7C3 is what is supposedly stored in memory for "PC"; those are the EBCDIC codes for the characters "P" and "C". Thus ultimately, it remains unclear whether it is the DEU encoding or the EBCDIC encoding which appears in the object files produced.
Aside: In case you assume it's impossible that EBCDIC encoding would be shown on the assembly listing but that DEU encoding would occur in the object files, note that the assembly listings produced by IBM Federal Systems Division's LVDC assembler, at least in some versions, had somewhat analogous bugs for constant character data. In that case, constant character data was stored into memory at assembly time by a pseudo-op called BCI (rather than DC), and even though the object code created by the assembler must have had the correct character data, the character data as displayed on the assembly listings themselves was garbage, seemingly due to some mismatch between the various character sets involved. In creating the "modern" LVDC assembler, I found it difficult to decide whether to reproduce the original bug (thus continuing to print garbage in the assembly listings) or else to fix the bug (to make the assembly listings look right but to differ from the surviving printouts of assembly listings). Ultimately, I added a command-line switch (--past-bugs) to allow the user to make the choice for themselves as to whether to override the original bug. But the LVDC assembler is unrelated to the AP-101S assembler, and there's no reason at all to suppose that a bug in one would somehow carry over into the other. Nevertheless, it's a good illustration of the danger associated with assuming that just because these tools were used regularly, for a long time, that they were necessarily free of bugs, particularly insofar as a non-critical item like the assembly listing is concerned.
Beyond these characters, any Space Shuttle flight-software source code available from Virtual AGC will have been "anonymized" by replacing personal names or initials with randomized identifiers beginning with either the ASCII caret (^) or backslash (\) characters, and thus either of these characters may appear in such source code even though not in either the DEU or EBCDIC character sets.
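The EBCDIC reading of the DC C'PC' listing line discussed above is easy to check. The snippet below is just a sanity check, not part of ASM101S. It assumes Python's "cp037" codec as a representative EBCDIC code page (the manual's Appendix A variant may differ for some characters), and it uses plain ASCII only as a stand-in for an ASCII-like encoding, since the DEU table itself is not available as a Python codec. The helper name `hex_codes` is hypothetical.

```python
def hex_codes(text: str, encoding: str) -> str:
    """Return the bytes of `text` in the given encoding as uppercase hex."""
    return text.encode(encoding).hex().upper()


# "cp037" is one common EBCDIC code page (an assumption, see above).
print(hex_codes("PC", "cp037"))  # D7C3, matching the listing
# ASCII, as a stand-in for an ASCII-like encoding of the same letters.
print(hex_codes("PC", "ascii"))  # 5043
```

So the halfword D7C3 in the listing is what an EBCDIC encoding produces, whereas an ASCII-like encoding of the two letters would give a different value.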
In IBM 360 Basic Assembly Language (BAL), various aliases exist
for the branch instructions BCR
and BC
. These are described
in Figure
4-1 of the assembler-language manual. While it is
tempting to say that Figure 4-1 should be accepted as-is for
AP-101S assembly language, that's unfortunately impossible:
Conditional-branch instructions encode a "mask" to be applied to
the CPU's condition codes, but the mask is 4 bits wide for
System/360 and only 3 bits wide for AP-101S.
Something has to give! But Figure 4-1 does serve as a
starting point for reverse-engineering AP-101 aliases for
conditional-branch instructions. Here's my own list of
AP-101S mnemonics for aliased branch instructions, grouped by
condition-code mask.
- NOP, NOPR: No Operation.
- BH, BO, BP: Branch on High, Branch on Overflow, Branch on Plus
- BL, BM, BN: Branch on Low, Branch on Minus, Branch on Negative
- BNE, BNZ: Branch on Not Equal, Branch on Not Zero
- BE, BZ: Branch on Equal, Branch on Zero
- BNL, BNM: Branch on Not Low, Branch on Not Minus
- BNH, BNP, BLE, BNO: Branch on Not High, Branch on Not Plus, Branch on Less-or-Equal, Branch on Not Overflow
- B, BR: Unconditional Branch
Note: While the mnemonics and condition masks in the list above are accurate (I hope!), the textual descriptions are less certain and should be taken with a grain of salt. Corrections are welcome!
Aside: The original Figure 4-1 explained how each alias corresponded specifically to BCR or (much-more commonly) BC instructions, but I've skipped that explanation here. (Look at the file model101.py of the ASM101S source code if you're interested.) Partly that's because it's unlikely to be of interest, but also because it's far more complicated for AP-101S than for System/360: For example, a mnemonic alias like BNP might be coded as a BC instruction under some circumstances, as a BCB instruction in other circumstances, or as a BCF instruction under still other circumstances.
LHI: Besides the branch-instruction aliases, Shuttle flight-software code uses the operator LHI, but without any AP-101 instruction or any macro definition corresponding to it. There is such an instruction in IBM 360 assembly language. The AP-101S POO notes in its discussion of the LA instruction that there is a particular configuration of operands for which LA will be "functionally equivalent to a LOAD HALFWORD IMMEDIATE instruction". My guess is that the original assembler therefore accepted the mnemonic LHI but silently transformed it into the appropriate LA instruction. ASM101S treats it in that manner as well.
SHI: Similarly, flight software uses the non-existent SHI instruction. The program comments at those points clearly indicate that this is a kind of subtract-immediate instruction, presumably Subtract Halfword Immediate. Unlike the case of LHI, there is no corresponding SHI instruction for System/360. Nevertheless, we might suppose that the case is still similar, in that this could be an alias for (perhaps) a particular configuration of operands for some other AP-101 instruction. Fortunately, we have plenty of examples of assembly listings for code using SHI. Consider this example:
    B0E5 FFFE SHI R5,2
The value 0xFFFE is a halfword with the value -2, which leads us to suspect that this is actually an addition. There is indeed an Add Halfword Immediate instruction (AHI), and "AHI R5,-2" would indeed assemble as shown.
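The arithmetic behind that observation can be sketched in a few lines. This is only an illustration of 16-bit two's-complement encoding under the assumption that "SHI r,n" is handled as "AHI r,-n"; the function names are hypothetical and are not taken from ASM101S.

```python
def to_halfword(value: int) -> int:
    """Return the 16-bit two's-complement encoding of a signed value."""
    if not -0x8000 <= value <= 0x7FFF:
        raise ValueError("value does not fit in a signed halfword")
    return value & 0xFFFF


def shi_immediate(subtrahend: int) -> int:
    """Immediate halfword for SHI r,n, assuming it is treated as AHI r,-n."""
    return to_halfword(-subtrahend)


print(format(shi_immediate(2), "04X"))  # FFFE, as in the listing for SHI R5,2
```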
LACR: There is no corresponding System/360 instruction to guide our thinking. However, there are lots of examples in AP-101 assembly listings, such as those for the CTOI.txt file of the HAL/S-FC runtime library. LACR is seen to be a register-to-register operation. For (say) general-registers N and M, it assembles to the bit pattern 11101nnn 11101mmm. This is the same pattern that the LOAD ARITHMETIC COMPLEMENT (LCR) instruction assembles to. Therefore, LACR is nothing more than a synonym for LCR.
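As an illustration of the bit pattern 11101nnn 11101mmm quoted above, the following sketch packs two register numbers into a 16-bit halfword. The function name `encode_lcr` is hypothetical and this is not the encoder used by ASM101S; it simply transcribes the pattern as stated in the text.

```python
def encode_lcr(n: int, m: int) -> int:
    """Pack register numbers n and m into the pattern 11101nnn 11101mmm."""
    if not (0 <= n <= 7 and 0 <= m <= 7):
        raise ValueError("register numbers must be in the range 0..7")
    high = (0b11101 << 3) | n   # first byte:  11101nnn
    low = (0b11101 << 3) | m    # second byte: 11101mmm
    return (high << 8) | low


# Registers 1 and 2 chosen arbitrarily for illustration.
print(format(encode_lcr(1, 2), "016b"))  # 1110100111101010
print(format(encode_lcr(1, 2), "04X"))   # E9EA
```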
PC: Similarly, this undocumented instruction is found from available assembly listings to assemble as a synonym for MVH (move halfword). There's no rationale obvious to me for the specific mnemonic "PC" for this operation.
Not all pseudo-ops described in the System/360 assembler manual
appear in surviving AP-101 assembly-language source code.
I've chosen to believe that rather than the omissions being
coincidental, those pseudo-ops are instead specific to System/360
and thus had been entirely omitted from AP-101
assembly-language. Admittedly, that inference is probably
wrong in the case of certain of the pseudo-ops.
Nevertheless, they have not been implemented in ASM101S.
The omitted pseudo-ops are:
Obviously, this list is subject to change, if legacy AP-101 assembly-language source code using any of these pseudo-ops is discovered.
The SPOFF and SPON pseudo-ops (if they are pseudo-ops) seem typically to be used in pairs: SPOFF is used to disable something unknown, then an instruction or two later, SPON is used to re-enable whatever it was that SPOFF disabled. They are not pseudo-ops in IBM 360 assembly language, and hence must be specific to AP-101S.
Fortunately, we have a few contemporary assembly listings in
which these pseudo-ops appear in the source code, and thus their
effect can be observed somewhat. They do not generate any
binary, hence they are definitely not instructions of any
kind. Furthermore, they do not affect whether or not the
source code they enclose is assembled, nor whether that source
code appears in the assembly listing.
I would tentatively conclude that at least for the moment they
can simply be ignored, and that's what ASM101S does with
them for now.
COPY'd Files
The System/360 assembler manual tells us that assembly-language files included in other assembly-language files via the COPY pseudo-op cannot contain various other pseudo-ops, two of which are MACRO and MEND. That implies that a COPY'd file cannot contain any macro definitions. Nevertheless, Space Shuttle flight software has file inclusions that violate this restriction. Specifically, the files MLIB80/MACSMITH.asm and MLIB80/MACROS.asm do contain macro definitions, and yet are themselves COPY'd into other assembly-language files. Consequently, this restriction (at least insofar as MACRO and MEND are concerned) does not apply in AP-101 assembly-language.
The assembler manual tells us that
"The macro instruction prototype statement (hereafter called the prototype statement) specifies the mnemonic operation code and the format of all macro instructions that refer to the macro definition. It must be the second statement of every macro definition."
For example, in a macro definition such as
    MACRO
    MYMACRO &ARG1,&ARG2
    .
    .
    .
    MEND
no other statements may appear between the first two lines shown here. Yet we also encounter macro definitions of the following form:
    MACRO
    .* THIS IS A COMMENT
    .* THIS IS ANOTHER COMMENT
    .
    .
    .
    .* THERE WERE A WHOLE LOT OF COMMENTS, SEE?
    MYMACRO &ARG1,&ARG2
    .
    .
    .
    MEND
I guess we'd infer from this, and very reasonably, that comments are not "statements", but more importantly, that the macro prototype is not necessarily the second line in a macro definition.
Aside: I don't know if anybody will read these words, ever, but my sixth sense tells me that some folks who do might be smugly saying to themselves right now that "of course full-line comments are not 'statements' in any language, so what's this fool on about?" As it happens, on p. 69 of the assembler manual, we find a section actually entitled "Comments Statements", which proceeds to define the term comments statement as being precisely the thing we're discussing right now.
Aside: AP-101 CPU instructions fall into 5 categories, depending on the pattern of operands they accept. These 5 categories are designated RR, RS, SRS, SI, and RI. The differences between these relate to the number of operands and the means of addressing them, but the specifics aren't important for our discussion here.
All AP-101 CPU instructions of type RS can optionally have suffixes "@", "#", or "@#" added to their mnemonics. For example, just as there is an SCAL instruction of type RS, there are also SCAL@, SCAL#, and SCAL@# instructions of type RS.
To be picky about it, this usage is indeed documented, but it took me so long to figure out that I thought I should take explicit notice of it here anyway.
"... [@] [#] indicates that the use of indirect addressing and/or autoindexing is optional. For example, [instruction mnemonic] M specifies direct addressing without autoindexing, while M# specifies direct addressing with autoindexing."
And in case it's not obvious to you what the POO means by "indirect addressing" and/or "autoindexing", there is much greater detail in the POO's explanation of RS type instructions.
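To make the suffix convention concrete, here is a small sketch that splits an RS-type mnemonic into its base mnemonic and two flags. It is not the parser used by ASM101S, and the function name `split_rs_mnemonic` is hypothetical. It assumes, per the POO passage quoted above, that "@" requests indirect addressing and "#" requests autoindexing, and that when both are present they appear in the order "@#".

```python
def split_rs_mnemonic(mnemonic: str) -> tuple[str, bool, bool]:
    """Split an RS mnemonic into (base, indirect, autoindex).

    Assumes the suffix, if any, is one of "@", "#", or "@#".
    """
    base = mnemonic
    autoindex = base.endswith("#")
    if autoindex:
        base = base[:-1]
    indirect = base.endswith("@")
    if indirect:
        base = base[:-1]
    return base, indirect, autoindex


for name in ("SCAL", "SCAL@", "SCAL#", "SCAL@#"):
    print(name, split_rs_mnemonic(name))
```

A real assembler would also need to confirm that the base mnemonic is of type RS before accepting a suffix; that check is omitted here.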
LB is used in the same manner as the instruction LA (load address), but it is a mystery what it signifies. I presently have no examples of assembly listings containing it from which I can deduce anything. One possibility is that it's simply an error never detected during Shuttle software development because the only known usage is in the definition of a macro (LD) which in fact is never invoked by any of the other source code. In other words, perhaps ASM101S shouldn't have been getting uppity by trying to parse that macro in the first place!
Aside: "BNF", of course, stands for Backus-Naur form. Technically, the grammars are actually written in the modified EBNF (Extended Backus-Naur form) supported by the TatSu parser module for the Python language. See the Python source-code file fieldParser.py for the grammars themselves.
Nevertheless, even having added this level of complexity to the parser, it's not necessarily the case that the syntax parsed by ASM101S matches that parsed by the original assembler. For example, arithmetic expressions as specified by the System/360 assembly-language manual are constrained in various ways (e.g., cannot begin with '+' or '-', cannot have more than 16 terms, cannot have more than 5 levels of parentheses), but have not been endowed with the same constraints in ASM101S. On the other hand, I haven't necessarily bothered to implement theoretically-possible syntax that isn't present in actual flight software. Consequently, it's likely that ASM101S accepts a more-complex syntax in some contexts than did the original assembler, and vice-versa. Of course, ASM101S can be upgraded as needed to support such missing syntax, if it turns out to be desirable, whereas the original assembler cannot.
The AP-101 CPU has 8 general registers, typically referred to symbolically in assembly language as R0 through R7, as well as 8 floating-point registers, typically referred to as F0 through F7. This is the same situation as in System/360 assembly language, except that System/360 has a different number of registers (16 general registers, for instance). For example, an assembly-language instruction that performs an integer addition from register R7 to register R3 would look like this in either of the two assembly languages:
    AR R3,R7
But there's a catch. The assembly-language manual explains that
"All symbols that specify register numbers ... must be assumed to be equated elsewhere to absolute values."
In other words, the register-name symbols R3 and R7 in this example are not tokens or syntactical elements of the assembly language, and the pure syntax for the instruction example shown above should actually be this:
    AR 3,7
The only reason that the former instruction would be accepted by the assembler, the manual is explaining, is that the full example should have read something like this:
    R3 EQU 3
    R7 EQU 7
    .
    .
    .
    AR R3,R7
In turn, this means that in the macro libraries loaded by the assembler, we should find various EQUates similar to the ones above, for the general registers and floating-point registers. And indeed, for the macro libraries used for the Space Shuttle primary flight software (PASS), and backup flight software (BFS), we find exactly such declarations in the PASS module MLIB80/MACSMITH or the BFS module MLIB80/EQU, along with numerous other EQUates of a similar nature:
    .
    .
    .
    F0 EQU 0 FP 0 = FLOATING POINT REGISTER
    F1 EQU 1 1
    F2 EQU 2 2
    F3 EQU 3 3
    F4 EQU 4 4
    F5 EQU 5 5
    F6 EQU 6 6
    F7 EQU 7 7
    G0 EQU 0 SET 1 GR 0 = GENERAL REGISTER
    G1 EQU 1 1
    G2 EQU 2 2
    G3 EQU 3 3
    G4 EQU 4 4
    G5 EQU 5 5
    G6 EQU 6 6
    G7 EQU 7 7
    R0 EQU 0 SET 2 GR 0 = GENERAL REGISTER
    R1 EQU 1 1
    R2 EQU 2 2
    R3 EQU 3 3
    R4 EQU 4 4
    R5 EQU 5 5
    R6 EQU 6 6
    R7 EQU 7 7
    .
    .
    .
Unfortunately, that's not the full story. Besides the flight software as such, AP-101 assembly-language files also exist in the runtime library provided by HAL/S-FC, the HAL/S compiler. Those assembly-language files reference the CPU general registers and floating-point registers just as any of the flight-software files do, except that there are no EQUates for those registers in any of those source-code files, nor in the macro library used by those files.
One possible explanation for why these EQUates are missing is that our HAL/S-FC runtime-library source code is incomplete. Unfortunately, there is no way to know whether that is correct or not. Another possibility is that the System/360 assembly-language manual is incorrect, and that the assembler does by default recognize the general registers Rn and floating-point registers Fn, and possibly other symbols, without explicit EQUates.
[...] EQUates, if such are encountered.
[...] T'&A) with the notation T' returns an assembly-time string consisting of a single character that corresponds to the type of data the variable contains. For example, if &A were a character-string variable as declared via the GBLC or LCLC pseudo-op, then the assembler's preprocessor would replace T'&A by the single character C at assembly-time.
It isn't entirely clear to me what # indicates. My current
very tentative interpretation is this:
D' Attribute
AP-101S assembly-language source code uses an attribute operator D', which is not defined in the assembly-language manual. From the way it is used, I infer that when applied to an identifier, it returns "true" (1) if the identifier has been previously defined within the source code being assembled and "false" (0) if not. A typical usage would be something like
             AIF   (D'MYSYM).OKAY
             EXTRN MYSYM
    .OKAY    ...
Thus if the identifier is not defined, it allows the code to detect that condition and to mark the identifier as being declared externally.
AIF and AGO
The AIF and AGO pseudo-ops provide "goto" functionality (respectively conditionally or unconditionally) at assembly time (rather than at runtime). The System/360 assembly-language manual makes it clear that these "goto" operations can operate only within the same macro depth, and further, if within a macro, only within the same macro. For example, in the "pseudo-instruction"
    AGO .MYSEQ
the locations of the pseudo-instruction itself and of the sequence symbol .MYSEQ could be both outside of any macro, or they could be within the same macro definition. But it could not be the case (say) that the pseudo-instruction was within a macro definition and the sequence symbol was within a macro invoked by that macro.
[...] COPY pseudo-op. Is it possible for the AGO or AIF pseudo-instruction to be in a file containing a COPY pseudo-op while the target sequence symbol is in the file being COPY'd? Or vice-versa?
[...] COPY, any AGO/AIF pseudo-instruction and its target sequence symbol must reside within the same COPY'd file.
Certain arithmetical quirks are inherent in System/360 assembly
language, and I must presume that these peculiarities carry over
into AP-101S assembly language as well. Therefore, ASM101S
retains these peculiarities rather than eliminating them.
The peculiarities I regard as worth noting are these:
- A variable such as &A cannot be assigned the value -5 via a line of pseudo-code such as "&A SETA -5", because "-5" is neither a legal literal nor a legal arithmetic expression. (Nor would +5 be legal.) The assembly-language manual seems to advise using workarounds such as "&A SETA 0-5".
- The "/" operator is integer division. For example, 5/2 evaluates to 2. The manual does not explain what the result of an operation like (0-5)/2 would be; either -3 or -2 is plausible. Until the correct behavior is somehow determined, ASM101S uses the Python convention (i.e., it uses the Python // operator), which would result in -3 in this case.
- Arithmetic expressions consist of "terms" combined by the operators +, -, *, or /. ("Terms" is quoted here to distinguish the System/360 usage from the normal mathematical usage, in which terms are added to or subtracted from each other, while factors are multiplied or divided by each other.) These operations are performed in left-to-right order, except that multiplications or divisions are performed prior to additions or subtractions. In particular, in an expression like 3*5/2, division does not have a higher precedence than multiplication, so it is evaluated as (3*5)/2 rather than as 3*(5/2).
On the other hand, ASM101S does remove some of the System/360 assembly-language arithmetical restrictions, namely:
Aside: Regarding peculiarities of my own making, as opposed to those of the language itself or the original assembler, I'm obliged to admit that I don't quite understand how to perfectly handle assembly-time evaluation of arithmetic expressions involving program labels: i.e., involving the addresses of symbols rather than the values of constants.
To do so, ASM101S instead uses an imperfect trick, making use of the facts that the address space of the AP-101S is limited to 24 bits and that the number of allowed control sections in a program (at least in System/360) is limited to 255. The address of a program label (prior to linking) is precisely an ordered pair of the form (control section, offset into control section), but performing arithmetical computations is easiest when these values can somehow be converted to single numbers rather than ordered pairs. The trick is to assign each control section a unique but randomized 64-bit value whose least-significant 24 bits are all 0, and to convert addresses of symbols to sums of these 64-bit values plus 24-bit offsets into the control sections. (I don't mean that the codes for the symbols are actually random, but rather that they are selected in a way that makes it unlikely to produce their values by common types of calculations.) In this way, calculations like SYMBOL+OFFSET or SYMBOL1-SYMBOL2 (for symbols in the same section) produce the expected results, and indeed, all correct expressions produce correct results. Unfortunately, it remains possible to combine symbols from two different control sections in an incorrect manner and get a result that appears to be in yet a third control section, which is incorrect. This potential is part of the reason for using 64-bit pseudo-addresses (and distributing the unique numerical codes for the control sections throughout a 40-bit space) rather than 32-bit pseudo-addresses (and distributing the unique numerical codes in an 8-bit space): It reduces to a very low level the probability of producing "fake" control sections in calculations.
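The pseudo-address trick described above can be sketched in Python roughly as follows. This is only an illustration under stated assumptions: the class `SectionTable` and its method names are hypothetical, and the use of `secrets.randbits` to pick the 40-bit section codes is my stand-in for whatever "unlikely to be produced by common calculations" selection ASM101S really uses.

```python
import secrets

OFFSET_BITS = 24                       # AP-101S address width, per the text
OFFSET_MASK = (1 << OFFSET_BITS) - 1
CODE_BITS = 40                         # 64-bit pseudo-address minus 24 offset bits


class SectionTable:
    """Hypothetical helper mapping control sections to 64-bit base values."""

    def __init__(self):
        self.bases = {}                # control-section name -> 64-bit base

    def base(self, csect):
        # Choose a nonzero, unused 40-bit code and shift it above the offset
        # field, so the least-significant 24 bits of the base are all 0.
        if csect not in self.bases:
            while True:
                candidate = secrets.randbits(CODE_BITS) << OFFSET_BITS
                if candidate != 0 and candidate not in self.bases.values():
                    break
            self.bases[csect] = candidate
        return self.bases[csect]

    def encode(self, csect, offset):
        """Convert the pair (control section, offset) to a single number."""
        return self.base(csect) + offset

    def decode(self, value):
        """Convert a computed number back to (control section, offset)."""
        if abs(value) <= OFFSET_MASK:
            return (None, value)       # plain number, e.g. SYMBOL1-SYMBOL2
        base, offset = value & ~OFFSET_MASK, value & OFFSET_MASK
        for csect, known in self.bases.items():
            if known == base:
                return (csect, offset)
        raise ValueError("result lies in no known control section")


table = SectionTable()
sym1 = table.encode("CSECT1", 0x100)
sym2 = table.encode("CSECT1", 0x140)
other = table.encode("CSECT2", 0x010)
print(table.decode(sym1 + 8))      # ('CSECT1', 264)
print(table.decode(sym2 - sym1))   # (None, 64)
```

In this sketch an incorrect combination such as `sym1 + other` decodes to a base that matches no registered control section and raises `ValueError`; the "fake third section" problem mentioned above corresponds to the unlikely case in which such a bogus base happens to coincide with a real one.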
According to the System/360 assembly-language manual, although EXTRN symbols can appear in expressions, they cannot be paired. This implies, I think, that they can be handled interoperably with the description in the preceding paragraph, by using unique but randomized 64-bit values with the lower 24 bits all 0 in place of those symbols.
I thought at first that the same trick could be used to handle calculations involving other not-yet-defined symbols. Unfortunately, such an attempt would be guaranteed to produce incorrect results in calculations like KNOWN-UNKNOWN, even if KNOWN and UNKNOWN both turned out to be members of the same control section. Therefore, the addresses of all symbols in the current file must be ascertained in a separate pass before computations of expressions involving such symbols are performed.
Among the types of expressions computed by the assembler at
assembly-time for use with pseudo-ops such as SETB or AIF are the boolean
expressions, of which one sub-type is relational expressions
involving string values.
A relational expression is used to determine whether one value is
equal to (EQ), not equal to (NE), less than (LT), less than or equal to (LE), greater than (GT), or
greater than or equal to (GE) another, where the values are
either two numbers or two strings. For example, the relational expression
3 LT 4
returns the value "true" (which in System/360 assembly language is numerically equivalent to 1) since 3 is less than 4.
[...]
'Z' LT 'AA'
returns "true".
Thus we really don't know what collation sequence is
appropriate. Until such time as this question can be
resolved in a more-authoritative manner, ASM101S assumes
that the collation sequence is ASCII.
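To make the string-comparison behavior concrete, here is a small Python sketch. It rests on two assumptions that should be read as such: first, my understanding of the System/360 rule that a shorter string always compares less than a longer one (which is what makes 'Z' LT 'AA' true), and second, the ASCII collation that ASM101S currently assumes for strings of equal length. The function names `compare_strings` and `relational` are hypothetical.

```python
def compare_strings(left, right):
    """Return -1, 0, or 1 for left <, ==, > right."""
    # Assumed System/360 rule: unequal lengths are decided by length alone.
    if len(left) != len(right):
        return -1 if len(left) < len(right) else 1
    # Equal lengths: compare by ASCII codes (the ASM101S working assumption).
    a, b = left.encode("ascii"), right.encode("ascii")
    return (a > b) - (a < b)


RELATIONS = {
    "EQ": lambda c: c == 0,
    "NE": lambda c: c != 0,
    "LT": lambda c: c < 0,
    "LE": lambda c: c <= 0,
    "GT": lambda c: c > 0,
    "GE": lambda c: c >= 0,
}


def relational(left, op, right):
    """Evaluate a string relational expression, returning 1 (true) or 0."""
    return 1 if RELATIONS[op](compare_strings(left, right)) else 0


print(relational("Z", "LT", "AA"))   # 1, decided by length alone
print(relational("R1", "EQ", "R1"))  # 1
print(relational("1", "LT", "A"))    # 1 under ASCII
```

The last comparison is one where the collation sequence matters: in EBCDIC the digits sort after the letters, so an EBCDIC-based assembler would presumably return 0 there.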
Character expressions consist of text delimited by single-quotes,
as for example 'HELLO',
plus various additional flourishes that you can read about in the
System/360 assembly-language manual but which I won't bother to
rehash here.
[...]
'HELLO'(start,length)
This means that the substring to be extracted begins at
index start and is length characters in
width.
Before describing the specific AP-101S versus System/360 issue
associated with the items known as "SET symbols", let me summarize
some of what the System/360 assembly-language manual has to say
about them.
In System/360 assembly language there is the concept of symbols
known only to the assembler, in contradistinction to symbols
representing addresses in the runtime memory of the assembled
program. These symbols are distinguished in that their names
are prefixed by the character '&'. Thus MYVAR might be a variable
representing a memory location, whose contents can be modified by
the assembly-language program when it is run, while &MYVAR might represent
an assembly-time variable, assigned a value that can be
manipulated during the assembly process, but that is not known to or
modifiable by the assembled program.
Here, we're concerned only with the latter category, namely the SET
symbols.
SET symbols can be categorized another way, namely by their
datatypes, which cannot be changed once established. The
three types are:
Yet a third way that they can be characterized is as:
[...] GBLA, GBLB, GBLC, LCLA, LCLB, or LCLC. Any of these
instructions also assigns an initial value to the symbol, either 0,
False (0), or '' (empty string), depending on the datatype.
For example, the instruction "LCLB &BOO" declares a local boolean SET symbol
called &BOO and
assigns it the default value False (numerically, 0).
[...] SETA, SETB, or SETC.
[...] SETA, SETB, or SETC (or used in other
manners) without any declaration via GBLA, GBLB, GBLC, LCLA, LCLB, or LCLC whatsoever (prior or
otherwise), which is a possibility denied by the System/360
assembly-language manual.
MACRO
INPUT &X
GBLA &ENTCNT
GBLB &INPUT(20),&LIB
AIF (N'&SYSLIST EQ 0).EMPTY
&INPUT(&ENTCNT) SETB 1
AIF ('&X' EQ 'NONE').SPACE
&I SETA 1
&LAST SETA N'&SYSLIST
.LOOP AIF (K'&SYSLIST(&I) NE 2).BADREG
&R SETC '&SYSLIST(&I)'
AIF ('&R'(1,1) NE 'F' AND '&R'(1,1) NE 'R').BADREG
AIF ('&R' EQ 'R0').BADREG
AIF (&LIB AND ('&R' EQ 'R1' OR '&R' EQ 'R3')).INVREG1
AIF (NOT &LIB AND '&R' EQ 'R4').INVREG2
AIF (D'&R).NEXT
&N SETC '&R'(2,1)
&R EQU &N
.NEXT ANOP
&I SETA &I+1
AIF (&I LE &LAST).LOOP
.SPACE SPACE
MEXIT
.BADREG MNOTE 4,' ILLEGAL REGISTER SPECIFICATION - &SYSLIST(&I)'
AGO .NEXT
.INVREG1 MNOTE 4,'&R INVALID INPUT FOR PROCEDURE ROUTINE'
AGO .NEXT
.INVREG2 MNOTE 4,'R4 INVALID INPUT FOR INTRINSIC'
AGO .NEXT
.EMPTY MNOTE 4,'OPERAND REQUIRED'
MEND
What are we to make of this?
When a variable that has not previously been explicitly declared (by GBLx or LCLx) is the target of a SETx instruction, it is declared automatically by the assembler as if via LCLx.
Aside: If this inference is correct, it might seem naively that there's no need for the instructions LCLA, LCLB, or LCLC at all, since a SETA, SETB, or SETC could always be used instead. Upon closer inspection that's not true, since LCLx (like GBLx) can additionally be used to declare SET symbols as arrays, which a SETx instruction with this convenience feature could not. And even in the non-arrayed case, there are certainly instances in existing code in which LCLx is indeed used explicitly even though the described convenience feature would not require it. For example, consider this macro from the AP-101S runtime-library source code, which unlike the problematic macro listed above corresponds exactly to the System/360 assembly-language manual's pronouncements:
MACRO
&NAME AERROR &NUM,&GROUP=4
GBLA &ERRCNT,&ERRNUMS(10),&ERRGRPS(10)
LCLA &I
AIF (&NUM GT 62).BADNUM
&I SETA &ERRCNT
.DUPLOOP AIF (&I LE 0).NEWERR
AIF (&NUM EQ &ERRNUMS(&I) AND &GROUP EQ &ERRGRPS(&I)).DUP
&I SETA &I-1
AGO .DUPLOOP
.NEWERR ANOP
&ERRCNT SETA &ERRCNT+1
&I SETA &ERRCNT
&ERRNUMS(&I) SETA &NUM
&ERRGRPS(&I) SETA &GROUP
.DUP ANOP
*********ISSUE SEND ERROR SVC******************************************
&NAME SVC AERROR&I ISSUE SEND ERROR SVC
*********SEND ERROR SVC RETURNS CONTROL FOR STANDARD FIXUP*************
MEXIT
.BADNUM MNOTE 12,'ERROR NUMBER GREATER THAN 62'
MEND
As for the origin of such a convenience feature in the first place, I'd note that in addition to being "convenient", the complexity of some AP-101S macros could make some of those macros very difficult or impossible to implement otherwise. According to System/360 rules, all GBLx and LCLx instructions must appear not merely before SETx instructions involving the SET symbols they declare, but indeed prior to everything else. For example, GBLx instructions must appear immediately after the prototype line of a macro definition, with nothing intervening except comments, while LCLx instructions in turn must appear immediately after that. Thus if a macro definition depends on the flexibility of allowing a SET symbol to be declared in alternate ways under different circumstances, such as arrayed vs non-arrayed or integer vs character, the rules of the System/360 assembler likely would not allow it because alternate declarations could appear only in the prescribed location. Whereas the rules of implicit declaration via SETx instructions basically allow non-arrayed local declarations to appear anywhere. So the convenience feature of implicit declaration, if it truly exists, could have arisen from necessity rather than from a desire for mere convenience. Not that "mere" convenience is to be sneered at. But that's just speculation on my part, with the answer lost in the mists of time past.
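The implicit-declaration inference discussed above can be sketched in Python as follows. This is a toy model under my own assumptions rather than a description of ASM101S internals: the class `SetSymbolTable` and its methods are hypothetical, it models a single local scope only, and it ignores arrayed SET symbols (which, as noted, would still need an explicit LCLx or GBLx).

```python
class SetSymbolTable:
    """Toy model of SET symbols within one scope (arrays not modeled)."""

    # Default initial values by datatype: arithmetic, boolean, character.
    DEFAULTS = {"A": 0, "B": 0, "C": ""}

    def __init__(self):
        self.symbols = {}              # name -> (kind, value)

    def declare(self, name, kind):
        """Explicit declaration, as by LCLA/LCLB/LCLC."""
        self.symbols[name] = (kind, self.DEFAULTS[kind])

    def set(self, name, kind, value):
        """SETA/SETB/SETC; declares the symbol implicitly if needed."""
        if name not in self.symbols:
            self.declare(name, kind)   # inferred rule: as if via LCLx
        declared_kind, _ = self.symbols[name]
        if declared_kind != kind:
            # The datatype cannot be changed once established.
            raise TypeError(f"{name} is type {declared_kind}, not {kind}")
        self.symbols[name] = (kind, value)

    def get(self, name):
        return self.symbols[name][1]


table = SetSymbolTable()
table.declare("&BOO", "B")             # LCLB &BOO
table.set("&I", "A", 1)                # &I SETA 1, with no prior LCLA
print(table.get("&BOO"), table.get("&I"))   # 0 1
```

A later `table.set("&I", "C", "X")` raises `TypeError` in this model, reflecting the statement that a SET symbol's datatype is fixed once established.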
DC and DS Pseudo-Op Formats
The System/360 assembly-language manual describes a
quite-complex format for the operands of the DS and DC pseudo-ops used for
allocating or initializing data memory. (The description
takes about 11 pages, which is over 6% of the manual.)
However, I see no point in implementing those features of this
format which are not actually used in Space Shuttle
flight-software source code. At present, I believe that the
following features of the DC/DS format do not
need to be supported in ASM101S:
AP-101S instructions are of 5 basic types, designated (by IBM) as
RR, RS, SRS, RI, and SI, based on the syntax patterns of their
operands and on the way they are encoded as machine
instructions. Some of these are System/360 patterns, and
some are not. I won't bore you with the details, as you can
read about them in the
AP-101S Principles of Operation. However, there is a
certain difficulty with SRS- and RS-type instructions that could in
principle cause a mismatch between object code generated by
ASM101S vs the original AP-101S assembler, though hopefully
not any behavioral difference at runtime other than slight timing
discrepancies. This group of instructions includes, among
other things, all conditional-branch instructions and
their aliases.
The difficulty relates to the fact that certain instruction
mnemonics are used both for SRS-type instructions and
RS-type instructions. Moreover, while some of the operand
patterns for them are accepted for SRS instructions and not RS
instructions, thus allowing the assembler to distinguish between
them, some of the operand patterns overlap. In those
cases, there is no syntactic way for the assembler to distinguish
between the SRS instruction and the RS instruction. Overlap
occurs for the following syntactical patterns (where R1, D2, and B2 refer to the
names of fields in the encoded machine instruction):
OPCODE R1,D2
OPCODE R1,D2(B2)
The vulnerable opcode mnemonics are:
A AE AH BC C CH D DE IAL L LA LE LH M ME MH N O S SE SH SHW ST STH TD TH X ZH
While the SRS-type and RS-type instructions are behaviorally identical, they are encoded differently as machine instructions, and in particular require different amounts of memory to do so. SRS-type instructions are encoded as half-words (2 bytes), while RS-type instructions are encoded as full words (4 bytes). For example, there is no syntactical way to know whether to encode the instruction "L 4,SWITCH"
as 2 bytes or as 4 bytes. So if ASM101S were to encode
an instruction as SRS while the original assembler were to encode it
as RS, or vice-versa, then not only would the binary forms of those
particular instructions differ, but all of the code following that
instruction in the same control section would be aligned
differently.
As I said, there is no syntactic way for the assembler to
distinguish between these cases, but there is a non-syntactic way
based on the size of the D2 sub-operand. If D2
is in the numerical range 0-55, then the SRS instruction can be
used, while if D2 is 56
or greater, the RS instruction must be used. But as far as I
can see (and perhaps this is a limitation of my own imagination!)
the size of D2 can be
determined only in case it involves only previously-defined
symbols whose addresses or values are known. If forward
reference is involved, I see no way to determine D2's value with
certainty. And there is nothing documented, as far as I
know, that limits the determination of D2
just to previously-defined symbols.
With that said, in some cases ASM101S can know
with certainty that the SRS form can be used. For example,
in the instruction
OPCODE R1,SYMBOL
where SYMBOL
happens to be a program label above but within 56 words of a base
address specified by the USING pseudo-op,
the SRS encoding is certainly applicable.
In short, ASM101S uses heuristic methods to try to
make the same choices of SRS vs RS as the original assembler, but
given that the rules used by the original assembler are not known,
there is no guarantee that it succeeds in doing so in all possible
cases.
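The size-based rule and its effect on alignment can be illustrated with a short Python sketch. It is not ASM101S's actual heuristic, which I haven't tried to reproduce here. The names `choose_form` and `layout` are hypothetical, and the fallback to RS when D2 is not yet known (represented by `None`) is purely an assumption made for the sake of the example.

```python
SRS_MAX_D2 = 55                 # per the text: D2 in 0-55 permits SRS
SIZES = {"SRS": 2, "RS": 4}     # encoded sizes in bytes


def choose_form(d2):
    """Pick an encoding from the displacement; d2 is None if unresolved."""
    if d2 is not None and 0 <= d2 <= SRS_MAX_D2:
        return "SRS"
    # Assumption: anything else, including an unresolved forward
    # reference, is encoded as the longer RS form.
    return "RS"


def layout(displacements, start=0):
    """Return (byte offset, form) for instructions placed consecutively."""
    placed, offset = [], start
    for d2 in displacements:
        form = choose_form(d2)
        placed.append((offset, form))
        offset += SIZES[form]
    return placed


print(layout([3, 55, 10]))   # [(0, 'SRS'), (2, 'SRS'), (4, 'SRS')]
print(layout([3, 56, 10]))   # [(0, 'SRS'), (2, 'RS'), (6, 'SRS')]
```

Comparing the two outputs shows the alignment problem described above: a different SRS/RS choice for the second instruction moves the third instruction from byte offset 4 to byte offset 6, and everything after it in the control section would shift likewise.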
My Inference | HLASM Manual |
---|---|
The D' operator | Defined Attribute (D') |
Declaration of SET symbols | I've not found this so far in the HLASM manual, but Ehrman's presentation says the following: SET symbols can be implicitly declared "as local variables, if first appearance is as the name-field symbol of a SETx assignment statement − this is the only implicit form whose values may be changed (SET)". (See figure Cond-10.) The HLASM manual, on the other hand, contains the following curious provision that seems to partially contradict Ehrman: "If the variable symbol is the same as the character value, the assembler considers the variable symbol to be an implicitly defined local SETC symbol which is given a null character string value. For example: &C6 SETC '&C6', assigns the value '' to &C6." Later on, the manual makes a related assertion: "If the variable symbol is the same as the character value, the assembler considers the variable symbol to be an implicitly defined local SETA symbol, which is given a value of zero. For example: &ASYM2 SETA &ASYM2. &ASYM2 has a value 0." |