A tokenised program file on a disk device has the following format.
The file starts with the magic byte FF. Each program line that follows is stored as:
| Bytes | Format | Meaning |
|---|---|---|
| 2 | Unsigned 16-bit little-endian integer. | Memory location of the line following the current one. This is used internally by GW-BASIC but ignored when a program is loaded. |
| 2 | Unsigned 16-bit little-endian integer. | The line number. |
| Variable | Tokenised BASIC, see below. | The contents of the line. |
| 1 | 00 (NUL byte) | End of line marker. |
1A is written to mark the end of file. This is optional;
the file will be read without problems if it is omitted.
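The line layout above can be walked with a few lines of Python. This is a minimal sketch, not PC-BASIC's own loader: the function name `split_lines` is hypothetical, the payload sizes in `NUMERIC_PAYLOAD` are taken from the numeric literal table in this section, and treating a zero next-line pointer as the end of the program is an assumption not stated in the text. It also assumes string literals and comments contain no numeric prefix bytes.

```python
import struct

# Bytes that follow each numeric prefix token (from the numeric literal table).
NUMERIC_PAYLOAD = {0x0B: 2, 0x0C: 2, 0x0E: 2, 0x0F: 1, 0x1C: 2, 0x1D: 4, 0x1F: 8}

def split_lines(data):
    """Yield (line_number, tokenised_body) pairs from a tokenised program file."""
    if not data or data[0] != 0xFF:
        raise ValueError("magic byte FF missing")
    pos = 1
    while pos + 4 <= len(data):
        next_addr, number = struct.unpack_from("<HH", data, pos)
        if next_addr == 0:  # assumption: a zero pointer ends the program
            break
        pos += 4
        start = pos
        while pos < len(data) and data[pos] != 0x00:
            # Skip numeric payloads so a 00 inside a constant is not
            # mistaken for the end of line marker.
            pos += 1 + NUMERIC_PAYLOAD.get(data[pos], 0)
        yield number, data[start:pos]
        pos += 1  # step over the NUL end of line marker
```

Scanning for the first 00 byte alone would break on a line such as `PRINT 256`, whose constant is stored as 1C 00 01; skipping the payload avoids that.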
The printable ASCII characters in the range 20–7E
are used for string literals, comments, variable names,
and elements of statement syntax that are not reserved words.
Reserved words are represented by their reserved word tokens and
numeric literals are represented by numeric token sequences.
Numeric literals are stored in tokenised programs
according to the following representation. All numbers are positive; negative numbers are stored
simply by preceding the number with EA, the token for -.
| Class | Bytes | Format |
|---|---|---|
| Indirect line numbers | 3 | 0E followed by an unsigned 16-bit little-endian integer. |
| Octal integers | 3 | 0B followed by an unsigned 16-bit little-endian integer. |
| Hexadecimal integers | 3 | 0C followed by an unsigned 16-bit little-endian integer. |
| Positive decimal integers less than 11 | 1 | Tokens 11–1B represent 0–10. |
| Positive decimal integers less than 256 | 2 | 0F followed by an unsigned 8-bit integer. |
| Other decimal integers | 3 | 1C followed by a two's complement signed 16-bit little-endian integer. GW-BASIC will recognise a negative number encountered this way but it will not store negative numbers itself using the two's complement, but rather by preceding the positive number with EA. |
| Single precision floating-point number | 5 | 1D followed by a four-byte single in Microsoft Binary Format. |
| Double precision floating-point number | 9 | 1F followed by an eight-byte double in Microsoft Binary Format. |
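As an illustration of the integer classes in the table, the following sketch turns one tokenised integer constant back into text. The function name `decode_integer_constant` is hypothetical, and rendering octal and hexadecimal values with the `&O` and `&H` prefixes is an assumption about how the result should be displayed; floating-point constants are left out.

```python
import struct

def decode_integer_constant(buf, pos=0):
    """Return (text, bytes_consumed) for the integer constant at buf[pos]."""
    tok = buf[pos]
    if 0x11 <= tok <= 0x1B:              # one-byte tokens for 0..10
        return str(tok - 0x11), 1
    if tok == 0x0F:                      # unsigned 8-bit integer
        return str(buf[pos + 1]), 2
    if tok == 0x1C:                      # signed 16-bit little-endian integer
        (value,) = struct.unpack_from("<h", buf, pos + 1)
        return str(value), 3
    if tok in (0x0B, 0x0C, 0x0E):        # octal, hexadecimal, line number
        (value,) = struct.unpack_from("<H", buf, pos + 1)
        if tok == 0x0B:
            return "&O" + format(value, "o"), 3
        if tok == 0x0C:
            return "&H" + format(value, "X"), 3
        return str(value), 3
    raise ValueError("not an integer constant token: %02X" % tok)
```

For example, the three bytes 0C 34 12 decode to `&H1234`, and the single byte 1B decodes to `10`.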
Most keywords in PC-BASIC are reserved words. Reserved words are represented in a tokenised program by a single- or double-byte token. The complete list is below.
All function names and operators are reserved words and all statements start with a reserved word
(which in the case of LET is optional). However, the converse is not true:
not all reserved words are statements, functions, or operators.
For example, TO and SPC( only occur as part of a statement syntax.
Furthermore, some keywords that form part of statement syntax are not reserved words:
examples are AS, BASE, and ACCESS.
Keywords that are not reserved words are spelt out in full text in the tokenised source.
A variable or user-defined function name must not be identical to a reserved word. The list below is an exhaustive list of reserved words that can be used to determine whether a name is legal.
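| Token | Word | Token | Word | Token | Word | Token | Word |
|---|---|---|---|---|---|---|---|
| 81 | END | 82 | FOR | 83 | NEXT | 84 | DATA |
| 85 | INPUT | 86 | DIM | 87 | READ | 88 | LET |
| 89 | GOTO | 8A | RUN | 8B | IF | 8C | RESTORE |
| 8D | GOSUB | 8E | RETURN | 8F | REM | 90 | STOP |
| 91 | PRINT | 92 | CLEAR | 93 | LIST | 94 | NEW |
| 95 | ON | 96 | WAIT | 97 | DEF | 98 | POKE |
| 99 | CONT | 9C | OUT | 9D | LPRINT | 9E | LLIST |
| A0 | WIDTH | A1 | ELSE | A2 | TRON | A3 | TROFF |
| A4 | SWAP | A5 | ERASE | A6 | EDIT | A7 | ERROR |
| A8 | RESUME | A9 | DELETE | AA | AUTO | AB | RENUM |
| AC | DEFSTR | AD | DEFINT | AE | DEFSNG | AF | DEFDBL |
| B0 | LINE | B1 | WHILE | B2 | WEND | B3 | CALL |
| B7 | WRITE | B8 | OPTION | B9 | RANDOMIZE | BA | OPEN |
| BB | CLOSE | BC | LOAD | BD | MERGE | BE | SAVE |
| BF | COLOR | C0 | CLS | C1 | MOTOR | C2 | BSAVE |
| C3 | BLOAD | C4 | SOUND | C5 | BEEP | C6 | PSET |
| C7 | PRESET | C8 | SCREEN | C9 | KEY | CA | LOCATE |
| CC | TO | CD | THEN | CE | TAB( | CF | STEP |
| D0 | USR | D1 | FN | D2 | SPC( | D3 | NOT |
| D4 | ERL | D5 | ERR | D6 | STRING$ | D7 | USING |
| D8 | INSTR | D9 | ' | DA | VARPTR | DB | CSRLIN |
| DC | POINT | DD | OFF | DE | INKEY$ | E6 | > |
| E7 | = | E8 | < | E9 | + | EA | - |
| EB | * | EC | / | ED | ^ | EE | AND |
| EF | OR | F0 | XOR | F1 | EQV | F2 | IMP |
| F3 | MOD | F4 | \ | | | | |
| FD81 | CVI | FD82 | CVS | FD83 | CVD | FD84 | MKI$ |
| FD85 | MKS$ | FD86 | MKD$ | FD8B | EXTERR | | |
| FE81 | FILES | FE82 | FIELD | FE83 | SYSTEM | FE84 | NAME |
| FE85 | LSET | FE86 | RSET | FE87 | KILL | FE88 | PUT |
| FE89 | GET | FE8A | RESET | FE8B | COMMON | FE8C | CHAIN |
| FE8D | DATE$ | FE8E | TIME$ | FE8F | PAINT | FE90 | COM |
| FE91 | CIRCLE | FE92 | DRAW | FE93 | PLAY | FE94 | TIMER |
| FE95 | ERDEV | FE96 | IOCTL | FE97 | CHDIR | FE98 | MKDIR |
| FE99 | RMDIR | FE9A | SHELL | FE9B | ENVIRON | FE9C | VIEW |
| FE9D | WINDOW | FE9E | PMAP | FE9F | PALETTE | FEA0 | LCOPY |
| FEA1 | CALLS | FEA5 | PCOPY | FEA7 | LOCK | FEA8 | UNLOCK |
| FF81 | LEFT$ | FF82 | RIGHT$ | FF83 | MID$ | FF84 | SGN |
| FF85 | INT | FF86 | ABS | FF87 | SQR | FF88 | RND |
| FF89 | SIN | FF8A | LOG | FF8B | EXP | FF8C | COS |
| FF8D | TAN | FF8E | ATN | FF8F | FRE | FF90 | INP |
| FF91 | POS | FF92 | LEN | FF93 | STR$ | FF94 | VAL |
| FF95 | ASC | FF96 | CHR$ | FF97 | PEEK | FF98 | SPACE$ |
| FF99 | OCT$ | FF9A | HEX$ | FF9B | LPOS | FF9C | CINT |
| FF9D | CSNG | FF9E | CDBL | FF9F | FIX | FFA0 | PEN |
| FFA1 | STICK | FFA2 | STRIG | FFA3 | EOF | FFA4 | LOC |
| FFA5 | LOF | | | | | | |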
The following additional reserved words are activated by the option
syntax={pcjr|tandy}.
| Token | Word |
|---|---|
| FEA4 | NOISE |
| FEA6 | TERM |
The tokens 10, 1E and 0D are
known to be used internally by GW-BASIC. They should not appear in a
correctly stored tokenised program file.
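Because reserved-word tokens are one or two bytes long, a reader has to look at the first byte to decide how many bytes to consume. The sketch below shows one way to do that, assuming (as the token list suggests) that FD, FE and FF always introduce a two-byte token. `TOKENS` holds only a handful of entries from the list for illustration, and `read_token` is a hypothetical name.

```python
# Small excerpt of the reserved-word list; a full table would hold every entry.
TOKENS = {
    b"\x81": "END",
    b"\x91": "PRINT",
    b"\xEA": "-",
    b"\xFD\x81": "CVI",
    b"\xFE\x81": "FILES",
    b"\xFF\x96": "CHR$",
}

# First bytes that introduce a double-byte token.
PREFIXES = (0xFD, 0xFE, 0xFF)

def read_token(buf, pos):
    """Return (keyword, bytes_consumed) for the reserved-word token at buf[pos]."""
    length = 2 if buf[pos] in PREFIXES else 1
    key = bytes(buf[pos:pos + length])
    if key not in TOKENS:
        raise KeyError("unknown token " + key.hex().upper())
    return TOKENS[key], length
```

With the excerpt above, `read_token(b"\x91", 0)` gives `("PRINT", 1)` and `read_token(b"\xFF\x96", 0)` gives `("CHR$", 2)`.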
Floating point numbers in GW-BASIC and PC-BASIC are represented in Microsoft Binary Format (MBF), which differs from the IEEE 754 standard used by practically all modern software and hardware. Consequently, binary files generated by either BASIC are fully compatible with each other and with some applications contemporary to GW-BASIC, but not easily interchanged with other software. QBASIC, for example, uses IEEE floats.
MBF differs from IEEE in the position of the sign bit and in using only 8 bits for the exponent, both in single- and in double-precision. This makes the range of allowable numbers in an MBF double-precision number smaller, but their precision higher, than for an IEEE double: an MBF single has 23 bits of precision, while an MBF double has 55 bits of precision. Both have the same range.
Unlike IEEE, the Microsoft Binary Format does not support signed zeroes, subnormal numbers, infinities or not-a-number values.
MBF floating point numbers are represented in bytes as follows:

| Type | Bytes |
|---|---|
| Single | M3 M2 M1 E0 |
| Double | M7 M6 M5 M4 M3 M2 M1 E0 |
Here, E0 is the exponent byte and the other bytes form the mantissa, in little-endian order so that M1 is the most significant byte. The most significant bit of M1 is the sign bit, followed by the most significant bits of the mantissa: M1 = s0 f1 f2 f3 f4 f5 f6 f7. The other bytes contain the less-significant mantissa bits: M2 = f8 f9 fA fB fC fD fE fF, and so on.
The value of the floating-point number is v = 0 if E0 = 0 and v = (-1)^s0 × mantissa × 2^(E0 - 128) otherwise, where the mantissa is formed as the binary fraction mantissa = 0.1 f1 f2 f3 ...
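The value formula can be checked with a short decoder for single-precision numbers. This is a sketch under one assumption: the four bytes are taken in the order M3 M2 M1 E0, that is, mantissa in little-endian order with the exponent byte last. The function name `mbf_single_to_float` is hypothetical.

```python
def mbf_single_to_float(data):
    """Decode a 4-byte MBF single, assuming byte order M3 M2 M1 E0."""
    if len(data) != 4:
        raise ValueError("an MBF single is four bytes long")
    m3, m2, m1, e0 = data
    if e0 == 0:
        return 0.0                        # zero exponent byte means v = 0
    sign = -1.0 if m1 & 0x80 else 1.0     # s0 is the top bit of M1
    # Replace the sign bit by the implicit leading 1 of the fraction 0.1f1f2...
    mantissa_bits = ((m1 | 0x80) << 16) | (m2 << 8) | m3
    mantissa = mantissa_bits / float(1 << 24)
    return sign * mantissa * 2.0 ** (e0 - 128)
```

For instance, the bytes 00 00 00 81 give a mantissa of 0.5 and an exponent of 1, so the value is 1.0; setting the top bit of M1 (00 00 80 81) gives -1.0.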
The protected format is an encrypted form of the tokenised format. GW-BASIC would refuse to show the source code of such files. This protection scheme could easily be circumvented by changing a flag in memory. Deprotection programs have circulated widely for decades and the decryption algorithm and keys were published in a mathematical magazine.
A protected program file on a disk device has the following format.
The file starts with the magic byte FE.
0B 0A 09 08 07 06 05 04 03 02 01
1E 1D C4 77 26 97 E0 74 59 88 7C
A9 84 8D CD 75 83 43 63 24 83 19 F7 9A
0D 0C 0B 0A 09 08 07 06 05 04 03 02 01
1A is written to mark the end of file. This is optional;
the file will be read without problems if it is omitted. Since the
end-of-file marker of the tokenised program is included in the encrypted
content, a protected file is usually one byte longer than its
unprotected equivalent.
BSAVE file format

A memory-dump file on a disk device has the following format.
The file starts with the magic byte FD, followed by a header:
| Bytes | Format | Meaning |
|---|---|---|
| 2 | Unsigned 16-bit little-endian integer. | Segment of the memory block. |
| 2 | Unsigned 16-bit little-endian integer. | Offset of the first byte of the memory block. |
| 2 | Unsigned 16-bit little-endian integer. | Length of the memory block in bytes. |
1A is written to mark the end of file. This is optional;
the file will be read without problems if it is omitted.
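A reader for this layout might look as follows. It is a sketch only: the name `parse_bsave` is hypothetical, and it assumes that the memory block itself follows the six header bytes directly and that any trailing 1A byte lies beyond the stated length.

```python
import struct

def parse_bsave(data):
    """Return (segment, offset, payload) from the bytes of a memory-dump file."""
    if len(data) < 7 or data[0] != 0xFD:
        raise ValueError("magic byte FD or header missing")
    # Three unsigned 16-bit little-endian fields follow the magic byte.
    segment, offset, length = struct.unpack_from("<HHH", data, 1)
    payload = data[7:7 + length]   # assumption: payload starts right after the header
    return segment, offset, payload
```

The payload would then be restored at the address given by the segment and offset, unless the caller overrides them.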
Files on cassette are stored as frequency-modulated sound. The payload format of files on cassette is the same as for files on disk device, but the headers are different and the files may be split into chunks.
A 1-bit is represented by a single 1 ms wave period (1000 Hz). A 0-bit is represented by a single 0.5 ms wave period (2000 Hz).
A byte is sent as 8 bits, most significant bit first. There are no start or stop bits.
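The bit encoding can be sketched as a function that maps a byte to the durations of its eight wave periods. The name `byte_to_periods_ms` is hypothetical; the timings are the ones given above.

```python
def byte_to_periods_ms(value):
    """Return the wave period durations in ms for one byte, MSB first."""
    if not 0 <= value <= 0xFF:
        raise ValueError("value must fit in one byte")
    # A 1-bit is one 1 ms period (1000 Hz); a 0-bit is one 0.5 ms period (2000 Hz).
    return [1.0 if (value >> bit) & 1 else 0.5 for bit in range(7, -1, -1)]
```

A byte of FF takes 8 ms, so 256 such bytes last 2048 ms, which is consistent with the pilot wave duration given in the record layout.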
A file is made up of two or more records. Each record has the following format:
| Length | Format | Meaning |
|---|---|---|
| 256 bytes | All FF | 2048 ms pilot wave at 1000 Hz, used for calibration. |
| 1 bit | 0 | Synchronisation bit. |
| 1 byte | 16 (SYN) | Synchronisation byte. |
| 256 bytes | Data block. | |
| 2 bytes | Unsigned 16-bit big-endian integer | CRC-16-CCITT checksum. |
| 31 bits | 30 1s followed by a 0. | End of record marker. |
Tokenised, protected and BSAVE files consist of a header record
followed by a single record which may contain multiple 256-byte data blocks, each followed by the 2 CRC bytes.
Plain text program files and data files consist of a header record followed by multiple single-block records.
The data block of a header record has the following format:

| Bytes | Format | Meaning |
|---|---|---|
| 1 | A5 | Header record magic byte |
| 8 | 8 characters | Filename. |
| 1 | 00 for data file, 01 for memory dump, 20 or A0 for protected, 40 for plain text program, 80 for tokenised program. | File type. |
| 2 | Unsigned 16-bit little-endian integer | Length of next data record, in bytes. |
| 2 | Unsigned 16-bit little-endian integer | Segment of memory location. |
| 2 | Unsigned 16-bit little-endian integer | Offset of memory location. |
| 1 | 00 | End of header data |
| 239 | All 01 | Filler |
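The header fields can be unpacked at fixed offsets, since the field sizes add up to exactly 256 bytes. The sketch below assumes the block has already been demodulated and its CRC checked; `parse_header_block` and the dictionary keys are hypothetical names, and the filename is returned as its raw 8 characters because the padding convention is not stated here.

```python
import struct

FILE_TYPES = {
    0x00: "data file",
    0x01: "memory dump",
    0x20: "protected",
    0xA0: "protected",
    0x40: "plain text program",
    0x80: "tokenised program",
}

def parse_header_block(block):
    """Decode the 256-byte data block of a cassette header record."""
    if len(block) != 256:
        raise ValueError("a header data block is 256 bytes long")
    if block[0] != 0xA5:
        raise ValueError("header magic byte A5 missing")
    # Bytes 10..15: length, segment and offset, each 16-bit little-endian.
    length, segment, offset = struct.unpack_from("<HHH", block, 10)
    return {
        "name": block[1:9].decode("ascii", errors="replace"),
        "type": FILE_TYPES.get(block[9], "unknown"),
        "length": length,
        "segment": segment,
        "offset": offset,
    }
```

Byte 16 (the 00 end marker) and the 239 filler bytes are ignored by this sketch; a stricter reader could verify them as well.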
The data block of each record in a plain text program file or data file has the following format:

| Bytes | Format | Meaning |
|---|---|---|
| 1 | 8-bit unsigned integer | Number of payload bytes in last record, plus one. If zero, the next record is not the last record. |
| 255 | Payload data. | If this is the last record, any unused bytes are filled by repeating the last payload byte. |
PC-BASIC uses a number of file formats to support its emulation of legacy hardware, which are documented in this section. These file formats are not used by GW-BASIC or contemporary software.
The HEX file format for bitfonts was developed by Roman Czyborra for the GNU Unifont package. PC-BASIC uses this file format to store its fonts.
A HEX file is an ASCII text file, consisting of lines terminated by LF.
Each line of this file is one of the following:
# character.
PC-BASIC also encodes 8- and 14-pixel high fonts in this manner; these are encoded as 16-pixel high fonts with the remaining rows set to zero.
Unicode-codepage mappings are stored in UCP files.
A UCP file is an ASCII text file, consisting of lines terminated by LF.
Each line of this file is one of the following:
# character.
A CAS file is a bit-level representation of cassette data introduced by the PCE emulator.
CAS files produced by PC-BASIC start with the characters
PC-BASIC tape followed by EOF (&h1A). This sequence is followed by seven 0 bits,
followed by the tape contents. The seven zero bits are intended to
ensure that the tape contents are byte-aligned; the eighth bit is supplied
by the synchronisation bit following the pilot wave.
Note that PC-BASIC does not require the introductory sequence to read a CAS-file correctly, nor does it require the contents of a CAS-file to be byte-aligned. However, new files produced by PC-BASIC follow this convention.
Depending on context, PC-BASIC will treat a code point in
the control characters range as a control character or as a
glyph defined by the active codepage which by
default is codepage 437. Code points of
&h80 or higher are always interpreted as a
codepage glyph.
This is a list of the American Standard Code for Information Interchange (ASCII).
ASCII only covers 128 characters and defines the code point ranges
&h00–&h1F and &h7F
as control characters which do not have a printable glyph assigned
to them. This includes such values as the Carriage Return (CR)
character that ends a program line.
In the context of this documentation, character &h1A (SUB)
will usually be indicated as EOF since it plays the role of end-of-file marker in DOS.
| | _0 | _1 | _2 | _3 | _4 | _5 | _6 | _7 | _8 | _9 | _A | _B | _C | _D | _E | _F |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0_ | NUL | SOH | STX | ETX | EOT | ENQ | ACK | BEL | BS | HT | LF | VT | FF | CR | SO | SI |
| 1_ | DLE | DC1 | DC2 | DC3 | DC4 | NAK | SYN | ETB | CAN | EM | SUB | ESC | FS | GS | RS | US |
| 2_ | | ! | " | # | $ | % | & | ' | ( | ) | * | + | , | - | . | / |
| 3_ | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | : | ; | < | = | > | ? |
| 4_ | @ | A | B | C | D | E | F | G | H | I | J | K | L | M | N | O |
| 5_ | P | Q | R | S | T | U | V | W | X | Y | Z | [ | \ | ] | ^ | _ |
| 6_ | ` | a | b | c | d | e | f | g | h | i | j | k | l | m | n | o |
| 7_ | p | q | r | s | t | u | v | w | x | y | z | { | \| | } | ~ | DEL |
This table shows the characters that are produced by the 256 single-byte code points when the DOS Latin USA codepage 437 is loaded, which is the default. Other codepages can be loaded to assign other characters to these code points.
Code point &h00 cannot be redefined.
Redefining a code point in the printable ASCII range &h20–&h7E
will result in a different glyph being shown on the screen, but the
character will continue to be treated as the corresponding ASCII character.
It will retain its ASCII value when transcoded into UTF-8. This happens,
for example, with the Yen sign (¥), which is
assigned to ASCII code point &h5C in codepage 932:
in that codepage it is treated as if it were a backslash (\).
| | _0 | _1 | _2 | _3 | _4 | _5 | _6 | _7 | _8 | _9 | _A | _B | _C | _D | _E | _F |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0_ | | ☺ | ☻ | ♥ | ♦ | ♣ | ♠ | • | ◘ | ○ | ◙ | ♂ | ♀ | ♪ | ♫ | ☼ |
| 1_ | ► | ◄ | ↕ | ‼ | ¶ | § | ▬ | ↨ | ↑ | ↓ | → | ← | ∟ | ↔ | ▲ | ▼ |
| 2_ | | ! | " | # | $ | % | & | ' | ( | ) | * | + | , | - | . | / |
| 3_ | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | : | ; | < | = | > | ? |
| 4_ | @ | A | B | C | D | E | F | G | H | I | J | K | L | M | N | O |
| 5_ | P | Q | R | S | T | U | V | W | X | Y | Z | [ | \ | ] | ^ | _ |
| 6_ | ` | a | b | c | d | e | f | g | h | i | j | k | l | m | n | o |
| 7_ | p | q | r | s | t | u | v | w | x | y | z | { | \| | } | ~ | ⌂ |
| 8_ | Ç | ü | é | â | ä | à | å | ç | ê | ë | è | ï | î | ì | Ä | Å |
| 9_ | É | æ | Æ | ô | ö | ò | û | ù | ÿ | Ö | Ü | ¢ | £ | ¥ | ₧ | ƒ |
| A_ | á | í | ó | ú | ñ | Ñ | ª | º | ¿ | ⌐ | ¬ | ½ | ¼ | ¡ | « | » |
| B_ | ░ | ▒ | ▓ | │ | ┤ | ╡ | ╢ | ╖ | ╕ | ╣ | ║ | ╗ | ╝ | ╜ | ╛ | ┐ |
| C_ | └ | ┴ | ┬ | ├ | ─ | ┼ | ╞ | ╟ | ╚ | ╔ | ╩ | ╦ | ╠ | ═ | ╬ | ╧ |
| D_ | ╨ | ╤ | ╥ | ╙ | ╘ | ╒ | ╓ | ╫ | ╪ | ┘ | ┌ | █ | ▄ | ▌ | ▐ | ▀ |
| E_ | α | ß | Γ | π | Σ | σ | µ | τ | Φ | Θ | Ω | δ | ∞ | φ | ε | ∩ |
| F_ | ≡ | ± | ≥ | ≤ | ⌠ | ⌡ | ÷ | ≈ | ° | ∙ | · | √ | ⁿ | ² | ■ | |
PC-BASIC uses PC/XT scancodes, which originated on the 83-key IBM Model F keyboard
supplied with the IBM PC 5150. The layout of this keyboard was quite distinct
from modern standard keyboards with 101 or more keys, but keys on a modern keyboard produce
the same scancode as the key with the same function on the Model F. For example,
the key that (on a US keyboard) produces the \ was located next to
the left Shift key on the Model F keyboard and has scancode
&h2B. The (US) backslash key still has this scancode, even
though it is now usually found above the Enter key.
To further complicate matters, keyboards for different locales have their layout remapped in software rather than in hardware, which means that they produce the same scancode as the key that on a US keyboard is in the same location, regardless of which character they actually produce.
Therefore, the A on a French keyboard will produce the same scancode as the Q on a UK or US keyboard. The aforementioned US \ key is identified with the key that is generally found to the bottom left of Enter on non-US keyboards. For example, on my UK keyboard this is the # key. Non-US keyboards have an additional key next to the left Shift which on the UK keyboard is the \. Therefore, while this key is in the same location and has the same function as the Model F \, it has a different scancode.
In the table below, the keys are marked by their function on a US keyboard, but it should be kept in mind that the scancode is linked to the position, not the function, of the key.
| Key | Scancode |
|---|---|
| Esc | 01 |
| 1 ! | 02 |
| 2 @ | 03 |
| 3 # | 04 |
| 4 $ | 05 |
| 5 % | 06 |
| 6 ^ | 07 |
| 7 & | 08 |
| 8 * | 09 |
| 9 ( | 0A |
| 0 ) | 0B |
| - _ | 0C |
| = + | 0D |
| Backspace | 0E |
| Tab | 0F |
| q Q | 10 |
| w W | 11 |
| e E | 12 |
| r R | 13 |
| t T | 14 |
| y Y | 15 |
| u U | 16 |
| i I | 17 |
| o O | 18 |
| p P | 19 |
| [ { | 1A |
| ] } | 1B |
| Enter | 1C |
| Ctrl | 1D |
| a A | 1E |
| s S | 1F |
| d D | 20 |
| f F | 21 |
| g G | 22 |
| h H | 23 |
| j J | 24 |
| k K | 25 |
| l L | 26 |
| ; : | 27 |
| ' " | 28 |
| ` ~ | 29 |
| Left Shift | 2A |
| \ \| | 2B |
| z Z | 2C |
| x X | 2D |
| c C | 2E |
| v V | 2F |
| b B | 30 |
| n N | 31 |
| m M | 32 |
| , < | 33 |
| . > | 34 |
| / ? | 35 |
| Right Shift | 36 |
| keypad * PrtSc | 37 |
| Alt | 38 |
| Space | 39 |
| Caps Lock | 3A |
| F1 | 3B |
| F2 | 3C |
| F3 | 3D |
| F4 | 3E |
| F5 | 3F |
| F6 | 40 |
| F7 | 41 |
| F8 | 42 |
| F9 | 43 |
| F10 | 44 |
| Num Lock | 45 |
| Scroll Lock Pause | 46 |
| keypad 7 Home | 47 |
| keypad 8 ↑ | 48 |
| keypad 9 Pg Up | 49 |
| keypad - | 4A |
| keypad 4 ← | 4B |
| keypad 5 | 4C |
| keypad 6 → | 4D |
| keypad + | 4E |
| keypad 1 End | 4F |
| keypad 2 ↓ | 50 |
| keypad 3 Pg Dn | 51 |
| keypad 0 Ins | 52 |
| keypad . Del | 53 |
| SysReq | 54 |
| \ \| (Non-US 102-key) | 56 |
| F11 | 57 |
| F12 | 58 |
| Left Logo (Windows 104-key) | 5B |
| Right Logo (Windows 104-key) | 5C |
| Menu (Windows 104-key) | 5D |
| ひらがな/カタカナ Hiragana/Katakana (Japanese 106-key) | 70 |
| \ _ (Japanese 106-key) | 73 |
| 変換 Henkan (Japanese 106-key) | 79 |
| 無変換 Muhenkan (Japanese 106-key) | 7B |
| 半角/全角 Hankaku/Zenkaku (Japanese 106-key) | 29 |
| ¥ \| (Japanese 106-key) | 7D |
| 한자 Hanja (Korean 103-key) | F1 |
| 한/영 Han/Yeong (Korean 103-key) | F2 |
| \ ? ° (Brazilian ABNT2) | 73 |
| keypad . (Brazilian ABNT2) | 7E |
Alongside scancodes, most keys also carry
a character value the GW-BASIC documentation calls extended ASCII.
Since this is a rather overloaded term, we shall use the abbreviation
e-ASCII exclusively for these values.
The values returned by the
INKEY$ function are e-ASCII values.
e-ASCII codes are one or
two bytes long; single-byte codes are simply ASCII codes whereas
double-byte codes consist of a NUL character plus
a code indicating the key pressed. Some, but certainly not all,
of these codes agree with the keys' scancodes.
Unlike scancodes, e-ASCII codes of unmodified keys and those of keys modified by Shift, Ctrl or Alt are all different.
Unmodified, Shift-modified and Ctrl-modified e-ASCII codes are connected to a key's meaning, not its location. For example, the e-ASCII code for Ctrl+a is the same on a French and a US keyboard. By contrast, the Alt-modified codes are connected to the key's location, like scancodes. The US keyboard layout is used in the table below.
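Since an e-ASCII code is either a single ASCII byte or a NUL byte followed by a key code, a program receiving such a value only needs to inspect its length and first byte. A minimal sketch, with the hypothetical name `classify_eascii` and the value passed in as a Python `bytes` object:

```python
def classify_eascii(code):
    """Return ("ascii", value) or ("extended", key_code) for an e-ASCII code."""
    if len(code) == 1:
        return ("ascii", code[0])        # single-byte codes are plain ASCII
    if len(code) == 2 and code[0] == 0x00:
        return ("extended", code[1])     # NUL prefix plus a key code
    raise ValueError("not a valid e-ASCII code")
```

With the table that follows, the bytes 00 3B classify as the extended code 3B (F1), while the single byte 1B classifies as ASCII ESC.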
| Key | e-ASCII | e-ASCII Shift | e-ASCII Ctrl | e-ASCII Alt |
|---|---|---|---|---|
| Esc | 1B | 1B | 1B | |
| 1 ! | 31 | 21 | | 00 78 |
| 2 @ | 32 | 40 | 00 03 | 00 79 |
| 3 # | 33 | 23 | | 00 7A |
| 4 $ | 34 | 24 | | 00 7B |
| 5 % | 35 | 25 | | 00 7C |
| 6 ^ | 36 | 5E | 1E | 00 7D |
| 7 & | 37 | 26 | | 00 7E |
| 8 * | 38 | 2A | | 00 7F |
| 9 ( | 39 | 28 | | 00 80 |
| 0 ) | 30 | 29 | | 00 81 |
| - _ | 2D | 5F | 1F | 00 82 |
| = + | 3D | 2B | | 00 83 |
| Backspace | 08 | 08 | 7F | 00 8C |
| Tab | 09 | 00 0F | 00 8D | 00 8E |
| q Q | 71 | 51 | 11 | 00 10 |
| w W | 77 | 57 | 17 | 00 11 |
| e E | 65 | 45 | 05 | 00 12 |
| r R | 72 | 52 | 12 | 00 13 |
| t T | 74 | 54 | 14 | 00 14 |
| y Y | 79 | 59 | 19 | 00 15 |
| u U | 75 | 55 | 15 | 00 16 |
| i I | 69 | 49 | 09 | 00 17 |
| o O | 6F | 4F | 0F | 00 18 |
| p P | 70 | 50 | 10 | 00 19 |
| [ { | 5B | 7B | 1B | |
| ] } | 5D | 7D | 1D | |
| Enter | 0D | 0D | 0A | 00 8F |
| a A | 61 | 41 | 01 | 00 1E |
| s S | 73 | 53 | 13 | 00 1F |
| d D | 64 | 44 | 04 | 00 20 |
| f F | 66 | 46 | 06 | 00 21 |
| g G | 67 | 47 | 07 | 00 22 |
| h H | 68 | 48 | 08 | 00 23 |
| j J | 6A | 4A | 0A | 00 24 |
| k K | 6B | 4B | 0B | 00 25 |
| l L | 6C | 4C | 0C | 00 26 |
| ; : | 3B | 3A | | |
| ' " | 27 | 22 | | |
| ` ~ | 60 | 7E | | |
| \ \| | 5C | 7C | 1C | |
| z Z | 7A | 5A | 1A | 00 2C |
| x X | 78 | 58 | 18 | 00 2D |
| c C | 63 | 43 | 03 | 00 2E |
| v V | 76 | 56 | 16 | 00 2F |
| b B | 62 | 42 | 02 | 00 30 |
| n N | 6E | 4E | 0E | 00 31 |
| m M | 6D | 4D | 0D | 00 32 |
| , < | 2C | 3C | | |
| . > | 2E | 3E | | |
| / ? | 2F | 3F | | |
| PrtSc | | | 00 72 | 00 46 |
| Space | 20 | 20 | 20 | 00 20 |
| F1 | 00 3B | 00 54 | 00 5E | 00 68 |
| F2 | 00 3C | 00 55 | 00 5F | 00 69 |
| F3 | 00 3D | 00 56 | 00 60 | 00 6A |
| F4 | 00 3E | 00 57 | 00 61 | 00 6B |
| F5 | 00 3F | 00 58 | 00 62 | 00 6C |
| F6 | 00 40 | 00 59 | 00 63 | 00 6D |
| F7 | 00 41 | 00 5A | 00 64 | 00 6E |
| F8 | 00 42 | 00 5B | 00 65 | 00 6F |
| F9 | 00 43 | 00 5C | 00 66 | 00 70 |
| F10 | 00 44 | 00 5D | 00 67 | 00 71 |
| F11 (Tandy) | 00 98 | 00 A2 | 00 AC | 00 B6 |
| F12 (Tandy) | 00 99 | 00 A3 | 00 AD | 00 B7 |
| Home | 00 47 | 00 47 | 00 77 | |
| End | 00 4F | 00 4F | 00 75 | |
| PgUp | 00 49 | 00 49 | 00 84 | |
| PgDn | 00 51 | 00 51 | 00 76 | |
| ↑ | 00 48 | 00 48 | | |
| ← | 00 4B | 00 87 | 00 73 | |
| → | 00 4D | 00 88 | 00 74 | |
| ↓ | 00 50 | 00 50 | | |
| keypad 5 | 35 | 35 | 05 | |
| Ins | 00 52 | 00 52 | | |
| Del | 00 53 | 00 53 | | |
PC-BASIC (rather imperfectly) emulates the memory of real-mode MS-DOS.
This means that memory can be addressed in segments of 64 KiB.
Each memory address is given by the segment value and the 0–65535 byte offset with respect to that segment.
Note that segments overlap: the actual memory address is found by segment*16 + offset.
The maximum memory size that can be addressed by this scheme is thus 1 MiB, which was the size of the conventional and upper
memory in real-mode MS-DOS.
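The address calculation, and the fact that segments overlap, can be illustrated with a one-line helper. The name `linear_address` is hypothetical.

```python
def linear_address(segment, offset):
    """Return the actual memory address for a real-mode segment:offset pair."""
    if not (0 <= segment <= 0xFFFF and 0 <= offset <= 0xFFFF):
        raise ValueError("segment and offset must be 16-bit values")
    return segment * 16 + offset
```

Because consecutive segments start only 16 bytes apart, different pairs can name the same byte: `linear_address(0x13AD, 0x0010)` and `linear_address(0x13AE, 0x0000)` both give the same address.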
Areas of memory with a special importance are:
| Segment | Name | Purpose |
|---|---|---|
| &h0000 | Low memory | Holds machine information, among other things |
| &h13AD (may vary) | Data segment | Program code, variables, arrays, strings |
| &hA000 (EGA), &hB000 (MDA), &hB800 (CGA) | Video segment | Text and graphics on visible and virtual screens |
| &hC000 | -- | RAM font definition, among other things |
| &hF000 | Read-only memory | ROM font definition, among other things |
The data segment is organised as follows. The addresses may vary depending on the settings of various options; given here are the default values for GW-BASIC 3.23.
| Offset | Size (bytes) | Function |
|---|---|---|
| &h0000 | 3429 | Interpreter workarea. Unused in PC-BASIC; can be adjusted with the --reserved-memory option. |
| &h0D65 | (max-files+1) * 322 | File blocks: one for the program plus one for each file allowed by --max-files |
| &h126D | 3 + c | Program code. An empty program uses 3 bytes. |
| &h1270 + c | v | Scalar variables. |
| &h1270 + c + v | a | Array variables. |
| &hFDFC - s | s | String variables, filled downward from &hFDFC |
| &hFDFC | 512 | BASIC stack, size set by CLEAR statement. |
| &hFFFE | | Top of data segment, set by CLEAR statement. |
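The first few offsets in this table follow from the sizes of the areas before them, which can be checked with a little arithmetic. In the sketch below the function names are hypothetical, and the value of 3 for max-files is inferred from the offsets in the table rather than stated in the text.

```python
RESERVED_MEMORY = 3429    # size of the interpreter workarea in bytes
FILE_BLOCK_SIZE = 322     # size of one file block in bytes

def file_blocks_size(max_files):
    """One block for the program plus one for each allowed file."""
    return (max_files + 1) * FILE_BLOCK_SIZE

def program_code_offset(max_files, reserved=RESERVED_MEMORY):
    """Offset in the data segment at which program code starts."""
    return reserved + file_blocks_size(max_files)

def scalar_variables_offset(max_files, code_size):
    """Offset of scalar variables: program code takes 3 + code_size bytes."""
    return program_code_offset(max_files) + 3 + code_size
```

With max-files set to 3 (an inferred value), the file blocks take 1288 bytes and the program code starts at 3429 + 1288 = 4717, which is &h126D as listed.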