The GSM character set

A question we see again and again is which characters can be sent in a text message. What follows is a brief description of the format of a text message.

The UK mobile networks all use the GSM standard, and as such, a standard text message is limited to the GSM character set.

An SMS can contain up to 140 bytes. The GSM character set is encoded using 7 bits per character, rather than the usual 8 bits that make a byte. This means there can be 160 characters in an SMS (140 × 8 ÷ 7 = 160).
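To make the arithmetic concrete, here is a minimal Python sketch of how 7-bit characters (septets) can be packed into bytes. The function name pack_septets is made up for this illustration and is not part of our API. The bit order assumed here is the usual GSM 03.38 packing, where each septet fills the lowest free bits first.

```python
def pack_septets(septets):
    """Pack a list of 7-bit values into bytes, lowest bits first."""
    bits = 0      # bit buffer holding septets not yet written out
    nbits = 0     # number of valid bits currently in the buffer
    out = bytearray()
    for septet in septets:
        bits |= (septet & 0x7F) << nbits
        nbits += 7
        while nbits >= 8:
            out.append(bits & 0xFF)
            bits >>= 8
            nbits -= 8
    if nbits:
        # Flush the final partial byte, padded with zero bits.
        out.append(bits & 0xFF)
    return bytes(out)


# 140 bytes of payload hold 140 * 8 // 7 = 160 septets.
print(140 * 8 // 7)                        # 160
print(len(pack_septets([0] * 160)))        # 140

# GSM codes for "hello" (taken from the table in this article).
hello = [0x68, 0x65, 0x6C, 0x6C, 0x6F]
print(pack_septets(hello).hex())           # e8329bfd06
```

You never need to do this packing yourself when sending into our API; it only shows where the figure of 160 comes from.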

This 7-bit limitation means only 128 standard characters can be encoded. The GSM standard gets round this by also having the Extended GSM character set: another 10 characters, each of which is actually sent as two 7-bit characters, an escape (ESC) character followed by another character. This means that 160 ‘£’ symbols can fit in a single SMS, but only 80 ‘{’ symbols.
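As a rough illustration of the counting rule, the sketch below works out how many 7-bit characters a message occupies. The names EXTENDED, SINGLE_SMS_LIMIT and gsm_length are hypothetical, and the function assumes the text already contains only GSM characters.

```python
# The ten extended GSM characters; each costs two 7-bit characters (ESC + code).
EXTENDED = frozenset("\f^{}\\[~]|€")
SINGLE_SMS_LIMIT = 160


def gsm_length(text):
    """Count the 7-bit characters needed, assuming text is GSM-only."""
    return sum(2 if ch in EXTENDED else 1 for ch in text)


print(gsm_length("£" * 160))                     # 160
print(gsm_length("{" * 80))                      # 160
print(gsm_length("{" * 81) <= SINGLE_SMS_LIMIT)  # False
```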

If you need to send characters other than those in the GSM character set, take a look at UCS-2 messaging, which allows sending most Unicode characters. As each of these takes 2 bytes, only 70 characters can be sent per SMS.
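The 70-character figure can be checked with a short sketch. Python has no codec named UCS-2, so this example assumes UTF-16 (big-endian) as a stand-in, which produces the same 2 bytes per character for characters in the Basic Multilingual Plane. The function name is made up for illustration.

```python
SMS_PAYLOAD_BYTES = 140


def fits_single_ucs2_sms(text):
    """Return True if text fits in one SMS at 2 bytes per character.

    Assumption: UTF-16-BE stands in for UCS-2. Characters outside the
    Basic Multilingual Plane take 4 bytes in UTF-16, so they count double.
    """
    return len(text.encode("utf-16-be")) <= SMS_PAYLOAD_BYTES


print(fits_single_ucs2_sms("я" * 70))  # True
print(fits_single_ucs2_sms("я" * 71))  # False
```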

So our customers don’t need to bother converting characters into the GSM encoding or escaping the extended characters, our SMS API accepts messages in UTF-8 encoding.

The tables below show a full list of the GSM characters, standard and extended. They also show the equivalent UTF-8 encoding needed to send into our API.
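If you want to check a UTF-8 value for yourself, a small sketch like the one below prints the bytes of a character in the same comma-separated hex format used in the tables. The helper name utf8_hex is ours for illustration only.

```python
def utf8_hex(ch):
    """Return the UTF-8 bytes of ch as comma-separated uppercase hex."""
    return ",".join(f"{b:02X}" for b in ch.encode("utf-8"))


print(utf8_hex("@"))  # 40
print(utf8_hex("£"))  # C2,A3
print(utf8_hex("€"))  # E2,82,AC
```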

Standard GSM Characters

GSM UTF-8 Char
00 40 @
01 C2,A3 £
02 24 $
03 C2,A5 ¥
04 C3,A8 è
05 C3,A9 é
06 C3,B9 ù
07 C3,AC ì
08 C3,B2 ò
09 C3,87 Ç
0A 0A <LF>
0B C3,98 Ø
0C C3,B8 ø
0D 0D <CR>
0E C3,85 Å
0F C3,A5 å
10 CE,94 Δ
11 5F _
12 CE,A6 Φ
13 CE,93 Γ
14 CE,9B Λ
15 CE,A9 Ω
16 CE,A0 Π
17 CE,A8 Ψ
18 CE,A3 Σ
19 CE,98 Θ
1A CE,9E Ξ
1B 1B <ESC>
1C C3,86 Æ
1D C3,A6 æ
1E C3,9F ß
1F C3,89 É
20 20 <SP>
21 21 !
22 22 "
23 23 #
24 C2,A4 ¤
25 25 %
26 26 &
27 27 '
28 28 (
29 29 )
2A 2A *
2B 2B +
2C 2C ,
2D 2D -
2E 2E .
2F 2F /
30 30 0
31 31 1
32 32 2
33 33 3
34 34 4
35 35 5
36 36 6
37 37 7
38 38 8
39 39 9
3A 3A :
3B 3B ;
3C 3C <
3D 3D =
3E 3E >
3F 3F ?
40 C2,A1 ¡
41 41 A
42 42 B
43 43 C
44 44 D
45 45 E
46 46 F
47 47 G
48 48 H
49 49 I
4A 4A J
4B 4B K
4C 4C L
4D 4D M
4E 4E N
4F 4F O
50 50 P
51 51 Q
52 52 R
53 53 S
54 54 T
55 55 U
56 56 V
57 57 W
58 58 X
59 59 Y
5A 5A Z
5B C3,84 Ä
5C C3,96 Ö
5D C3,91 Ñ
5E C3,9C Ü
5F C2,A7 §
60 C2,BF ¿
61 61 a
62 62 b
63 63 c
64 64 d
65 65 e
66 66 f
67 67 g
68 68 h
69 69 i
6A 6A j
6B 6B k
6C 6C l
6D 6D m
6E 6E n
6F 6F o
70 70 p
71 71 q
72 72 r
73 73 s
74 74 t
75 75 u
76 76 v
77 77 w
78 78 x
79 79 y
7A 7A z
7B C3,A4 ä
7C C3,B6 ö
7D C3,B1 ñ
7E C3,BC ü
7F C3,A0 à

Extended GSM Characters

In an SMS these are prefixed with the escape character (1B) and therefore take up 2 of the 160 characters of an SMS. They do not need escaping when sending into our API.

GSM UTF-8 Char
0A 0C <FF>
14 5E ^
28 7B {
29 7D }
2F 5C \
3C 5B [
3D 7E ~
3E 5D ]
40 7C |
65 E2,82,AC €
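To show how the two tables fit together, here is a minimal sketch that converts a Python string into GSM codes, inserting the escape character before extended characters. It is not how our API is implemented; the names GSM_BASIC, GSM_EXTENDED and to_gsm_codes are hypothetical, and the form feed code 0A is taken from the GSM 03.38 extension table.

```python
# Standard GSM characters in code order: the index in this string is the GSM code.
GSM_BASIC = (
    "@£$¥èéùìòÇ\nØø\rÅåΔ_ΦΓΛΩΠΨΣΘΞ\x1bÆæßÉ"
    " !\"#¤%&'()*+,-./0123456789:;<=>?"
    "¡ABCDEFGHIJKLMNOPQRSTUVWXYZÄÖÑÜ§"
    "¿abcdefghijklmnopqrstuvwxyzäöñüà"
)

# Extended GSM characters and the code that follows the escape character.
GSM_EXTENDED = {
    "\f": 0x0A, "^": 0x14, "{": 0x28, "}": 0x29, "\\": 0x2F,
    "[": 0x3C, "~": 0x3D, "]": 0x3E, "|": 0x40, "€": 0x65,
}
ESC = 0x1B


def to_gsm_codes(text):
    """Return the list of GSM codes for text, escaping extended characters."""
    codes = []
    for ch in text:
        if ch in GSM_EXTENDED:
            codes += [ESC, GSM_EXTENDED[ch]]
        else:
            code = GSM_BASIC.find(ch)
            if code < 0:
                raise ValueError(f"{ch!r} is not in the GSM character set")
            codes.append(code)
    return codes


print([f"{c:02X}" for c in to_gsm_codes("e")])   # ['65']
print([f"{c:02X}" for c in to_gsm_codes("€")])   # ['1B', '65']
print([f"{c:02X}" for c in to_gsm_codes("£5")])  # ['01', '35']
```

A character that raises ValueError here is one that would need UCS-2 messaging instead.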

4 Comments

  1. Mark says:

    Very interesting. I’ve noticed that some refer to it as the GSM 03.38 character set. Were there earlier versions, and how long has GSM 03.38 been the standard?

  2. Martin says:

    Mark,

    The 03.38 referred to on some sites is a specification number assigned by 3GPP, the body that oversees international mobile standards. In this case it’s the 38th document of their Technical realization series. The current version of the specification is 7.2.0, released in 1999. The earliest version still available on their site is 4.0.1, released in 1994. The major change in the character set between these versions is the addition of the extended GSM characters {}[]^|€~\.

  3. ssam says:

    To send €, do we send 1B65? Is that how it’s relayed to the carriers, and how is it displayed on the phone?

    The part I am not clear on is: if 65 maps to €, then why are we escaping it?

  4. John says:

    To send to our API you would use UTF-8. So the bytes for this are E2,82,AC.

    The GSM codes are the codes used to transfer the message to the phone.

    65 on its own is the letter ‘e’. To send a €, the networks use an escape character of 1B followed by the letter ‘e’ (1B, 65). This is why the € symbol takes the space of two GSM characters in a message.

    Whilst it isn’t necessary to know the exact GSM codes to send into our API, the information in the blog article lets you know which characters can be sent to a phone in a standard 160-character GSM SMS, and which take two character spaces.
