7.2. String Escape Characters
\r\n - is used on Windows
\n - is used everywhere else

Figure 7.14. Why do we have '\r\n' on Windows?
| Sequence | Description |
|---|---|
| \n | New line (LF - Linefeed) |
| \r | Carriage Return (CR) |
| \t | Horizontal Tab (TAB) |
| \' | Single quote |
| \" | Double quote |
| \\ | Backslash |

| Sequence | Description |
|---|---|
| \a | Bell (BEL) |
| \b | Backspace (BS) |
| \f | New page (FF - Form Feed) |
| \v | Vertical Tab (VT) |
| \uXXXX | Character with 16-bit (2 bytes) hex value |
| \UXXXXXXXX | Character with 32-bit (4 bytes) hex value |
| \ooo | ASCII character with octal value |
| \xhh | ASCII character with hex value |
print('\U0001F680') # 🚀
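The numeric escapes in the table above are just different spellings of a code point. As a small sketch (the variable names are only illustrative), the letter A can be written five ways, and each one yields the same one-character string:

```python
# Five spellings of the same one-character string 'A' (code point 65).
literal = 'A'
octal = '\101'               # \ooo: up to three octal digits
hexadecimal = '\x41'         # \xhh: exactly two hex digits
unicode_16 = '\u0041'        # \uXXXX: exactly four hex digits
unicode_32 = '\U00000041'    # \UXXXXXXXX: exactly eight hex digits

all_equal = literal == octal == hexadecimal == unicode_16 == unicode_32
print(all_equal)                        # True

# The rocket needs the 32-bit form because its code point is above 0xFFFF.
rocket = '\U0001F680'
print(len(rocket), hex(ord(rocket)))    # 1 0x1f680
```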
7.2.1. Escape characters
\t - tab
\r - carriage return
\n - newline
\r\n - newline (on Windows)
\b - backspace
\v - vertical tab
\f - form feed
\xhh - hexadecimal
\ooo - octal
\uXXXX - Unicode entity 16-bit
\UXXXXXXXX - Unicode entity 32-bit
\\ - backslash
\' - apostrophe
\" - double quote
>>> import string
>>>
>>>
>>> string.whitespace
' \t\n\r\x0b\x0c'
>>> print('Hello\nWorld')
Hello
World
Linefeed means to advance downward to the next line; however, it has been repurposed and renamed. Used as "newline", it terminates lines (commonly confused with separating lines). This is commonly escaped as \n, abbreviated LF or NL, and has ASCII value 10 or 0x0A. CRLF (but not CRNL) is used for the pair \r\n [#stackFF]_.
>>> print('Hello\r\nWorld')
Hello
World
Carriage return means to return to the beginning of the current line without advancing downward. The name comes from a printer's carriage, as monitors were rare when the name was coined. This is commonly escaped as \r, abbreviated CR, and has ASCII value 13 or 0x0D [#stackFF]_.
>>> print('Hello\rWorld')
World
The most common difference (and probably the only one worth worrying about) is that lines end with CRLF on Windows, NL on Unix-likes, and CR on older Macs (since OS X, the Mac follows the Unix convention). Note that the shift in meaning from LF to NL, for the exact same character, accounts for the difference between Windows and Unix. Windows, although newer than Unix, did not adopt this semantic shift. The use of CR on older Macs probably came from the Apple II. CR was common on other 8-bit systems too, such as the Commodore and Tandy. ASCII wasn't universal on these systems: Commodore used PETSCII, which had LF at 0x8d (!), and Atari had no LF character at all. For whatever reason, CR = 0x0d was more or less standard. Many text editors can read files in any of these three formats and convert between them, but not all utilities can [#stackFF]_.
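The three line-ending conventions can be checked from Python itself. The following is a minimal sketch, with a made-up sample string, showing the ASCII values and how str.splitlines() accepts all three terminators at once:

```python
import os

print(ord('\n'), hex(ord('\n')))    # 10 0xa
print(ord('\r'), hex(ord('\r')))    # 13 0xd

# One string mixing Windows (CRLF), Unix (LF) and old Mac (CR) line endings.
mixed = 'first\r\nsecond\nthird\rfourth'

# splitlines() treats \r\n as a single terminator, not as two.
lines = mixed.splitlines()
print(lines)                        # ['first', 'second', 'third', 'fourth']

# Normalize everything to the Unix convention.
normalized = '\n'.join(lines)
print(repr(normalized))

# The separator of the platform this code runs on ('\r\n' on Windows, '\n' elsewhere).
print(repr(os.linesep))
```

When reading files in text mode, Python's open() performs a similar translation by default, so this manual normalization is mostly needed for data that arrives as an already decoded string.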
>>> print('Hello\bWorld')
HellWorld
\b is a nondestructive backspace. It moves the cursor backward but doesn't erase what's there; the output that follows then overwrites the previous character.
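Because the backspace is nondestructive, the letter o is hidden only by the terminal; it is still part of the string. A short sketch to confirm this:

```python
text = 'Hello\bWorld'

# All eleven characters are still there: 5 + backspace + 5.
print(len(text))          # 11

# repr() shows the backspace as its hex escape.
print(repr(text))         # 'Hello\x08World'

print('\b' == '\x08')     # True
print('o' in text[:5])    # True
```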
>>> print('Hello\sWorld')
Hello\sWorld
>>> print('Hello\tWorld')
Hello World
Form feed means advance downward to the next "page". It was commonly used as a page separator, but now is also used as a section separator. (It's uncommonly used in source code to divide logically independent functions or groups of functions.) Text editors can use this character when you "insert a page break". This is commonly escaped as \f, abbreviated FF, and has ASCII value 12 or 0x0C [#stackFF]_.
>>> print('Hello\fWorld')
Hello World
Form feed is a bit more interesting (even though less commonly used directly), and with the usual definition of page separator, it can only come between lines (e.g. after the newline sequence of NL, CRLF, or CR) or at the start or end of the file [#stackFF]_.
Vertical tab was used to speed up vertical movement on printers. Some printers used special tab belts with various tab spots, which helped align content on forms. A typical sequence was: VT to the header space, fill in the header, VT to the body area, fill in the lines, VT to the form footer. Generally it was coded in the program as a character constant. From the keyboard, it would be CTRL-K. It is hardly used anymore. Most forms are generated in a printer control language like PostScript [#stackVT1]_.
>>> print('Hello\vWorld')
Hello
     World
The above output suggests that the default vertical tab size is one line. This could be used to perform a line feed without a carriage return on devices that convert linefeed to carriage return + linefeed [#stackVT1]_.
Microsoft Word uses VT as a line separator in order to distinguish it from the normal new line function, which is used as a paragraph separator [1].
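Python's own string methods also treat vertical tab and form feed as separators, which loosely mirrors the line and page separator roles described above. A small sketch with a made-up sample string:

```python
import string

text = 'header\vbody\ffooter'

# splitlines() counts \v and \f as line boundaries.
print(text.splitlines())      # ['header', 'body', 'footer']

# Splitting on '\n' only does not, so the string stays in one piece.
print(text.split('\n'))       # ['header\x0bbody\x0cfooter']

# Both characters are also whitespace, so a bare split() separates on them.
print('\v' in string.whitespace, '\f' in string.whitespace)   # True True
print(text.split())           # ['header', 'body', 'footer']
```

This matters mostly when counting lines: a file containing form feeds can report more lines through splitlines() than through a split on '\n'.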
7.2.2. Case Study
Windows absolute path problem
Absolute path includes all entries in the directory hierarchy
Absolute path on *nix starts with the root directory (/)
Absolute path on Windows starts with a drive letter
Linux (and other *nix):
>>> file = '/home/myuser/newfile.txt'
macOS:
>>> file = '/Users/myuser/newfile.txt'
Windows:
>>> file = 'c:/Users/myuser/newfile.txt'
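One way to inspect these differences without touching the file system is the pure path classes from the standard library's pathlib module. This is only a sketch; the variable names are illustrative and the paths are the ones used above:

```python
from pathlib import PurePosixPath, PureWindowsPath

linux = PurePosixPath('/home/myuser/newfile.txt')
windows = PureWindowsPath('c:/Users/myuser/newfile.txt')

# A *nix absolute path starts at the root directory.
print(linux.is_absolute(), linux.root)         # True /

# A Windows absolute path starts with a drive letter.
print(windows.is_absolute(), windows.drive)    # True c:

# Without the leading slash the path is relative.
relative = PurePosixPath('myuser/newfile.txt')
print(relative.is_absolute())                  # False
```

Pure paths only parse the text, so both flavors can be examined on any operating system.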
Problem with paths on Windows
Use backslash (\, written as \\ in a regular string) as a path separator
Use r-string for paths
Let's say we have a path to a file:
>>> print('C:/Users/myuser/newfile.txt')
C:/Users/myuser/newfile.txt
Paths on Windows conventionally use a backslash (\) rather than a slash (/) as the path separator. This is where all the problems start. Let's start changing slashes to backslashes from the end (the one before newfile.txt):
>>> print('C:/Users/myuser\newfile.txt')
C:/Users/myuser
ewfile.txt
This is because \n is a newline character. For this to work, we need to escape the backslash.
Now let's convert another slash to a backslash, this time the one before the directory named myuser:
>>> print('C:/Users\myuser\\newfile.txt')
SyntaxWarning: invalid escape sequence '\m'
C:/Users\myuser\newfile.txt
Since Python 3.12, an unrecognized escape sequence (in this case \m) triggers a SyntaxWarning; the backslash should be escaped or the text placed inside a raw string. This is only a warning (SyntaxWarning: invalid escape sequence '\m'), so we can ignore it, but it is planned to become an error in a future Python version, so it is better to avoid it now.
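The warning is issued when the source code is compiled, not when the string is used. The following sketch makes that visible by compiling a small piece of source text under warnings.catch_warnings(). The name source and the file label '<example>' are illustrative, and the warning category depends on the Python version (SyntaxWarning since 3.12, DeprecationWarning in earlier releases):

```python
import warnings

# Source text of a string literal that contains the unrecognized escape \m.
# The raw string keeps the backslash intact until compile() parses it.
source = r"'C:/Users\myuser'"

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter('always')    # record warnings that are hidden by default
    value = eval(compile(source, '<example>', 'eval'))

# The backslash is kept in the result, as in the example above.
print(repr(value))                     # 'C:/Users\\myuser'

# Category names of the recorded warnings (version dependent).
print([w.category.__name__ for w in caught])
```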
The last slash (the one before Users
):
>>> print('C:\Users\\myuser\\newfile.txt')
Traceback (most recent call last):
SyntaxError: (unicode error) 'unicodeescape' codec can't decode bytes in position 2-3: truncated \UXXXXXXXX escape
This time the problem is more serious. The problem is with \Users. After the escape sequence \U, Python expects exactly eight hexadecimal digits of a Unicode code point, e.g. \U0001F600, which is the smiley emoticon 😀. In this example, Python finds the letter s, which is not a valid hexadecimal digit, and therefore raises a SyntaxError telling the user that there is an error with decoding bytes. The only valid hexadecimal digits are 0123456789abcdefABCDEF, and the letter s isn't one of them.
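Unlike the warning for \m, this failure stops compilation entirely. As a rough sketch (again, source and '<example>' are illustrative names), the error can be reproduced in a controlled way by compiling the offending literal from a raw string and catching the exception:

```python
# Source text of a literal in which \U is followed by "sers" instead of hex digits.
source = r"'C:\Users\myuser'"

message = None
try:
    compile(source, '<example>', 'eval')
except SyntaxError as error:
    message = error.msg    # the text shown after "SyntaxError:"

print(message)

# A complete escape with eight hex digits compiles fine.
smiley = eval(r"'\U0001F600'")
print(len(smiley))         # 1
```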
There are two ways to avoid this problem. Escape every backslash:
>>> print('C:\\Users\\myuser\\newfile.txt')
C:\Users\myuser\newfile.txt
Or use a raw string (r-string):
>>> print(r'C:\Users\myuser\newfile.txt')
C:\Users\myuser\newfile.txt
Both will generate the same output, so you can choose either one. In my opinion r-strings are less error-prone, and I use them whenever I have to deal with paths.
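The equivalence of the two spellings can be verified directly. The sketch below also shows a third option that is not covered above and is added here only as an assumption about what may be convenient: pathlib.PureWindowsPath accepts forward slashes and renders them as backslashes.

```python
from pathlib import PureWindowsPath

escaped = 'C:\\Users\\myuser\\newfile.txt'
raw = r'C:\Users\myuser\newfile.txt'

# Both literals describe exactly the same string.
print(escaped == raw)        # True

# PureWindowsPath converts forward slashes to backslashes when rendered.
path = PureWindowsPath('C:/Users/myuser/newfile.txt')
print(str(path) == raw)      # True
print(path.name)             # newfile.txt
```

One caveat with r-strings: a raw string literal cannot end with a single backslash, so a trailing separator still has to be written another way.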