Week 3. Topic "Files and directories"

3 Files and directories

3.1 What is a file

Like any other operating system, Unix stores information in units called files, named for their similarity to ordinary office files. Each file has a name, content, and some administrative information such as the file size, the physical location on the storage medium, various timestamps and the owner. This administrative information is sometimes called meta information, because it is about the content, not the content itself.

The file system in Unix is organized in such a way that you can maintain your own personal files without interfering with files belonging to others. The place where you keep them is called, in Unix jargon, your home directory, or simply your home. Every user has a "home" of their own. (Remember the HOME environment variable from the assignment last week?)

Preparation: To follow the tutorial you need two files. Please copy them to your home directory. (That is where you are after logging in, with the prompt regno@sece-codesys:~$.) The commands for copying are cp /home/ratna/knave knave and cp /home/ratna/queen queen. We will be manipulating those files. Don’t be afraid that you might break something: you can always get a fresh copy from the teacher's home.

The ls (list) command lists the names of files:

ls 
knave  queen

So the files are named knave and queen. Notice that the list is sorted alphabetically.

It is important to note that you cannot "open" a file without knowing what kind of information it holds and how exactly that information is encoded. For example, if you somehow open an HTML file with a photo viewer, you won’t see anything useful. People use filename extensions like txt, c, html as a convenience, but there is no mechanism to guarantee that the content matches the extension! The file (file type) command ignores the name and examines the content itself to tell you the kind of information in a file. Let’s try it:

file knave
knave: ASCII text
file queen
queen: ASCII text

So although they have no filename extensions, both files are plain text files.
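The point about extensions can be checked directly. The following is a minimal sketch, assuming the file command is installed; the scratch directory and the file name picture.jpg are made up for the illustration:

```shell
# Work in a throwaway directory so nothing in your home is touched.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# A plain text file with a misleading extension (hypothetical name).
printf 'The Knave of Hearts,\n' > picture.jpg

# file looks at the content, not at the name, so it should report text.
file picture.jpg
```

The extension .jpg plays no part in the answer, because file never looks at the name.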

To get the time of last modification, run ls with the option -l (long). If the time resolution is not good enough, use the --full-time option:

ls -l knave
-rw-r--r-- 1 user group 91 Dec 30 22:15 knave
ls --full-time knave
-rw-r--r-- 1 user group 91 2014-12-30 22:15:47.787031906 +0100 knave

You can update the time stamp of a file by “touching” it:

ls -l knave
-rw-r--r-- 1 user group  91 Jan 30 22:15 knave
touch knave
ls -l knave
-rw-r--r-- 1 user group  91 Feb 11 22:15 knave

Hint: The touch command can be abused to create empty files, because touch creates an empty file if the given file does not exist.

touch king
ls -l king
-rw-r--r-- 1 user group  0 Feb 11 22:30 king
||         | |    |      | |   |  |     |
12         3 4    5      6 7   8  9     10
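As a small sketch of both uses of touch, the commands below create an empty file and then set its timestamp explicitly with the -t option, which the text above does not cover. The scratch directory and the name emptyfile are assumptions for the illustration:

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# touch creates the file because it does not exist yet.
touch emptyfile
ls -l emptyfile

# -t takes a timestamp in the form [[CC]YY]MMDDhhmm and sets the
# modification time to it instead of to "now".
touch -t 201412302215 emptyfile
ls -l emptyfile

# The file is still empty: [ -s file ] is true only for size > 0.
if [ -s emptyfile ]; then
    size_state="not empty"
else
    size_state="empty"
fi
echo "emptyfile is $size_state"
```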

In fact, ls -l shows a whole lot of meta information. In this chapter we are going to discuss only some of it. For the sake of completeness, the full list is given in the following table; the column numbers refer to the numbering under the ls -l output above:

Column nr. Description
1 File type
2 File access permissions
3 Number of links
4 File owner
5 Group owner
6 File size (in bytes)
7, 8 and 9 Month, day and time of last modification to the file
10 Name of file

The final piece of meta information we are going to look at is where a file is kept in the file system. A Unix file system keeps one index node, inode for short, for every file. The inode records the meta information of the file and the location of the blocks that hold its content, and it is identified by a number that is unique within the file system. The inode number is all the file system needs to find a file. The command ls -i lists the inode numbers of files:

ls -i
3156545 knave  3156550 queen
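To see that the inode number, not the name, identifies a file, one can give the same file a second name with ln (a command not covered in this chapter) and compare the numbers. This is only a sketch; the file names original and alias are made up:

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf 'The Knave of Hearts,\n' > original

# ln adds a second name (a hard link) for the same inode.
ln original alias

# ls -i prints "inode name"; keep only the first field of each line.
set -- $(ls -i original); inode_original=$1
set -- $(ls -i alias);    inode_alias=$1
echo "original: $inode_original  alias: $inode_alias"

# The number of links shown by ls -l is now 2 for both names.
# It is the second whitespace-separated field of the line.
ls -l original alias
set -- $(ls -l original); link_count=$2
echo "link count: $link_count"
```

Both names report the same inode number, and the link count in the ls -l output rises from 1 to 2.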

3.2 What’s in a file name

So far we have used filenames without saying what a legal name is. Firstly, in the initial design of Unix, filenames were limited to 14 characters. On today’s Unix systems a filename can typically be up to 255 characters long, which is ample room for descriptive names, so make use of it. Secondly, although you can use almost any character in a filename, common sense says you should avoid non-printable (invisible) characters and characters that have other meanings. We have already seen that the hyphen and the double hyphen are used by Unix commands to introduce their options. So if you had a file whose name was -t, you would have a tough time listing it with ls.

Besides the hyphen as a first character, there are other characters with special meaning. To avoid pitfalls, you would do well to use only the Latin letters a-z and A-Z, the digits 0-9, the period ’.’, the underscore ’_’ and the hyphen ’-’, except that the hyphen should not be used as the first character. The period, the underscore and the hyphen are conveniently used to divide filenames into chunks, as in assignment1_praja_sunil.txt or draft-2015-02-13.odt. Finally, don’t forget that case matters: Assignment1_Praja_Sunil.txt is not the same as assignment1_praja_sunil.txt! Command line users tend to prefer lowercase because it is easier to type.
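If you do end up with a file called -t, there are two common ways out: prefix the name with ./ or end the option list with a double hyphen. The sketch below assumes a scratch directory so that it can create and remove such a file safely:

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Create the awkward file. The ./ prefix keeps touch from
# reading -t as an option.
touch ./-t

# "ls -t" alone would be taken as the sort-by-time option.
# Either prefix the name with ./ or end the options with --.
ls -l ./-t
listed=$(ls -- -t)
echo "$listed"

# The same trick removes the file again.
rm -- -t
```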

3.3 Copying, moving and deleting files

The cp (copy) command duplicates files. Its syntax is cp source target. Example:

cp knave knave2 
ls
knave  knave2  queen

Keep in mind that if the target exists, it will simply be overwritten. If you want to be asked before that happens, use the -i (interactive) option.
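The difference between the two behaviours can be tried out safely in a scratch directory. In this sketch the file names are made up, and the answer to the prompt is fed in through a pipe instead of being typed:

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf 'first\n'  > source_file
printf 'second\n' > target_file

# With -i, cp asks before overwriting. Answering "n" on standard
# input declines, so target_file keeps its content.
echo n | cp -i source_file target_file
after_decline=$(cat target_file)
echo "after declining:  $after_decline"

# Without -i the target is overwritten silently.
cp source_file target_file
after_overwrite=$(cat target_file)
echo "after plain cp:   $after_overwrite"
```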

To move or rename files, use the mv (move) command:

mv knave knave2 
ls
knave2  queen

The rm (remove) command deletes files:

ls
knave  knave2  queen
rm knave2
ls
knave  queen

Note that the file is removed silently, without any confirmation. If you want to be asked first, use the -i option, which works for rm in the same way as for cp:

ls
knave  knave2  queen
cp -i knave knave2
cp: overwrite ‘knave2’? y
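For rm itself, the following sketch shows the effect of answering the prompt with n and with y. The file names are hypothetical and the answers come from a pipe rather than from the keyboard:

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1
touch keep_me delete_me

# rm -i asks for each file; the answer is read from standard input.
echo n | rm -i keep_me      # declined: the file stays
echo y | rm -i delete_me    # confirmed: the file is removed

ls
```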

3.4 Content of a file

To look at the content of a file byte by byte, you take a "dump". The original Unix program for this is od (octal dump). It was common in those days to work in the octal (base 8) system!

od knave
0000000 062163 005146 066040 066154 000012
0000011
od -h knave
0000000 6473 0a66 6c20 6c6c 000a
0000011
od -c knave
0000000   s   d   f  \n       l   l   l  \n
0000011

The example above demonstrates that the default behaviour of od can be changed by using options: -h (hexadecimal), -c (ASCII character).
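The word-wise output can be confusing: od -h shows 6473 although the content starts with s (hex 73) followed by d (hex 64). The reason is that od groups the bytes into two-byte words and prints each word in the byte order of the machine, which on the little-endian machines common today puts the second byte first. As a sketch, the standard options -A n (no offset column) and -t x1 (one byte per unit, in hexadecimal) show the bytes in file order. The sample content is the nine bytes from the dump above:

```shell
# Recreate the nine bytes from the dump above and show them one by one.
printf 'sdf\n lll\n' | od -A n -t x1

# The same bytes collected into one string, with the padding removed.
hex_bytes=$(printf 'sdf\n lll\n' | od -A n -t x1 | tr -d ' \n')
echo "$hex_bytes"
```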

There is another command which is often abused to display the content of text files. It is the cat (concatenate, to join) command, which was originally meant to join two or more files into one. But you can just make it print the content of files by giving their names as arguments:

cat queen 
The Queen of Hearts,
she made some tarts,

Exercise. The poem is split between the files queen and knave. How do you print the full poem with a single command?

3.2 Handling directories

3.2.1 Absolute paths

When you are logged in to a shell, you are always “in” some directory, called the current directory or working directory. The pwd (print working directory) command tells you where you are. Immediately after logging in, you begin the session in your home directory:

pwd 
/home/e00000
echo $HOME
/home/e00000

The cd (change directory) command changes the current directory to the directory specified:

pwd 
/home/e00000
cd /usr/bin
pwd
/usr/bin

Paths to directories in the examples above always started with a / (slash), which denotes the "root", the beginning of a Unix file system. Such paths are therefore called absolute paths.

Now, to come back to your home, you can of course enter cd /home/e00000. But that is a lot of typing. Unix offers many shortcuts. For example, cd without any arguments takes you to your home.

Another shortcut is the hyphen: cd - takes you to the previous directory you were in. Try them out!
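The cd - shortcut works because the shell keeps the previous directory in the variable OLDPWD. A short sketch, assuming only that /usr exists on your system:

```shell
# Start in one directory, then move to another.
cd /usr || exit 1
cd /    || exit 1
pwd

# The shell remembers the previous directory in OLDPWD.
echo "previous directory: $OLDPWD"

# cd - goes back there and prints the new working directory.
cd -
pwd
```

Running cd - a second time would take you back to / again, so it is handy for switching between two directories.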

3.2.2 Relative paths

The cd command also accepts relative paths. For example, let’s assume that you are in /usr. To change to /usr/bin it is then enough to type cd bin, because the directory bin is directly below where you “stand”.

cd /usr 
pwd
/usr
cd bin
pwd
/usr/bin

Every directory contains a special entry named ’..’ (two dots), which points to the directory above it, the so-called parent directory. (The root has no parent, so there ’..’ points to the root itself.) For example, to go back to /usr from /usr/bin, you could do cd .. (dot dot).

pwd 
/usr/bin
cd ..
pwd
/usr
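Relative paths and ’..’ can be combined freely. The sketch below builds a small hypothetical directory tree (outer/inner) in a scratch location and walks through it:

```shell
workdir=$(mktemp -d)
mkdir "$workdir/outer" "$workdir/outer/inner"

cd "$workdir/outer/inner" || exit 1
cd ..                      # one level up: now in outer
parent=$(pwd)
cd ../outer/inner          # up to the scratch directory, then down again
back_again=$(pwd)
echo "$parent"
echo "$back_again"

# In the root directory, cd .. simply stays in the root.
cd / && cd ..
root_parent=$(pwd)
echo "$root_parent"
```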

Another shortcut to remember is the ’~’ (tilde), which denotes your home directory. More interestingly, ~loginname stands for the home directory of the user loginname. Try changing to the home directories of others!

Note for the experts: Don’t close your home directories for others yet. That’ll take the fun out of the game!

3.2.3 Creating, moving and deleting directories

The mkdir (make directory) command is used to create directories:

mkdir progs 
ls
knave  progs  queen
ls -F
knave  progs/  queen

The previous example shows that ls alone does not mark directories differently from files. The option -F appends a slash to directory names. (OK, modern terminals usually show directories in colour. But remember, the shell should also work on monochrome monitors!)
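mkdir can also create a whole chain of directories in one go with the -p (parents) option, which is not used in the example above. A short sketch with made-up directory names:

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Without -p, mkdir would fail here because progs/c does not exist yet.
mkdir -p progs/c/week3

# -F marks directories with a trailing slash, -R descends into them.
top_level=$(ls -F)
echo "$top_level"
ls -RF progs
```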

You can move directories, just like files:

mkdir progs
ls -F
knave  progs/  queen
mv progs /tmp
ls -F
knave  queen
ls -F /tmp
progs/

The rmdir (remove directory) command removes the directory specified as an argument to it:

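ls -F
knave  progs/  queen
rmdir progs
ls
knave  queen

rmdir refuses to remove a directory that is not empty. In that case rm -rf removes the directory together with everything in it, without asking for confirmation:

rmdir progs
rmdir: failed to remove ‘progs’: Directory not empty
rm -rf progs
ls
knave  queen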
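The two cases can be compared side by side in a scratch directory. This is a sketch with hypothetical directory names; it uses rm -r, which removes a directory recursively:

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1
mkdir empty_dir full_dir
touch full_dir/notes

# rmdir only removes empty directories.
rmdir empty_dir

# On a non-empty directory it fails and leaves everything in place.
if rmdir full_dir 2>/dev/null; then
    rmdir_result="removed"
else
    rmdir_result="refused"
fi
echo "rmdir full_dir: $rmdir_result"

# rm -r removes the directory together with its content.
# There is no undo, so double-check the name before pressing Enter.
rm -r full_dir
ls
```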

3.5 Typical Linux file structure

On a standard Linux system you will find the layout generally follows the scheme presented below.

Directory Content
/bin Common programs, shared by the system, the system administrator and the users.
/boot The startup files and the kernel, vmlinuz. In some recent distributions also grub data. Grub is the GRand Unified Boot loader and is an attempt to get rid of the many different boot-loaders we know today.
/dev Contains references to all the CPU peripheral hardware, which are represented as files with special properties.
/etc Most important system configuration files are in /etc. This directory contains data similar to those in the Control Panel in Windows.
/home Home directories of the common users.
/initrd (on some distributions) Information for booting. Do not remove!
/lib Library files, includes files for all kinds of programs needed by the system and the users.
/lost+found Every partition has a lost+found in its upper directory. Files that were saved during failures are here.
/misc For miscellaneous purposes.
/mnt Standard mount point for external file systems, e.g. a CD-ROM or a digital camera.
/net Standard mount point for entire remote file systems
/opt Typically contains extra and third party software.
/proc A virtual file system containing information about system resources. More information about the meaning of the files in proc is obtained by entering the command man proc in a terminal window. The file proc.txt discusses the virtual file system in detail.
/root The administrative user's home directory. Mind the difference between /, the root directory and /root, the home directory of the root user.
/sbin Programs for use by the system and the system administrator.
/tmp Temporary space for use by the system, cleaned upon reboot, so don't use this for saving any work!
/usr Programs, libraries, documentation etc. for all user-related programs.
/var Storage for all variable files and temporary files created by users, such as log files, the mail queue, the print spooler area, space for temporary storage of files downloaded from the Internet, or to keep an image of a CD before burning it.


Please feel free to move around the directory tree of the practice computer, running the tree command for better orientation.
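If the tree command is not installed, a small loop gives a first orientation. The sketch below checks a handful of directories picked from the table; which of them exist depends on the distribution:

```shell
# Check a few of the directories from the table above.
found=0
for dir in /bin /etc /home /tmp /usr /var; do
    if [ -d "$dir" ]; then
        echo "$dir exists"
        found=$((found + 1))
    else
        echo "$dir is missing on this system"
    fi
done
echo "$found of 6 directories found"

# List the top level of the tree with directories marked by a slash.
ls -F /
```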

3.b Getting help from the system itself

In this sub-chapter you’ll be introduced to:

  • the manual pages and the GNU Info, the two primary documentation systems which should be built into any Unix system
  • the --help option provided by almost all the command line tools and the help command of the shell
  • command completion and command history - two features which greatly simplify your typing.

3.b.1 The manual pages

The original UNIX Programmer’s Manual documents the system in numbered sections. Section 1 deals with the commands for the end user, which are the commands we discuss in this course. Incidentally, games too were a part of the system: section 6 contains information on games. The remaining sections cover details aimed at the programmer and the system administrator. This documentation has always been a part of the system.

The man (manual page) program searches, formats, and displays the information contained in those manual pages. Because many topics have a lot of information, output is piped through a terminal pager program for convenient viewing one page at a time; at the same time the information is formatted for a good visual display. You can fetch the manual page of a program by entering the name of the program after man. For example:

man uname
UNAME(1)                         User Commands                        UNAME(1)
NAME
  uname - print system information
SYNOPSIS
  uname [OPTION]...
DESCRIPTION
  Print certain system information.  With no OPTION, same as -s.
  -a, --all
    print all information, in the following order, except omit -p
    and -i if unknown:

  -s, --kernel-name
    print the kernel name
[...]

The output says that the page is from section 1 ’User Commands’ and explains its usage.

Note that the man pages have a well-defined structure, part of which is listed in the next table. Thanks to this structure, the man pages can be automatically converted to HTML, to book form or to graphical help programs. You will encounter those forms if you continue working in the shell after the course, but during this course we expect you to refer to the local man pages with the help of man.

Item Description
SYNOPSIS Command usage along with the optional and non-optional arguments
DESCRIPTION Details on how to use the command and an explanation of each option
EXAMPLE Examples of how to use the command
FILES Files that have to be available for this command to work
SEE ALSO Commands that are similar in purpose
DIAGNOSTICS Explanation of error messages
WARNINGS Things to be careful about when using the command
BUGS Known problems and suggested improvements


For reading long manual pages on screen, man delivers them through a pager, which depends on the Unix flavour. The pager used in Linux is called less. You navigate in less with single keystrokes: f, Ctrl+F or the space bar to scroll forward one page, b or Ctrl+B to scroll back, and q to quit. The key h (help) gives you the full list of commands.

Another useful key is / (forward slash) for searching. Often all you need to know is what a specific option of a command does, for example what uname -s really means. In such cases just enter man uname, then type /-s and press Enter. If the first hit is not the one you were looking for, keep pressing n (next).

Exercise. Call the manual pages of the commands you already know and note down options you find interesting. Don’t forget, there is also a man man!

By default the man command shows only one page, the one dedicated to the topic you name. With the -f option it instead lists every manual page with that name, each with its section number and a one-line description. The following dialogue shows that there is only one page named date, whereas there are two pages named time.

man -f date
date (1)     - print or set the system date and time
man -f time
time (7)     - overview of time and timers
time (2)     - get time in seconds

You select the section of the manual by giving its number as an argument before the name:

man 2 time
TIME(2)               Linux Programmer’s Manual              TIME(2)
NAME
  time - get time in seconds
[...]
man 7 time
TIME(7)               Linux Programmer’s Manual              TIME(7)
NAME
  time - overview of time and timers
[...]

The whatis command has the same effect as man -f.

whatis time
time (7)     - overview of time and timers
time (2)     - get time in seconds

A common problem with the man command is that you need to know the name of the command you want to know more about. The -k option broadens the search to all manual pages whose name or short description contains the search string:

man -f user
user: nothing appropriate.
man -k user
adduser.conf (5) - configuration file for adduser(8) and addgroup(8)
deluser.conf (5) - configuration file for deluser(8) and delgroup(8)
[...]

3.b.2 apropos

The apropos command is equivalent to man -k:

apropos user 
adduser.conf (5) - configuration file for adduser(8) and addgroup(8)
deluser.conf (5) - configuration file for deluser(8) and delgroup(8)
[...]

3.b.3 Exceptions

Some commands don't have separate documentation, because they are part of another command. cd, exit, logout and pwd are such exceptions. They are part of your shell program and are called shell built-in commands. For information about these, refer to the man or info pages of your shell, which is Bash in our case.
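To find out whether a command is a shell built-in or a separate program, you can ask the shell itself. The following sketch uses type and command -v, two commands that are not discussed above:

```shell
# type reports how the shell would interpret a command name.
type cd
type ls

# command -v prints just the name for a built-in
# and the full path for an external program.
cd_kind=$(command -v cd)
ls_path=$(command -v ls)
echo "cd -> $cd_kind"
echo "ls -> $ls_path"
```

In Bash, the help command then prints the documentation of a built-in, for example help cd.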

References: (optional)

- https://tldp.org/LDP/intro-linux/html/sect_02_03.html

-  Navigating the file system

Last modified: Thursday, 16 December 2021, 9:59 PM