Learning the Unix Programming Environment

From Pontão Nós Digitais


This is a series of practical tutorials on Unix (mainly GNU/Linux and OS X) in the context of modern programming practice. It is an up-to-date learning supplement to the great UPE book, "The Unix Programming Environment" by Kernighan and Pike. In addition to summarizing this outstanding classic and adapting it for current environments, LUPE also gathers a number of valuable practical tips and pointers from experienced programmers.

This Page is Under Development


First: read UPE

Even though LUPE is reasonably self-contained and covers what is most immediately useful to you, reading the original masterpiece, UPE, is highly recommended, especially the exercises. Reading UPE is the best way of learning Unix at the level most useful to programmers. It is the single thing that will really make you understand the tools, the idiosyncratic syntax, and why Unix's highly programmable interface works the way it does. Really understanding the practical tools and the power-user side of the OS is what makes a highly efficient programmer and user. With the LUPE supplements you'll carry that knowledge into the broader context of today's programming practice and advanced usage.


Background

Basics

Open a terminal (by launching the Terminal app). Try some commands and constructs of the shell (sh) language:

List files in the current directory/folder:

ls

A common shortcut for ls is d, though it is not always available. Try it anyway:

d

In the aliases section we'll set this useful shortcut up in case you don't have it by default.

Also try these

 ls -a                                 # "all" - shows entries starting with '.', usually hidden
 ls -l
 ls -la

A useful shortcut for the last one is

ll

Don't worry if you don't have ll - we'll set it up later.

Tell the computer to print "Hello, World":

echo hello, world

Then tell it to print this 10 times using a for loop in the bash language (just type it for now - we'll help you understand the syntax idiosyncrasies later):

 for i in `seq 1 10`; do echo hello, world; done

Each Unix command is an executable program, which is usually written in C or can be internally modeled with C's abstraction. As in C, each command has a return value and default/standard input and output handles (stdout/stdin).
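
The return value mentioned above is visible in the shell as the special parameter $?. By convention 0 means success and anything else means failure, and constructs such as if branch on it. A minimal sketch, using only standard commands:

```shell
# By convention 0 means success and anything else means failure.
true
echo "true returned $?"          # $? is the return value of the last command

# if, && and || all branch on that value.
if grep -q hello /dev/null; then
    found=yes
else
    found=no                     # grep returns non-zero when nothing matches
fi
echo "found=$found"
```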

The standard input and output can be redirected. Create a text file containing hello, world using the redirect operator

echo hello, world > a.txt

Print out the contents of the file

cat a.txt

You can use the redirect operator on any command construct that has an output, even on the for construct above:

 for i in `seq 1 10`; do echo hello, world; done > a.txt

Count the number of lines in the file

wc -l a.txt

The UPE book masterfully explains why you should prefer that over (try it)

 cat a.txt | wc -l

You'll understand what the pipe `|' means later, but it basically connects the default output of a program to the standard input of another.
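
One visible difference between the two forms: given a file name, wc prints that name next to the count, but when it reads its standard input there is no name to print. The sketch below uses a scratch file rather than a.txt, and also shows the < operator, which redirects standard input without starting an extra cat process:

```shell
tmp=$(mktemp)                                   # scratch file standing in for a.txt
for i in 1 2 3; do echo hello, world; done > "$tmp"

wc -l "$tmp"                                    # prints the count and the file name
from_pipe=$(cat "$tmp" | wc -l)                 # count only: wc read its standard input
from_redirect=$(wc -l < "$tmp")                 # same result, without the extra cat
echo "$from_pipe $from_redirect"
rm -f "$tmp"
```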

Command History

history

Gives a numbered list of your previous commands. You can ask for previous command number 17 using

!17

This saves pressing the Up arrow repeatedly. Another way is to search the history by keyword as you type. Let's say you want to find the for statement above. You can do this:

  1. type Control-R
  2. type any substring of the desired command, say for
  3. a command matching that substring is displayed
  4. press Enter to execute it, or keep pressing Control-R to search for other options

You can also refer to the previous command's 2nd argument (its third word, since the command name counts as word 0) in your current command:

 wc -l a.txt
 cat !!:2

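Your history is stored in a file in your home directory, which can be referred to by '~'. In bash the file is ~/.bash_history by default; other shells use other names, such as ~/.history:

cat ~/.bash_history

On many setups this will contain the history only up to the previous session. The current session's history is usually not written to the file until the session is over.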

Further basics

Pressing Control-C usually aborts any command that is running interactively through the shell. Other commands that are not blocking the shell can be ended using kill and killall:

killall firefox

Or kill, which requires the process ID of the running app:

 ps -A|grep firefox
    553
 
 kill 553

If it doesn't go away, use -9 or -KILL

 killall -9 firefox
 kill -KILL 553

A reference to any single command is provided by man:

man bc

The standard sections of the manual include:

 1      User Commands
 2      System Calls
 3      C Library Functions
 4      Devices and Special Files
 5      File Formats and Conventions
 6      Games et al.
 7      Miscellanea
 8      System Administration Tools and Daemons

You can ask man to look up an entry in a specific section of the manual. If you want the C function printf rather than the shell command of the same name, use

man 3 printf

To search all the manuals for a given keyword, e.g. exec:

apropos exec


Scripting

The shell is a full-fledged programming language which derives its power from being able to very easily control and combine commands to work together inside the OS environment. You can combine the above commands into a shell script using any text editor. Or you can just type

 echo 'ls | wc -l
       echo thats the number of files in the current folder.' > script.sh

Check what's inside the script.sh file

 cat script.sh

Run the script

 sh script.sh

Usually you won't edit using echo, but with a more convenient editor. There are even variables you can use. Let's say we put the following into script.sh

 #!/bin/sh
 num_files=`ls | wc -l`
 echo $num_files is the number of files in the current folder.

You can also make it executable and even remove the extension so it becomes more like a command itself.

 mv script.sh script
 chmod a+x script
 ./script
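
Scripts can also take arguments, just like any other command: inside the script, $1 is the first argument. A small sketch follows; the script name countfiles and the scratch directory are made up for this example, and the here-document is just a convenient way to write the file without opening an editor:

```shell
work=$(mktemp -d)                       # scratch directory for the demo
cd "$work" || exit 1
touch a.txt b.txt c.txt

# The quoted 'EOF' stops the shell from expanding $1 and friends
# while the script file is being written.
cat > countfiles <<'EOF'
#!/bin/sh
# usage: countfiles [directory]
dir=${1:-.}                             # default to the current directory
num_files=$(ls "$dir" | wc -l | tr -d ' ')
echo "$num_files entries in $dir"
EOF

chmod a+x countfiles
result=$(./countfiles .)                # counts a.txt b.txt c.txt and countfiles itself
echo "$result"
```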

More useful and fun stuff to try and understand later

Warning: The following commands may require the appropriate software packages to be installed to your system.

# Convert images
 convert image.png image.jpg
 
# Convert all images to a PDF
 convert *.jpg document.pdf
 
# Play videos
 mplayer movie.mkv
 
# Extract pages from a pdf
 pdftk old.pdf cat 1-9 26-end output new.pdf
 
# Extract the first 5 frames of a video as PNG images
 mplayer video.mp4 -vo png -frames 5
 
# Make a GIF animation from images
 convert  -adjoin  -loop 0  -delay 5  *.gif  animation.gif

More useful commands are available at Utils-Macambira.


Setting up the modern unix programming environment

Useful script collections

Our utilities

Our personal scripts in current use, and the actual config files we use for editing, programming, and daily work, are available at Utils-Macambira.


Funcoeszz

A nifty collection of little shell utilities with a Portuguese slant to them. They can be found at funcoeszz.net. I personally use:

  • zzarrumanome - convert file names to lowercase and spaces to underscore, useful for shell scripting
zzarrumanome *
     RAMONES - I Don't Care.mp3 -> ramones-i_dont_care.mp3
     Toy Dolls - Wakey, Wakey!.mp3 -> toy_dolls-wakey_wakey.mp3
  • zztradutor
 zztradutor pt-de livro                # Buch


Configuring Bash for programming

The following are the most useful bash configuration files

 ~/.bashrc
 ~/.bash_profile
 ~/.bash_aliases

The .bash_profile shell script is meant to be executed once per work session: upon your login to the system or upon starting a login shell, as in bash -ls. The .bashrc script is executed by each new interactive non-login shell, as when typing bash at the prompt. The .bash_aliases shell script is executed by a suitably configured .bashrc and by convention contains only alias/shortcut definitions. For instance,

alias fox="firefox"              # you can now just type fox to start firefox
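
The d and ll shortcuts mentioned in the Basics section can be defined the same way. The sketch below writes them to a scratch file standing in for ~/.bash_aliases and then loads it with the kind of guard commonly found in a .bashrc; the exact definitions are a suggestion, not a standard:

```shell
aliases_file=$(mktemp)                  # stand-in for ~/.bash_aliases
cat > "$aliases_file" <<'EOF'
alias d='ls'
alias ll='ls -la'
EOF

# In a real ~/.bashrc this guard would test ~/.bash_aliases instead.
if [ -f "$aliases_file" ]; then
    . "$aliases_file"
fi

alias ll                                # show the definition that was loaded
```

Aliases are expanded in interactive shells, so after loading the file at a prompt you can simply type d or ll.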

What should you put in ~/.bashrc and ~/.bash_profile? You should take a look at the existing ones in your system, and copy from more experienced friends. The ones I actually use on a daily basis are available at Utils.

~/bin

Create your own ~/bin folder to install the personal scripts that you'll be using often:

mkdir ~/bin

Move the desired script, say myscript, there:

mv myscript ~/bin

Now make it executable

chmod a+x ~/bin/myscript

Set the path in your ~/.bash_profile

 export PATH=$HOME/bin:$PATH

Reload the environment for changes to take effect

 bash -ls          # start as login shell, reloading .bash_profile
                     # OR
 . ~/.bash_profile

You can now run your script without the ./:

myscript

You can use the cx shortcut given in UPE to make this simpler.

cx ~/bin/myscript

You can get our actual working versions of cx and other utilities suggested by UPE at Utils (do peek at the source code!).
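
How the PATH lookup works can be tried without touching your real ~/bin. In this sketch a scratch directory plays the role of ~/bin, and hello_lupe is a made-up script name:

```shell
mybin=$(mktemp -d)                      # stand-in for ~/bin
printf '#!/bin/sh\necho "hello from my own command"\n' > "$mybin/hello_lupe"
chmod a+x "$mybin/hello_lupe"

PATH=$mybin:$PATH                       # prepend the directory to the search path
export PATH

command -v hello_lupe                   # prints where the shell found the command
output=$(hello_lupe)                    # no ./ and no full path needed
echo "$output"
```

Because the directory is prepended, a script placed there takes precedence over a system command of the same name.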

Understanding the Bourne shell

Think about the internal C implementation model for commands. A command's main() function sees its arguments through argv: the entire argument line broken into a number of strings. It is the shell that breaks the entire command string as in

 "echo hello, world"      --- sh --->      "echo" "hello," "world"

into distinct components. Understanding this process is key to understanding how the shell works, especially how to use the different quotes `'" and the escape character '\' .

The program echo can only see two distinct strings, "hello," and "world", in its argv for the above example. For it to see "hello, world" as a unit, quotes can be employed:

echo "hello, world"

The behavior looks identical to that of the original unquoted command, but you can be sure that the actual echo program is now seeing hello, world as a single entity. For instance, to print a file that has spaces in the filename, you can do

 cat "unix programming.txt"
                                  #OR
 cat 'unix programming.txt'

You can also escape the space, since the shell breaks the argument at blanks.

cat unix\ programming.txt

What's the difference?

  • ': the strongest quote; prevents the shell from parsing, expanding, modifying, or splitting what's inside
  • ": allows the shell to expand expressions inside before generating the single, final string
  • \: escapes one character at a time; not as convenient, and not guaranteed to generate a single final string, e.g. if a shell expansion produces a space

These shell expansions are most used with variables. Let's test this behavior:

 a="power nix"                        # single quotes ' could be used
 echo "hello, $a world"
 echo 'hello, $a world'
 echo hello,\ $a\ world               # test: what is argc here?

This illustrates $a which expands the variable a - you can think of $ as an S for "substitute in here the contents of what follows".
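
To check what a command really receives, a tiny helper that reports its argument count is enough. count_args is a made-up function for this sketch; it prints the number of arguments, which is argc minus one because argv also holds the command name:

```shell
count_args() { echo "$#"; }             # hypothetical helper: prints how many arguments it got

a="power nix"
n_double=$(count_args "hello, $a world")    # expanded, kept as one string
n_single=$(count_args 'hello, $a world')    # not expanded, one string
n_escape=$(count_args hello,\ $a\ world)    # the blank inside $a still splits
n_bare=$(count_args hello, $a world)        # every unquoted blank splits
echo "$n_double $n_single $n_escape $n_bare"
```

The third case is the one that tends to surprise: the backslashes protect only the blanks you typed, not the blank that appears when $a is expanded.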

And what is that weird backward ` quote for? It executes the string that is inside as a shell command and substitutes the output as a string. You can think of it as "the output of", and it is extremely useful. Try it out:

  seq 1 10
  numbers=`seq 1 10`
  echo $numbers         # numbers == "1 2 3 4 5 6 7 8 9 10"
  numbers="seq 1 10"    # numbers == "seq 1 10"   literally
  echo $numbers
  numbers=$(seq 1 10)   # you can use $( ) instead of ` `
  echo $numbers
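
Strictly speaking, the variable holds seq's output with its newlines intact; it is the unquoted $numbers that lets the shell split it into words, which echo then joins with blanks. A short sketch of the difference, and of how $( ) nests more readably than backquotes:

```shell
numbers=$(seq 1 10)

one_line=$(echo $numbers)               # unquoted: split into 10 words, joined by blanks
line_count=$(echo "$numbers" | wc -l)   # quoted: the 10 original lines survive

echo "$one_line"
echo "lines when quoted: $line_count"

# $( ) can be nested without extra escaping
parent=$(basename "$(dirname /usr/local/bin)")
echo "$parent"
```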

Let us take a look at one of the commands in the intro section to print hello world, say, 3 times, written more nicely:

 for i in `seq 1 3`; do 
    echo hello, world
 done

The for loop simply loops the variable i over each token obtained by the shell when parsing the string after in. For instance,

 for i in a b c; do              # or for i in 1 2 3   etc
    echo hello, world
 done

would print hello, world 3 times. In the seq version we're plugging in an independent command, seq, to generate the list of words for the for loop. By itself,

seq 1 3

simply prints 1 to 3. We can use seq's output as a string by enclosing it in back quotes:

 mystring=`seq 1 4`
                                 #OR
 mystring=$(seq 1 4) 
 echo $mystring

We know that the shell will break this string at blanks when its content is used unquoted as part of a command line, so we can simply give it to the for loop.
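
The loops above never use i itself. A sketch that does, summing the numbers produced by seq with the shell's $(( )) arithmetic:

```shell
total=0
for i in $(seq 1 4); do
    total=$((total + i))                # $(( )) evaluates integer arithmetic
    echo "i=$i running total=$total"
done
echo "$total"
```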

Practicing Quotes

Suppose you have a collection of Beethoven's music in a folder, and you want to play a random piano sonata album but no concerto:

 cd 'music/classical/Ludwig van Beethoven - Complete Works [Brilliant Classics 100 CD Box]'
 ls
  CD 001 - Symphonies Nos.1&3                                   CD 053 - Piano Sonatas Op.109, Op.110, Op.111
  CD 002 - Symphonies Nos.2&7                                   CD 054 - Piano Variations I
  ...
  CD 031 - Violin Sonatas II                                    CD 083 - Scottish Songs WoO 156 & 157, complete
  CD 032 - Violin Sonatas III                                   CD 084 - Scottish Songs Op. 108 & WoO 158-1
  CD 033 - String Trios I                                       CD 085 - Folksongs WoO 158 a b c
  CD 034 - String Trios II                                      CD 086 - Symphony No. 9 in D minor Op.125 (Furtwagler 1951 EMI)
  CD 035 - String Quartets Op.18 Nos.1&2                        CD 087 - Symphony No. 3, Leonore Overtures - Otto Klemperer
  CD 036 - String Quartets Op.18 Nos.3&4                        CD 088 - Symphonies Nos.5  & 7 - Herbert von KARAJAN
  CD 037 - String Quartets Op.18 Nos.5&6; Op.95 Serioso         CD 089 - Piano Concerto No.5 - Piano Sonatas Nos. 8&23 (Edwin Fischer, Furtwangler)
  CD 038 - String Quartets Op.59 Nos.1&2                        CD 090 - Violin Concerto Op.61, Romances for Violin & Orchestra Nos.1&2
  CD 039 - String Quartets Op.74 & Op.131                       CD 091 - Piano Sonatas Op.13, 27, 57 - Yves NAT
  ...
 
 # play a random sonata album
 
 mplayer "`ls | grep -i Sonatas | grep -i Piano | grep -v Concerto | shuf -n 1`"/*  &      # vlc is a good alternative to mplayer

Try breaking down the above command to see what it's made of. Here, shuf (or gshuf on OS X) just randomly shuffles the lines of its input and outputs the 1 requested line. Note how we had to use both " and ` quotes to get this to work with filenames having spaces in them.
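
One way to break the command down is to build it up stage by stage on a scratch copy of a few album names. The directory below is created just for this sketch, and head -n 1 replaces shuf -n 1 so that the result is the same on every run:

```shell
demo=$(mktemp -d)                       # scratch folder with a few album names
mkdir "$demo/CD 053 - Piano Sonatas Op.109, Op.110, Op.111" \
      "$demo/CD 089 - Piano Concerto No.5 - Piano Sonatas Nos. 8&23" \
      "$demo/CD 091 - Piano Sonatas Op.13, 27, 57 - Yves NAT" \
      "$demo/CD 031 - Violin Sonatas II"

# Stage by stage: keep sonatas, keep piano, drop anything with a concerto
candidates=$(ls "$demo" | grep -i sonatas | grep -i piano | grep -v Concerto)
printf '%s\n' "$candidates"
count=$(printf '%s\n' "$candidates" | wc -l)

# Pick one line; the double quotes keep the name with its blanks in one piece
first=$(printf '%s\n' "$candidates" | head -n 1)
ls -d "$demo/$first"
```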


Processes and Jobs

Running a program interactively normally blocks further input

 command
        # input is blocked

Type Control-C to stop the process in case it is still running. If the process is waiting for input, type Control-D to signal end of input (end of file) and exit.

Run a command in the background ("in parallel")

 command &                       # the ampersand & is like "run command & whatever I type next"

Bring it back to foreground

fg

then background

 # type Control-Z
 bg

Run multiple commands in the background ("in parallel")

 command1 & 
 command2 & 
 command3 &

List the running jobs spawned by the current shell

jobs

Each job has a job id and each process has a process id. Make job number 2 foreground, then background again:

fg 2
bg

Stop command2 by name

killall command2

Look into all running processes and respective resources

top

List all running processes with pids

ps -A

List all running processes named command3

 ps -A | grep command3

List jobs by pid

 jobs -p         # outputs PID 2143

Kill it by PID (add -9 to force)

kill 2143

Wait for all jobs to finish

wait
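
In a script, the special parameter $! gives the PID of the most recent background command, which saves searching through ps. A sketch using sleep as a stand-in for a long-running program:

```shell
sleep 30 &                              # stand-in for a long-running command
pid=$!                                  # PID of the last background command

kill "$pid"                             # polite request to terminate (SIGTERM)
status=0
wait "$pid" || status=$?                # wait reports how that process ended
echo "process $pid ended with status $status"   # above 128: ended by a signal

# Several jobs in parallel; a bare wait blocks until all of them have finished
sleep 1 & sleep 1 & sleep 1 &
wait
echo "all background jobs finished"
```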

Run an app with low priority

nice firefox &

Keep a program running on the computer even after you log out:

 nohup command &
                        #OR
 nohup ls -lR / &

When you log back in (say, via ssh), there will be a file called nohup.out with the output of the process.

You can also use GNU screen for persistent terminal sessions.

Finding stuff

Basics / Overview / Practicalities

There are ways to search for a file with a specific name, or for a file with specific content. We will deal with the latter mostly for text files. Let's take a look at some useful commands and try to understand them.

Case 1: search for a file myfilename with a specific name.

 find . -name myfilename
                           #OR
 locate myfilename

The locate version uses an indexed search, which is faster than find. To index your hard drive for locate, you usually run the following command (it typically needs root privileges):

  updatedb &

or something similar, depending on the specific OS. We'll be dealing mostly with GNU locate, which is more feature-rich (albeit a bit slower) than BSD locate. You can use pattern matching (explained in the next section) to search for substrings of the name, etc. While find normally matches names with shell-style wildcards, locate is easy to use with full regular expression matching (see next section).

 find . -iname '*widget*.java'                  # -iname means case insensitive
 
 # Alternative form using locate
 locate -ri 'widget.*\.java$'                   # you'll understand the weird .* and $ later
 
 # Search the whole index, restricted to paths containing jddl
 locate -ri 'jddl.*/widget.*\.java$' 
 
 # Search only in a folder having your username as a substring
 locate -ri 'username.*/.*widget.*java$'


Case 2: search for a file containing a specific string UpDirection

 # Find all .cpp files containing UpDirection as a whole word (not as part of a longer name)
 find . -name '*.cpp' | xargs grep '\<UpDirection\>'
 
 # Alternative form using locate
 locate -r '\.cpp$' | xargs grep '\<UpDirection\>'
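
The plain find | xargs form breaks on file names that contain blanks, for the same word-splitting reasons discussed in the quoting section. A more robust sketch separates names with NUL bytes, using find -print0 and xargs -0 (available in both the GNU and BSD tools), and uses grep -w for whole-word matching; the scratch files are made up for the example:

```shell
src=$(mktemp -d)                        # scratch source tree
printf 'int UpDirection = 1;\n'       > "$src/camera view.cpp"
printf 'int UpDirectionVector = 2;\n' > "$src/other.cpp"

# -print0 / -0 pass file names separated by NUL bytes, so blanks are safe
# -l lists matching files only, -w matches UpDirection as a whole word
matches=$(find "$src" -name '*.cpp' -print0 | xargs -0 grep -lw 'UpDirection')
echo "$matches"
```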

I use an lsc script from Utils which includes all the above inside it

 lsc -r                 # lists all source files recursively (*.c *.java *.cxx *.cpp *.php..)
 lsc -r | grep scilab | xargs grep --color '\<k\>'       # lists all source files with
                                                         # scilab in the filename having the variable / token k

One of the best options for grep in the context of code searching is '--color' as used above. You can make it the default with an alias in your ~/.bashrc (the GREP_OPTIONS environment variable once used for this is obsolete and ignored by recent GNU grep):

 alias grep='grep --color=auto'                          # inside your .bashrc

More about grep in the following subsection.

Grep, sed, awk, regular expressions

Misc. utilities

  • Convert decimal to hex
bc           # start the calculator language interpreter
ibase=10
obase=16
40           # output: 28

See also [1].
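
For a quick one-off conversion inside a script, the printf command does the same job without starting an interactive bc session. A minimal sketch:

```shell
hex=$(printf '%X' 40)                   # decimal to hexadecimal
dec=$(printf '%d' 0x28)                 # and back: a 0x prefix marks hex input
echo "$hex $dec"

# The bc equivalent, non-interactively (if bc is installed):
#   echo 'obase=16; 40' | bc
```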

Networking

SSH

Log onto another machine in your local network:

 ssh username@10.0.0.60

Execute arbitrary commands on it interactively

ls
uname -v

Log off

exit

Execute one command on the remote machine non-interactively

 ssh username@10.0.0.60 who                  # tells you who is logged in the remote machine

Copy a file onto the machine

 echo testing >a.txt                         # creates the file
 scp a.txt 10.0.0.60:                        # scp /local_path/file username@remote:/remote_path

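Automated transfer session with sftp. The commands are read from standard input, but the password is not: sftp asks for it on the terminal, so for unattended use set up public key authentication (see below).

 sftp username@hostname << END
 put filename1
 put filename2
 get filename3
 bye
 END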

Recursively copy an entire file tree to the remote host, incrementally

rsync -avz folder 10.0.0.60:

If this transfer is aborted, you can restart it and it will transfer only what remains (the differences). The "z" option compresses the data before transferring; you can omit it on fast networks. Rsync is very useful for backups.

Open the remote computer's browser on your own local screen

 ssh -X username@10.0.0.60
 firefox                                     # or any other program that needs a GUI

PS: you might have to configure X support; ask a local technician or Google.

Control the remote screen by opening the browser on its screen, which is usually referred to with an ID of :0.

   export DISPLAY=:0                  # on the remote computer
   firefox

For fun: you can also write to a terminal on the remote machine, in case one is open

   ls /dev/pts
 
   # you get a list of numbers. test in each of them:
 
   echo hello > /dev/pts/1
   echo hello > /dev/pts/2
   echo hello > /dev/pts/3   # bingo! the user on this remote host is using terminal 3 and received our message

You will understand how this works later, but UPE is the best place to get a hold of what's going on.


Public key (passwordless) authentication

Generate a key pair (private and public key)

 ssh-keygen -t rsa -C "your_email@example.com"

Add the key to your ssh-agent so you don't have to type the passphrase all the time

 # start the ssh-agent in the background
 eval "$(ssh-agent -s)"
 ssh-add ~/.ssh/id_rsa

Test it by communicating between two machines: machine 1 (e.g., the one where you generated the key, the 'local' machine) ssh-ing into machine 2 (a 'remote' machine).

 scp ~/.ssh/id_rsa.pub machine2:.ssh/
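
Copying the .pub file is only half of the setup: the SSH server on the remote machine consults ~/.ssh/authorized_keys, so the public key has to be appended to that file. The ssh-copy-id tool that ships with OpenSSH automates this when given the remote user and host. The sketch below performs the same steps by hand on a scratch directory standing in for the remote home; the key string is a placeholder, not a real key:

```shell
remote_home=$(mktemp -d)                # stand-in for your home directory on the remote machine
pubkey='ssh-rsa AAAAexamplekey your_email@example.com'   # placeholder public key line

mkdir -p "$remote_home/.ssh"
chmod 700 "$remote_home/.ssh"           # by default sshd is strict about these permissions
printf '%s\n' "$pubkey" >> "$remote_home/.ssh/authorized_keys"   # append, never overwrite
chmod 600 "$remote_home/.ssh/authorized_keys"

cat "$remote_home/.ssh/authorized_keys"
```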

See also the Github article on ssh keys [2].

Basics

Discover other hosts on your LAN

 ping -b 10.0.0.0             # broadcast Ping on your local network address

Links