Learning the Unix Programming Environment

De Pontão Nós Digitais
Revisão de 15h36min de 19 de junho de 2020 por V1z (discussão | contribs) (→‎ACK and Ag: ctags)
Ir para navegaçãoIr para pesquisar


This is a series of practical tutorials on Unix - mainly GNU/Linux and OSX - in the context of modern programming practice. It is an up-to-date learning supplement to the great UPE book, "The Unix Programming Environment" by Kernighan and Ritchie. In addition to summarizing the outstanding classic and adapting it for current environments, LUPE also gathers a number of valuable practical tips and pointers from experienced programmers.

This Page is Under Development

Lots-of-tiny-code-reddit191578.png

First: read UPE

Even though LUPE is reasonably self-contained and covers what is most immediately useful to you, reading the original masterpiece - UPE - is highly recommended. Specially the exercises. Reading UPE is the top way of learning Unix at a level most useful to programmers. Reading it will be the single thing that really makes you understand the tools, the idiosyncratic syntax, and why Unix's highly programmable interface works the way it does. Really understanding the practical tools and the power user side of the OS is what makes a highly efficent programmer and user. With the LUPE supplements you'll leverage that powerful knowledge into the broader context of today's practice of programming and advanced usage.

Background

Basics

Open a terminal (by launching the Terminal app). Play some commands and constructs of the Shell (sh) environment language:

List files in the current directory/foder:

ls

A common shortcut for ls is d, though it is not always available. Try it anyways:

d

In the aliases section we'll set this useful shortcut up in case you don't have it by default.

Also try these <syntaxhighlight lang="bash">

ls -a                                 # "all" - shows entries starting with '.', usually hidden
ls -l
ls -la

</syntaxhighlight>

For which a useful short is

ll

Don't worry if you don't have ll - we'll set it up later.

Tell the computer to print "Hello, World":

echo hello, world

Then tell it to print this 10 times using a for loop in the bash language (just type it for now - we'll help you understand the syntax idiosyncrasies later): <syntaxhighlight lang="bash">

for i in `seq 1 10`; do echo hello, world; done 

</syntaxhighlight>

Each Unix command is an executable program, which is usually written in C or can be internally modeled with C's abstraction. As in C, each command has a return value and default/standard input and output handles (stdout/stdin).

The standard input and output can be redirected. Create a text file containing hello, world using the redirect operator

<syntaxhighlight lang="bash"> echo hello, world > a.txt </syntaxhighlight>

Print out the contents of the file

cat a.txt

You can use the redirect operator on any command construct that has an output, even on the for construct above: <syntaxhighlight lang="bash">

for i in `seq 1 10`; do echo hello, world; done > a.txt

</syntaxhighlight>

Count the number of words in the file

wc -l a.txt

The UPE book masterfully explains why you should prefer that over (try it) <syntaxhighlight lang="bash">

cat a.txt | wc -l 

</syntaxhighlight>

You'll understand what the pipe `|' means later, but it basically connects the default output of a program to the standard input of another.

Command History

history

Gives a numbered list of your previous commands. You can ask for previous command number 17 using

!17

Instead of pressing UP multiple times. Another way is to search history by keyword as you type. Lets say you're trying to reverse search the above for statement. You can do this:

  1. type Control-R
  2. type any substring of the desired command, say for
  3. a command matching that substring is displayed
  4. press Enter to execute it, or keep pressing Control-R to search for other options

You can also refer to the previous command's 3rd argument on your current command: <syntaxhighlight lang="bash">

wc -l a.txt
cat !!:2

</syntaxhighlight>

Your history is located in your home directory, which can be referred to by '~'

cat ~/.history

On many setups this will contain the history up to a previous session. The current history is usually not stored in ~/.history until the session is over.

Further basics

Pressing Control-C usually aborts any command that is running interactively through the shell. Other commands that are not blocking the shell can be ended using kill and killall:

killall firefox

Or kill, which requires a process ID associated to a running app: <syntaxhighlight lang="bash">

ps -A|grep firefox
   553
kill 553

</syntaxhighlight>

If it doesnt go away, use -9 or -KILL <syntaxhighlight lang="bash">

killall -9 firefox
kill -KILL 553

</syntaxhighlight>

A reference to any single command is provided by man:

man bc

The standard sections of the manual include:

 1      User Commands
 2      System Calls
 3      C Library Functions
 4      Devices and Special Files
 5      File Formats and Conventions
 6      Games et. Al.
 7      Miscellanea
 8      System Administration tools and Deamons

You can ask man to look up an entry on a determined manual. If you want the C function, you can skip shell commands by using

man 3 printf

To search where a given keyword, eg exec, appears in all the manuals

apropos exec


Scripting

The shell is a full-fledged programming language which derives its power from being able to very easily control and combine commands to work together inside the OS environment. You can combine the above commands into a shell script using any text editor. Or you can just type

<syntaxhighlight lang="bash">

echo 'ls | wc -l
      echo thats the number of files in the current folder.' > script.sh

</syntaxhighlight>

Check whats inside the script.sh file

 cat script.sh

Run the command

 sh script.sh

Usually you won't edit using echo, but using more convenient editors. There are even variables you can use. Lets say we put the following into script.sh <syntaxhighlight lang="bash">

#!/bin/sh
num_files=`ls | wc -l`
echo $num_files is the number of files in the current folder.

</syntaxhighlight>

You can also make it executable and even remove the extension so it becomes more like a command itself. <syntaxhighlight lang="bash">

mv script.sh script
chmod a+x script
./script

</syntaxhighlight>

More useful and fun stuff to try and understand later

Warning: The following commands may require the appropriate software packages to be installed to your system. <syntaxhighlight lang="bash">

  1. Convert images
convert image.png image.jpg
  1. Convert all images to a PDF
convert *.jpg document.pdf
  1. Play videos
mplayer movie.mkv
  1. Extract pages from a pdf
pdftk old.pdf cat 1-9 26-end output new.pdf

  1. Extract all frames of a video
mplayer video.mp4 -vo png -frames 5

  1. Make a GIF animation from images
convert  -adjoin  -loop 0  -delay 5  *.gif  animation.gif

</syntaxhighlight> More useful commands are available at Utils-Macambira.


Setting up the modern unix programming environment

Useful script collections

Our utilitiess

Our personal scripts in current use and actual config files used for editing, programming, and daily use is available at Utils-Macambira.


Funcoeszz

A nifty collection of little shell utilities, having a portuguese slant to them. They can be found at funcoeszz.net. I personally use:

  • zzarrumanome - convert file names to lowercase and spaces to underscore, useful for shell scripting
zzarrumanome *
     RAMONES - I Don't Care.mp3 -> ramones-i_dont_care.mp3
     Toy Dolls - Wakey, Wakey!.mp3 -> toy_dolls-wakey_wakey.mp3
  • zztradutor

<syntaxhighlight lang="bash">

zztradutor pt-de livro                # Buch

</syntaxhighlight>


Configuring Bash for programming

The following are most useful BASH configuration files

 ~/.bashrc
 ~/.bash_profile
 ~/.bash_aliases

The .bash_profile shell script is meant to be executed once per work session - upon your login to the system or upon starting a login shell as in bash -ls. The .bashrc script is executed during each new shell that is started, as when typing bash at the prompt. The .bash_aliases shell script is executed by a suitably configured .bashrc and by convention contains only alias/shortcut definitions. For instance,

alias fox="firefox"              # you can now just type fox to start firefox

What should you put in ~/.bashrc and ~/.bash_profile? You should take a look at the existing ones in your system, and copy from more experienced friends. The ones I actually use on a daily basis are available at Utils.

~/bin

Create your own ~/bin folder to install the personal scripts that you'll be using often:

mkdir ~/bin

Move the desired script, say myscript, there:

mv myscript ~/bin

Now change permissions to execute

chmod a+x ~/bin/myscript

Set the path in your ~/.bash_profile <syntaxhighlight lang="bash">

export PATH=$HOME/bin:$PATH

</syntaxhighlight>

Reload the environment for changes to take effect <syntaxhighlight lang="bash">

bash -ls          # start as login shell, reloading .profile
                    # XOR
. ~/.bash_profile

</syntaxhighlight>

You can now run your script without the ./:

myscript

You should use the cx shortcut given in UPE for making things more simple.

cx ~/bin/myscript

You can get our actual working versions of cx and other utilities suggested by UPE at Utils (do peek at the source code!).

Understanding the Bourne shell

Think about the internal C implementation model for commands. The commands's main() function sees its arguments through argv - the entire argument line broken into a number of strings. It is the shell that breaks the entire command string as in

 "echo hello, world"      --- sh --->      "echo" "hello," "world"

into distinct components. Understanding this process is key to understanding how the shell works, specially how to use the different quotes `'" and the escape character '\' .

The program echo can only see two distict "hello" and "world" strings in its argv for the above example. For it to see "hello world" as a unit, quotes can be employed:

echo "hello, world"

The behavior looks identical to that of the original unquoted command, but you can be sure that the actual echo program is now seeing hello world as a single entity. For instance, to print a file that has spaces in the filename, you can do <syntaxhighlight lang="bash">

cat "unix programming.txt"
                                 #OR
cat 'unix programming.txt'

</syntaxhighlight>

You can also escape the space, since the shell breaks the argument at blanks.

cat unix\ programming.txt

What's the difference?

  • ': the strongest quote, preventing the shell to parse, expand, modify nor split whats inside
  • ": allows the shell to expand expressions inside before generating the single, final string
  • \: not as convenient and is not guaranteed to generate a single final string, e.g. if shell expansions expand to space

These shell expansions are most used with variables. Lets test this behavior: <syntaxhighlight lang="bash">

a="power nix"                        # single quotes ' could be used
echo "hello, $a world"
echo 'hello, $a world'
echo hello,\ $a\ world               # test: what is argc here?

</syntaxhighlight>

This illustrates $a which expands the variable a - you can think of $ as an S for "substitute in here the contents of what follows".

And what is that weird backward ` quote for? It executes the string that is inside as a shell command and stores the ouput as a single string. You can think of it as "the output of", and it is extremely useful. Try it out: <syntaxhighlight lang="bash">

 seq 1 10
 numbers=`seq 1 10`
 echo $numbers         # numbers == "1 2 3 4 5 6 7 8 9 10"
 numbers="seq 1 10"    # numbers == "seq 1 10"   literally
 echo $numbers
 numbers=$(seq 1 10)   # you can use $( ) instead of ` `
 echo $numbers

</syntaxhighlight>

Let us take a look at one of the commands in the intro section to print hello world, say, 3 times, written more nicely: <syntaxhighlight lang="bash">

for i in `seq 1 3`; do 
   echo hello, world
done 

</syntaxhighlight> The for loop simply loops the variable i over each token obtained by the shell when parsing the string after in. For instance, <syntaxhighlight lang="bash">

for i in a b c; do              # or for i in 1 2 3   etc
   echo hello, world
done 

</syntaxhighlight> Would print hello, world 3 times, We're plugging in an independent command, seq, in order to generate the for loop string. By itself,

seq 1 3

simply prints 1 to 3. We can use seq's output as a string by enclosing it in back quotes: <syntaxhighlight lang="bash">

mystring=`seq 1 4`
                                #OR
mystring=$(seq 1 4) 
echo $mystring

</syntaxhighlight> Since we know that the shell will break this string at blanks, in case its content were used as part of a commandline. Thus, we just give it to the for loop.

Practicing Quotes

Supose you have a collection of Beethoven's music in a folder, and you want to play a random piano sonata album but no concerto:

<syntaxhighlight lang="bash">

cd 'music/classical/Ludwig van Beethoven - Complete Works [Brilliant Classics 100 CD Box]'
ls
 CD 001 - Symphonies Nos.1&3                                   CD 053 - Piano Sonatas Op.109, Op.110, Op.111
 CD 002 - Symphonies Nos.2&7                                   CD 054 - Piano Variations I
 ...
 CD 031 - Violin Sonatas II                                    CD 083 - Scottish Songs WoO 156 & 157, complete
 CD 032 - Violin Sonatas III                                   CD 084 - Scottish Songs Op. 108 & WoO 158-1
 CD 033 - String Trios I                                       CD 085 - Folksongs WoO 158 a b c
 CD 034 - String Trios II                                      CD 086 - Symphony No. 9 in D minor Op.125 (Furtwagler 1951 EMI)
 CD 035 - String Quartets Op.18 Nos.1&2                        CD 087 - Symphony No. 3, Leonore Overtures - Otto Klemperer
 CD 036 - String Quartets Op.18 Nos.3&4                        CD 088 - Symphonies Nos.5  & 7 - Herbert von KARAJAN
 CD 037 - String Quartets Op.18 Nos.5&6; Op.95 Serioso         CD 089 - Piano Concerto No.5 - Piano Sonatas Nos. 8&23 (Edwin Fischer, Furtwangler)
 CD 038 - String Quartets Op.59 Nos.1&2                        CD 090 - Violin Concerto Op.61, Romances for Violin & Orchestra Nos.1&2
 CD 039 - String Quartets Op.74 & Op.131                       CD 091 - Piano Sonatas Op.13, 27, 57 - Yves NAT
 ...

# play a random sonata album
mplayer "`ls | grep -i Sonatas | grep -i Piano | grep -v Concerto | shuf -n 1`"/*  &      # vlc is a good alternative to mplayer

</syntaxhighlight>

Try breaking down the above command to see what its made of. Here, shuf (or gshuf in OSX) just randomly shuffles the lines in its input and ouputs 1 requested line. Note how we had to use both " and ` quotes to get this to work with filenames having spaces in them.


Processes and Jobs

Running a program interactively normally blocks further input <syntaxhighlight lang="bash">

command
       # input is blocked

</syntaxhighlight>

Type Control-C to stop the process in case it is still running. If the process is waiting for input, type Control-D to signal end of input (end of file) and exit.

Run a command in the background ("in parallel") <syntaxhighlight lang="bash">

command &                       # the ampersand & is like "run command & whatever I type next"

</syntaxhighlight>

Bring it back to foreground

fg

then background <syntaxhighlight lang="bash">

# type Control-Z
bg

</syntaxhighlight>

Run multiple commands in the background ("in parallel") <syntaxhighlight lang="bash">

command1 & 
command2 & 
command3 & 

</syntaxhighlight>

List the running jobs spawned by the current shell

jobs

Each job has a job id and each process has a process id. Make job number 2 foreground, then background again:

fg 2
bg

Stop command2 by name

killall command2

Look into all running processes and respective resources

top

List all running processes with pids

ps -A

List all running processes named command3 <syntaxhighlight lang="bash">

ps -A | grep command3

</syntaxhighlight>

List jobs by pid <syntaxhighlight lang="bash">

jobs -p         # outputs PID 2143

</syntaxhighlight>

Force kill

kill 2143

Wait for all jobs to finish

wait

Run an app with low priority

nice firefox &

Keep running program on the computer even after you log out: <syntaxhighlight lang="bash">

nohup command &
                       #OR
nohup ls -lR / &

</syntaxhighlight>

When you log back in (say, via ssh), there will be a file called nohup.out with the output of the process.

You can also use GNU screen for persistent terminal sessions.

Finding stuff

Basics / Overview / Practicalities

There are ways to search for a file with a specific name, or a file having a specific content. We will be dealing with the latter mostly for text files. Lets take a look at some useful commands and try to understand them.

Case 1: search for a file myfilename with a specific name. <syntaxhighlight lang="bash">

find . -name myfilename
                          #OR
locate myfilename

</syntaxhighlight> The locate version simply uses indexed seach which is faster than a find. To index your hard drive for locate, you usualy perform the following command: <syntaxhighlight lang="bash">

 updatedb &

</syntaxhighlight> or something similar, depending on the specific OS. We'll be dealing mostly with GNU locate, which is more feature-rich (albeit a bit slower) than BSD locate. You can use pattern matching (explained in the next section) to search for substrings of the name, etc. While find normally uses file regex or wildcards, locate is easy to use with full regular expression matching (see next section). <syntaxhighlight lang="bash">

find . -iname '*widget*.java'                  # -iname means case insensitive
# Alternative form using locate
locate -ri 'widget.*\.java$'                   # you'll understand the weird .* and $ later
# Search for everything
locate -ri 'jddl.*/widget.*\.java$' 
# Search only in a folder having your username as a substring
locate -ri 'username.*/.*widget.*java$'       

</syntaxhighlight>


Case 2: search for a file containing a specific string UpDirection <syntaxhighlight lang="bash">

# Find all .cpp files containing the string UpDir inside them (and not a superstring like UpDirection)
find . -name '*.cpp' | xargs grep '\<UpDirection\>'
# Alternative form using locate
locate -r '\.cpp$' | xargs grep '\<UpDirection\>'

</syntaxhighlight>

I use an lsc script from Utils which includes all the above inside it <syntaxhighlight lang="bash">

lsc -r                 # lists all source files recursively (*.c *.java *.cxx *.cpp *.php..)
lsc -r | grep scilab | xargs grep --color '\<k\>'       # lists all source files with
                                                        # scilab in the filename having the variable / token k

</syntaxhighlight>

One of the best options for grep in the context of code searching is '--color' as used above. You can also set it by default by including the following in your ~/.bashrc <syntaxhighlight lang="bash">

export GREP_OPTIONS='--color=auto'                      # inside your .bashrc

</syntaxhighlight> More about grep in the following subsection.

Grep, sed, awk, regular expressions

ACK and Ag para busca em Codigo

Higher level tools are available for large codebases and for use within Vim, which automate various things one can do with lower level tools such as `find`, `grep`, `sed`, etc.

`Ack` and its optimized variant `Ag` are both useful to have.

After installing them (see https://beyondgrep.com/install/ and https://github.com/ggreer/the_silver_searcher),

you can simply do <syntaxhighlight lang="bash">

ack pattern                        # or ag pattern

</syntaxhighlight> Where `pattern` is any regex you are used to. It will recursively search the current directory and format the output in a friendly way. This can be used inside VIM with the ACK plugin: https://github.com/mileszs/ack.vim You can then search like so <syntaxhighlight lang="vim">

Ack pattern

</syntaxhighlight>

See also

Misc. utilities

  • Convert decimal to hex
bc           # start the calculator language interpreter
ibase=10
obase=16
40           # output: 28

See also [1].

Networking

SSH

Log onto another machine in your local network: <syntaxhighlight lang="bash">

ssh username@10.0.0.60

</syntaxhighlight>

Execute arbitrary commands in it interactively

ls
uname -v

Log off

exit

Execute one command on the remote machine non-interactively <syntaxhighlight lang="bash">

ssh username@10.0.0.60 who                  # tells you who is logged in the remote machine

</syntaxhighlight>

Copy a file onto the machine <syntaxhighlight lang="bash">

echo testing >a.txt                         # creates the file
scp a.txt 10.0.0.60:                        # scp /local_path/file username@remote:/remote_path

</syntaxhighlight>

Automated transfer session with sftp <syntaxhighlight lang="bash">

sftp usrname@hostname << END
<password>
put filename1
put filename2
get filename3
bye
END

</syntaxhighlight>

Recursively copy an entire file tree to the remote host, incrementally

rsync -avz folder 10.0.0.60:

If this transfer is aborted, you can restart it and it will transfer only the remaining (the differences). The "z" parameter is to compress prior to transferring. You can omit it for fast networks. Rsync very useful for backups.

Open the remote computer's browser on your own local screen <syntaxhighlight lang="bash">

ssh -X username@10.0.0.60
firefox                                     # or any other program that needs a GUI

</syntaxhighlight>

PS: you might have to configure X support, ask a local technicial or Google.

Control the remote screen by opening the browser in its screen, which usually referred to with an ID of :0. <syntaxhighlight lang="bash">

  export DISPLAY=:0                  # on the remote computer
  firefox

</syntaxhighlight> For fun: you can also control the remote terminal, in case there is any one open <syntaxhighlight lang="bash">

  ls /dev/pts
  # you get a list of numbers. test in each of them:
  
  echo hello /dev/pts/1
  echo hello /dev/pts/2
  echo hello /dev/pts/3   # bingo! the user on this remote host is using terminal 3 and received our message

</syntaxhighlight> You will understand how this works later, but UPE is the best place to get a hold of whats going on.


Public key (passwordless) authentication

Generate a public key <syntaxhighlight lang="bash">

ssh-keygen -t rsa -C "your_email@example.com"

</syntaxhighlight>

Add the key to your ssh-agent so you don't have to type the password all the time <syntaxhighlight lang="bash">

# start the ssh-agent in the background
eval "$(ssh-agent -s)"
ssh-add ~/.ssh/id_rsa

</syntaxhighlight>

Test it by communicatibg between two machines: machine 1 (eg, the one where you generated the key, the 'local' machine) ssh-ing into machine 2 (a 'remote' machine).

<syntaxhighlight lang="bash">

scp ~/.ssh/id_rsa.pub maquina2:.ssh/

</syntaxhighlight>

See also the Github article on ssh keys [2].

Basics

Discover other hosts on your LAN

<syntaxhighlight lang="bash">

ping -b 10.0.0.0             # broadcast Ping on your local network address

</syntaxhighlight>

Links

Further Tips

  • Use A |& B instead of A | B to pass both stdout and stderr of program A into stdin of B