“A tour of the lesser known coreutils - Part 3”

2013-06-14

This is the final stop on our tour of the lesser known coreutils. If you haven’t already done so, you should read part 1 and part 2 first. Once again let’s dive in with some cool commands!

mktemp

mktemp will create a temporary file (or, with -d, a directory) with a unique name. You can also pass in a template and it will safely create a file with a unique name matching it. The template must contain at least 3 ‘X’ characters, but can contain more. The more X characters you use, the more random characters will appear in the name. If you just run the command with no arguments it will use the directory named by $TMPDIR, or /tmp if that is not set, and the template tmp.XXXXXXXXXX. If you specify only a template, the file will be created in the current working directory.

$ mktemp
/tmp/tmp.FDxW98zfqW

$ mktemp sometemp.XXXXXXXXXX
sometemp.nlk4ihtGic

$ mktemp --tmpdir=/tmp mytemp.XXXXXXXX
/tmp/mytemp.ys6mtKKT

As you can see, the command will output the name of the temporary file it has just created. You can easily capture this in a shell script using a variable and a command substitution.

$ TEMPFILE="$(mktemp)"
$ echo "$TEMPFILE"
/tmp/tmp.eMspLL5dWc

$ echo "Hello" > "$TEMPFILE"
$ cat "$TEMPFILE"
Hello
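
In a real script you will usually want the temporary files gone when the script ends. Here is a minimal sketch of one common way to do that, assuming GNU mktemp and bash. The names workdir and report and the report.XXXXXX template are made up for illustration.

```shell
#!/usr/bin/env bash
# Create a private scratch directory and remove it when the script exits.
workdir="$(mktemp -d)"                  # -d creates a directory instead of a file
trap 'rm -rf "$workdir"' EXIT           # runs however the script ends

# Create a uniquely named file inside the scratch directory.
report="$(mktemp --tmpdir="$workdir" report.XXXXXX)"
echo "Hello" > "$report"
cat "$report"
```

Putting everything in one directory means a single rm -rf in the trap cleans up every file the script made.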

Combining mktemp with unlink gives you a way to write data to a file that nobody else can open by name. The unlink command removes the file name from the file system, but as long as a process still holds an open file descriptor, the data on disk remains intact until that descriptor is closed. The key is to use the exec command to open file descriptors (streams) to the file before unlinking it.

$ TEMPFILE="$(mktemp --tmpdir=/tmp mytemp.XXXXXXXX)"
$ exec 5>"$TEMPFILE"
$ exec 6<"$TEMPFILE"
$ echo "Hello to file descriptor" >&5
$ cat "$TEMPFILE"
Hello to file descriptor

$ unlink "$TEMPFILE"
$ cat "$TEMPFILE"
cat: /tmp/mytemp.M1kjasnR: No such file or directory

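$ echo "Hello to file descriptor after unlink" >&5
$ cat <&6
Hello to file descriptor
Hello to file descriptor after unlink

After you unlink the file, the two descriptors behave a little like a named pipe: anything you write to fd 5 can be read back from fd 6. fd 6 keeps its own read position, which still pointed at the start of the file here, so the first cat prints everything written so far, including the line from before the unlink. If you try to cat from fd 6 twice in a row, the second read will not return anything, because the first read already advanced to the end of the data. This is a handy way to keep “secret” data in a script without worrying about nosy users peeking at the file, although on Linux root and the script’s own user can still reach it through /proc/<pid>/fd.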
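
Here is the same trick as a small script, a sketch that assumes bash and GNU coreutils. The variable names (scratch, contents, again) are only for illustration, and descriptors 5 and 6 are an arbitrary choice.

```shell
#!/usr/bin/env bash
# Keep scratch data in a file that no longer has a name.
scratch="$(mktemp)"           # mktemp creates the file readable only by its owner
exec 5>"$scratch"             # fd 5: used for writing
exec 6<"$scratch"             # fd 6: used for reading, starts at offset 0
unlink "$scratch"             # the name disappears, the data does not

echo "first line" >&5
echo "second line" >&5

contents="$(cat <&6)"         # reads everything written so far
again="$(cat <&6)"            # fd 6 is now at end of file, so this is empty

printf '%s\n' "$contents"

exec 5>&- 6<&-                # closing both descriptors releases the data
```

The second read comes back empty because the command substitution shares fd 6’s read position with the script, so the first cat leaves it at the end of the data.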

touch

touch is a command that I am sure most Linux and Unix users are aware of. When used with a file name, it will create an empty file if the file does not exist. If the file does exist, it will update the last accessed and last modified timestamps.

$ ls -la
-rw-rw-r--  1 james james   897 Dec 17 14:49 dart_blog_part_4.html

$ touch dart_blog_part_4.html
$ ls -la
-rw-rw-r-- 1 james james 897 Jun 14 07:32 dart_blog_part_4.html

The reason I bring this up is because of the lesser known options that let you mess with the timestamps of the file. You can change a file to have any timestamp you want, in the future or the past.

# use -t to set a specific date, given as [[CC]YY]MMDDhhmm[.ss]
$ touch -t 400006091148.12 dart_blog_part_4.html
-rw-rw-r--  1 james james   897 Jun  9  4000 dart_blog_part_4.html

# use -d or --date to set a specific date with a date string
$ touch -d "next tuesday" dart_blog_part_4.html
-rw-rw-r--  1 james james   897 Jun 18  2013 dart_blog_part_4.html

# use the timestamp of a different file
$ ls -la
-rw-rw-r--  1 james james   897 Jun 18  2013 dart_blog_part_4.html
-rw-rw-r--  1 james james   661 Jan 10 14:00 linux_journal_blog.html

$ touch -r linux_journal_blog.html dart_blog_part_4.html
-rw-rw-r--  1 james james   897 Jan 10 14:00 dart_blog_part_4.html
-rw-rw-r--  1 james james   661 Jan 10 14:00 linux_journal_blog.html
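
If you want to check the result from a script instead of eyeballing ls, something like the following sketch works. It assumes GNU touch and GNU stat, and the file names reference and copy are invented for the example.

```shell
#!/usr/bin/env bash
dir="$(mktemp -d)"

touch -d "2013-06-14 07:32:00" "$dir/reference"   # explicit timestamp
touch "$dir/copy"                                  # created with the current time
touch -r "$dir/reference" "$dir/copy"              # copy the timestamps over

# stat -c %Y prints the modification time as seconds since the epoch
ref_mtime="$(stat -c %Y "$dir/reference")"
copy_mtime="$(stat -c %Y "$dir/copy")"
echo "reference: $ref_mtime, copy: $copy_mtime"

rm -rf "$dir"
```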

ptx

ptx is another one of those interesting tools that I would catalog with tsort (see part 2). It is a very specialized command. It will produce a permuted index, including context, of the words in an input file. Of course it is better shown than described.

$ cat test.txt
Roses are red
Violets are blue
Chocolate is sweet
and so are you

$ ptx test.txt
   Roses are red Violets are blue   Chocolate is sweet and so are you
Chocolate is sweet and so/          Roses are red Violets are blue
sweet and so/       Roses are red   Violets are blue Chocolate is
      are blue Chocolate is sweet   and so are you      /are red Violets
is sweet and so are/        Roses   are red Violets are blue Chocolate
are/        Roses are red Violets   are blue Chocolate is sweet and so
   blue Chocolate is sweet and so   are you                 /Violets are
        Roses are red Violets are   blue Chocolate is sweet and so are/
   red Violets are blue Chocolate   is sweet and so are you         /are
sweet and so are/       Roses are   red Violets are blue Chocolate is
  are blue Chocolate is sweet and   so are you              /red Violets
    Violets are blue Chocolate is   sweet and so are you        /are red
    Chocolate is sweet and so are   you                        /are blue

Now it should be a bit clearer. The keywords immediately to the right of the gap in the center are sorted in alphabetical order. Each word is listed once for every time it appears in the file, and each occurrence is shown with its surrounding context. There are some options to change how the output is displayed, or to isolate specific words.

# ignore case when sorting, and break on end of line
$ ptx --ignore-case -S "\n" test.txt
                                 and so are you
                         Roses   are red
                       Violets   are blue
                        and so   are you
                   Violets are   blue
                                 Chocolate is sweet
                     Chocolate   is sweet
                     Roses are   red
                                 Roses are red
                           and   so are you
                  Chocolate is   sweet
                                 Violets are blue
                    and so are   you

# same as above, and print only those lines with "are" in them
$ ptx --ignore-case -S "\n" -W "are" test.txt
                         Roses   are red
                       Violets   are blue
                        and so   are you

pr

pr can be used to paginate or columnate a text file to prepare it for printing. Basically it will read a plain text file in and add a header, footer, and page breaks. It can also be used to format the plain text file like a magazine article (2 column style). Honestly it does do a nice job, however the 2 column style is a bit quirky.

# Show the first 10 lines of the plain text file
$ head -10 dart.txt
Dart - A new web programming experience
James Slocum

[Section: Introduction]
JavaScript has had a long standing monopoly on client side web programming.
It has a tremendously large user base and countless libraries have been 
written in it.  Surely it is the perfect language with no flaws at all! 
Unfortunately, this is simply not the case. JavaScript is not without its 
problems, and there exists a large number of libraries and "trans-pilers" 
that attempt to work around JavaScripts more quirky behaviors. JQuery, 

# Format it with pr, and show the first 10 lines
$ pr -h "Dart - A new web programming experience" dart.txt | head -10


2013-02-28 10:19     Dart - A new web programming experience      Page 1


Dart - A new web programming experience
James Slocum

[Section: Introduction]
JavaScript has had a long standing monopoly on client side web programming.

As you can see in this example, pr has added the requested header info, as well as page numbers and a date. There are a ton of flags to control different header, footer, and page formatting options. One of the strange things about the multi-column options (-COLUMN or --columns=NUM) is that they truncate long lines instead of rewrapping them.

# You can see that it just chops off the end of the line
# instead of wrapping it
$ pr -2 -h "Dart - A new web programming experience" dart.txt | head -10


2013-02-28 10:19     Dart - A new web programming experience      Page 1


Dart - A new web programming experi defined with the class keyword. Eve
James Slocum                        and all classes descend from the Ob
                                    single inheritance. The extends key
[Section: Introduction]             other than Object. Abstract classes
JavaScript has had a long standing  some default implementation. They c

It’s a good thing that we already have a tool to help us with this! We can combine the fmt utility and pr to generate cleaner 2 column text. pr’s default page width is 72 characters, which leaves 35 characters for each of the two columns, so we simply need to restrict the original text to 35 characters per line.

$ fmt -w 35 dart.txt | pr -2 -h "Dart - A new ... " | head -10


2013-06-12 10:14     Dart - A new web programming experience      Page 1


Dart - A new web programming        going to need to grab a copy
experience James Slocum             from dartlang.org. I personally
                                    chose to install only the SDK;
[Section: Introduction] JavaScript  however, there is an option to
has had a long standing monopoly    grab the full Dart integrated

Now we can clearly see that the columns are wrapping correctly. I know most of you might be thinking “Who writes with just plain text? Why is this still a thing?” Well, actually I still write a lot in plain text! Vim is my main editor for everything, not just code, so it makes sense to do all of my writing in plain text. If I need to format the text for publication, it is easy to use tools like LaTeX (pronounced lay-tek) or simply copy and paste it into OpenOffice. In fact, plain text is pretty much the only format guaranteed to exist and be supported for as long as there are computers.

nl

nl prints out the contents of a file with the lines numbered. cat --number does the same thing, but nl has a ton more options. With nl you can control the style and format of the output. Want to only number lines that match a basic regex? No problem with the -b p<regex> flag.

# number all lines (including blanks)
$ nl -b a test.txt
     1  Roses are red
     2  Violets are blue
     3  Chocolate is sweet
     4  and so are you
     5
     6  Roses are the prettiest flower
     7  but I like potatoes more
     8  they will grow eyes
     9  even in a dark drawer

# number only those lines with the word Roses
$ nl -b pRoses test.txt
     1  Roses are red
       Violets are blue
       Chocolate is sweet
       and so are you

     2  Roses are the prettiest flower
       but I like potatoes more
       they will grow eyes
       even in a dark drawer

You will have to excuse the bad poetry, as I just needed some simple test content. As you can see the nl utility has a lot of great options to help number the lines in a file.
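
To give a taste of the formatting side, here is a small sketch assuming GNU nl. It zero-pads the line numbers and changes the separator. The variable name numbered and the ': ' separator are arbitrary choices for the example.

```shell
#!/usr/bin/env bash
# -b a    number every line
# -n rz   right-justify the number and pad it with leading zeros
# -w 3    make the number three characters wide
# -s ': ' put ": " between the number and the text
numbered="$(printf 'Roses are red\nViolets are blue\n' | nl -b a -n rz -w 3 -s ': ')"
printf '%s\n' "$numbered"
```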

split, csplit

split can be used to either split a text file into several files based on lines, or a binary file into pieces based on size. This is a great tool for working around the size limitations on email attachments. The two most useful flags for working with binary files are -b (--bytes), which allows you to specify the size of each partial file, and -n (--number), which allows you to specify how many partial files you want. You use one or the other, not both.

$ du -h VIDEO.ts
107M    VIDEO.ts

$ split -b 10M VIDEO.ts VIDEO_part_
$ du -h *
10M     VIDEO_part_aa
10M     VIDEO_part_ab
10M     VIDEO_part_ac
10M     VIDEO_part_ad
10M     VIDEO_part_ae
10M     VIDEO_part_af
10M     VIDEO_part_ag
10M     VIDEO_part_ah
10M     VIDEO_part_ai
10M     VIDEO_part_aj
6.3M    VIDEO_part_ak

To reassemble the files, simply cat them back together and redirect the output to the file name you want.
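
A round trip might look like the sketch below. It uses a small random file as a stand-in for the video, and the names original.bin, rebuilt.bin and part_ are made up for the example. cmp is there only to confirm that the rebuilt file matches the original byte for byte.

```shell
#!/usr/bin/env bash
dir="$(mktemp -d)"

head -c 2500 /dev/urandom > "$dir/original.bin"   # stand-in for a large file
split -b 1000 "$dir/original.bin" "$dir/part_"     # part_aa, part_ab, part_ac

# The shell expands the glob in sorted order (aa, ab, ac, ...),
# which is exactly the order the pieces were written in.
cat "$dir"/part_* > "$dir/rebuilt.bin"

parts="$(ls "$dir"/part_* | wc -l)"
if cmp -s "$dir/original.bin" "$dir/rebuilt.bin"; then
    result="identical"
else
    result="different"
fi
echo "$parts parts, rebuilt file is $result"

rm -rf "$dir"
```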

csplit works a bit differently. It is used to split a file based on a “context line.” The context line can be specified as a regular expression pattern.

# This will copy up to but not including the lines with 'Roses'
$ csplit test.txt /Roses/ {*}
0
66
99

$ ls
xx00  
xx01 
xx02
test.txt

$ cat xx00

$ cat xx01
Roses are red
Violets are blue
Chocolate is sweet
and so are you

$ cat xx02
Roses are the prettiest flower
but I like potatoes more
they will grow eyes
even in a dark drawer

The ‘xx’ prefix is configurable with the -f (--prefix=PREFIX) flag. So if you want to give each part a more usable name, it’s not a problem. The suffix (the numbers in this case) is also settable with the -b (--suffix-format=FORMAT) flag, which takes a printf()-style format containing an integer conversion such as %03d. See the printf() documentation for more details on the format syntax.
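
Here is a sketch of those flags in action, assuming GNU csplit. The poem_ prefix and the %02d.txt suffix are just example choices. I have also added -s to silence the byte counts and -z to skip the empty first piece.

```shell
#!/usr/bin/env bash
dir="$(mktemp -d)"
printf 'Roses are red\nViolets are blue\n\nRoses are the prettiest flower\nbut I like potatoes more\n' > "$dir/test.txt"

# -s: don't print byte counts     -z: don't create empty pieces
# -f: file name prefix            -b: printf-style suffix format
csplit -s -z -f "$dir/poem_" -b '%02d.txt' "$dir/test.txt" '/Roses/' '{*}'

pieces="$(cd "$dir" && ls poem_*)"
second_start="$(head -n 1 "$dir/poem_01.txt")"
printf '%s\n' "$pieces"

rm -rf "$dir"
```

Quoting '{*}' is not strictly needed in bash, but it keeps the shell from ever trying to interpret the braces or the star.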

seq

seq will generate a sequence of numbers and print them to standard out. There are three ways to execute this command. The first is just with the final number N. This will produce a sequence from 1 to N. The next way is to provide a start S and final N. This will produce a sequence from S to N. The last way is to provide a start S, an increment I, and a final N, in that order (seq S I N). This will produce a sequence from S to N, incrementing by I each time.

$ seq 5
1
2
3
4
5

$ seq 0 5
0
1
2
3
4
5

$ seq 0 2 5
0
2
4

seq is rarely used directly and is instead used as part of a command chain or a shell script loop.

# use seq as a counter in a loop to find prime numbers
for i in $(seq 0 15); do
   if [ $(factor $i | cut -d : -f 2 | wc -w) -eq 1 ]; then
      echo "$i is prime"
   fi
done

# Use seq and shuf to generate a pick 6 quick pick
$ seq 1 49 | shuf | head -6 | sort -n | xargs printf "%d,%d,%d,%d,%d,%d\n"
2,18,28,33,43,48
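
A few more seq variations that tend to be useful in scripts, sketched here assuming GNU seq. The variable names are only for illustration.

```shell
#!/usr/bin/env bash
padded="$(seq -w 8 10)"       # -w pads with leading zeros to equal width
joined="$(seq -s , 1 5)"      # -s replaces the newline between numbers
countdown="$(seq 3 -1 1)"     # a negative increment counts down

printf '%s\n' "$padded"
echo "$joined"
echo $countdown               # unquoted on purpose: newlines collapse to spaces
```

The zero padding from -w is handy when the numbers end up in file names, because the names then sort correctly.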

timeout

timeout is an awesome command when you need to put an upper limit on the amount of time a program can run. You can tell if a command timed out by the return value. A return of 124 indicates that the command timed out. 125 is returned if timeout itself fails, 126 if the command is found but cannot be run, and 127 if the command cannot be found. If the command finishes before the time limit, timeout returns the command’s own exit status. You can check the return status with the command echo $?.

# cat will wait forever for input when invoked without a file
# use timeout to force cat to only run for 5 seconds
$ timeout 5 cat

# This is useful in scripts if you want data from the Internet
# but don't want to hang forever
$ timeout 5 curl http://jamesslocum.com

timeout also has the -s (--signal) flag to allow you to control what signal gets sent to the process. The default signal is 15 or TERM. You can change it to 9 or KILL to guarantee a process dies and does not catch or ignore the signal. In that case the exit status is 137 (128 + 9) rather than 124.
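
In a script, checking the exit status might look something like this sketch. It assumes GNU timeout and uses sleep as a stand-in for the slow command. The variable names are made up for the example.

```shell
#!/usr/bin/env bash
# sleep 5 stands in for a command that takes too long.
timeout 1 sleep 5
status=$?

case "$status" in
    124) outcome="timed out" ;;
    0)   outcome="finished in time" ;;
    *)   outcome="failed with status $status" ;;
esac
echo "sleep: $outcome"

# A command that finishes early passes its own exit status through.
timeout 5 true
ok_status=$?
echo "true: $ok_status"
```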

Links and further reading

This concludes my tour of the lesser known coreutils. Of course there are dozens more “well known” coreutils that you can read up on. I highly recommend reading the man pages and info pages as well.

Wikipedia coreutils page
GNU coreutils Documentation
Bash Cookbook (Amazon link)