Blog
A random collection of thoughts on a variety of topics
This is the final stop on our tour of the lesser known coreutils. If you haven’t already done so, you should read part 1 and part 2 first. Once again let’s dive in with some cool commands!
mktemp
mktemp
will create a temporary file or directory with a unique name. You can also pass in a template and it will safely create a file with a unique name matching it. The template must contain at least 3 ‘X’ characters, but can contain more. The more X characters you use, the more random characters will appear in the name. If you just run the command with no arguments it will use /tmp
as the default location and the template tmp.XXXXXXXXXX
. If you specify only a template, the file will be created in the current working directory.
$ mktemp
/tmp/tmp.FDxW98zfqW
$ mktemp sometemp.XXXXXXXXXX
sometemp.nlk4ihtGic
$ mktemp --tmpdir=/tmp mytemp.XXXXXXXX
/tmp/mytemp.ys6mtKKT
As you can see, the command outputs the name of the temporary file it has just created. You can easily capture this in a shell script using a variable and a command substitution.
$ TEMPDIR="$(mktemp)"
$ echo $TEMPDIR
/tmp/tmp.eMspLL5dWc
$ echo "Hello" > $TEMPDIR
$ cat $TEMPDIR
Hello
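In a script you will usually want the temporary file removed again when the script ends. Here is a minimal sketch of that pattern, assuming bash and GNU coreutils. The variable names and the myscript prefix are made up for illustration.

```shell
#!/usr/bin/env bash
# Sketch: create a scratch file and a scratch directory, then clean up on exit.
# scratch_file, scratch_dir and the "myscript" prefix are illustrative names.
scratch_file="$(mktemp)" || exit 1
scratch_dir="$(mktemp -d "${TMPDIR:-/tmp}/myscript.XXXXXXXX")" || exit 1

# Remove both when the script exits, even if it fails part-way through.
trap 'rm -rf -- "$scratch_file" "$scratch_dir"' EXIT

echo "Hello" > "$scratch_file"
cp -- "$scratch_file" "$scratch_dir/copy.txt"
cat "$scratch_dir/copy.txt"
```

Quoting the variables keeps the script working even when TMPDIR points at a path with spaces in it.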
Combining mktemp with unlink gives you a way to write data to a file that nobody else can open by name. The unlink command removes the file name from the file system, but if there are still open streams the data on disk remains intact until those streams are closed. The key is to use the exec command to open file descriptors (streams) to the file before unlinking it.
$ TEMPDIR="$(mktemp --tmpdir=/tmp mytemp.XXXXXXXX)"
$ exec 5>$TEMPDIR
$ exec 6<$TEMPDIR
$ echo "Hello to file descriptor" >&5
$ cat $TEMPDIR
Hello to file descriptor
$ unlink $TEMPDIR
$ cat $TEMPDIR
cat: /tmp/mytemp.M1kjasnR: No such file or directory
$ echo "Hello to file descriptor after unlink" >&5
$ cat <&6
Hello to file descriptor after unlink
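After you unlink the file, the two descriptors behave a little like a private pipe: anything you write to fd 5 stays on disk and can be read back from fd 6. Each descriptor keeps its own offset, so if you cat from fd 6 twice in a row, the second read will not return anything, because the first read already moved fd 6 to the end of the file. This is a handy way to store “secret” data in a script without leaving a file name around for nosy users to peek at. Keep in mind that the file's owner and root can still reach an open descriptor (on Linux, through /proc/<pid>/fd).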
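The descriptor trick can be sketched as a small script, assuming bash and GNU coreutils. The fd numbers 5 and 6 follow the example above, and the variable names are only illustrative. Each read through fd 6 continues from where the previous read on that descriptor stopped.

```shell
#!/usr/bin/env bash
# Sketch: keep data in an unlinked temporary file that is reachable only
# through descriptors held by this shell.
secret_file="$(mktemp)" || exit 1
exec 5>"$secret_file"   # write end
exec 6<"$secret_file"   # read end, with its own file offset
unlink "$secret_file"   # the name is gone; the data lives while the fds are open

echo "first" >&5
first_read="$(cat <&6)"    # reads from fd 6's current offset to end of file
second_read="$(cat <&6)"   # nothing new has been written, so this is empty

echo "second" >&5
third_read="$(cat <&6)"    # only the data written since the previous read

exec 5>&- 6<&-             # closing both descriptors releases the data
```

The command substitutions share fd 6 with the parent shell, which is why the offset carries over from one read to the next.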
touch
touch
is a command that I am sure most Linux and Unix users are aware of. When used with a file name, it will create an empty file if the file does not exist. If the file does exist, it will update the last accessed and last modified timestamps.
$ ls -la
-rw-rw-r-- 1 james james 897 Dec 17 14:49 dart_blog_part_4.html
$ touch dart_blog_part_4.html
$ ls -la
-rw-rw-r-- 1 james james 897 Jun 14 07:32 dart_blog_part_4.html
The reason I bring this up is because of the lesser known options that let you mess with the timestamps of the file. You can change a file to have any timestamp you want, in the future or the past.
# use -t option to set a specific date
$ touch -t 400006091148.12 dart_blog_part_4.html
-rw-rw-r-- 1 james james 897 Jun 9 4000 dart_blog_part_4.html

# use -d or --date to set a specific date with a date string
$ touch -d "next tuesday" dart_blog_part_4.html
-rw-rw-r-- 1 james james 897 Jun 18 2013 dart_blog_part_4.html

# use the timestamp of a different file
$ ls -la
-rw-rw-r-- 1 james james 897 Jun 18 2013 dart_blog_part_4.html
-rw-rw-r-- 1 james james 661 Jan 10 14:00 linux_journal_blog.html
$ touch -r linux_journal_blog.html dart_blog_part_4.html
-rw-rw-r-- 1 james james 897 Jan 10 14:00 dart_blog_part_4.html
-rw-rw-r-- 1 james james 661 Jan 10 14:00 linux_journal_blog.html
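One place these options come in handy is a script that compares file ages. The following is a rough sketch, assuming bash and GNU coreutils (GNU date -r prints a file's modification time). The file names and dates are invented for the example.

```shell
#!/usr/bin/env bash
# Sketch: set explicit timestamps, compare them, then copy them with -r.
work_dir="$(mktemp -d)" || exit 1
old_file="$work_dir/old.txt"
new_file="$work_dir/new.txt"

touch -d "2013-01-10 14:00:00" "$old_file"
touch -d "2013-06-18 09:30:00" "$new_file"

# -nt is true when the left file has a newer modification time.
newer_before="no"
if [ "$new_file" -nt "$old_file" ]; then
    newer_before="yes"
fi

# Give new.txt the same timestamps as old.txt.
touch -r "$old_file" "$new_file"
old_mtime="$(date -r "$old_file" +%s)"
new_mtime="$(date -r "$new_file" +%s)"

rm -rf -- "$work_dir"
```

After the touch -r call, both files report the same modification time.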
ptx
ptx is another one of those interesting tools that I would file alongside tsort (see part 2). It is a very specialized command: it produces a permuted index, including context, of the words in an input file. Of course, it is better shown than described.
$ cat test.txt
Roses are red
Violets are blue
Chocolate is sweet
and so are you
$ ptx test.txt
Roses are red Violets are blue Chocolate is sweet and so are you Chocolate is sweet and so/ Roses are red Violets are blue sweet and so/ Roses are red Violets are blue Chocolate is are blue Chocolate is sweet and so are you /are red Violets is sweet and so are/ Roses are red Violets are blue Chocolate are/ Roses are red Violets are blue Chocolate is sweet and so blue Chocolate is sweet and so are you /Violets are Roses are red Violets are blue Chocolate is sweet and so are/ red Violets are blue Chocolate is sweet and so are you /are sweet and so are/ Roses are red Violets are blue Chocolate is are blue Chocolate is sweet and so are you /red Violets Violets are blue Chocolate is sweet and so are you /are red Chocolate is sweet and so are you /are blue
Now it should be a bit more clear. The words to the right of the break in the center are sorted in alphabetical order. Each word is printed the same number of times it appears in the file, and each word is shown with its surrounding context. There are some options to change how the output is displayed, or to isolate specific words.
# ignore case when sorting, and break on end of line
$ ptx --ignore-case -S "\n" test.txt
and so are you Roses are red Violets are blue and so are you Violets are blue Chocolate is sweet Chocolate is sweet Roses are red Roses are red and so are you Chocolate is sweet Violets are blue and so are you

# same as above, and print only those lines with "are" in them
$ ptx --ignore-case -S "\n" -W "are" test.txt
Roses are red
Violets are blue
and so are you
pr
pr
can be used to paginate or columnate a text file to prepare it for printing. Basically it will read a plain text file in and add a header, footer, and page breaks. It can also be used to format the plain text file like a magazine article (2 column style). Honestly it does do a nice job, however the 2 column style is a bit quirky.
# Show the first 10 lines of the plain text file
$ head -10 dart.txt
Dart - A new web programming experience
James Slocum

[Section: Introduction]
JavaScript has had a long standing monopoly on client side web programming. It has a tremendously large user base and countless libraries have been written in it. Surely it is the perfect language with no flaws at all! Unfortunately, this is simply not the case. JavaScript is not without its problems, and there exists a large number of libraries and "trans-pilers" that attempt to work around JavaScripts more quirky behaviors. JQuery,

# Format it with pr, and show the first 10 lines
$ pr -h "Dart - A new web programming experience" dart.txt | head -10
2013-02-28 10:19 Dart - A new web programming experience Page 1
Dart - A new web programming experience
James Slocum

[Section: Introduction]
JavaScript has had a long standing monopoly on client side web programming.
As you can see in this example, pr
has added the requested header info, as well as page numbers and a date. There are a ton of flags to control different header, footer, and page formatting options. One of the strange things about the columnate -COLUMN --columns=NUM
behavior is that it truncates the lines instead of reformatting them.
# You can see that it just chops off the end of the line
# instead of wrapping it
$ pr -2 -h "Dart - A new web programming experience" dart.txt | head -10
2013-02-28 10:19     Dart - A new web programming experience     Page 1
Dart - A new web programming experi defined with the class keyword. Eve
James Slocum                        and all classes descend from the Ob
                                    single inheritance. The extends key
[Section: Introduction]             other than Object. Abstract classes
JavaScript has had a long standing  some default implementation. They c
It’s a good thing that we already have a tool to help us with this! We can combine the fmt
utility and pr
to generate cleaner 2 column text. We simply need to restrict the original text to 35 characters per line.
$ fmt -w 35 dart.txt | pr -2 -h "Dart - A new ... " | head -10
2013-06-12 10:14     Dart - A new web programming experience     Page 1
Dart - A new web programming        going to need to grab a copy
experience James Slocum             from dartlang.org. I personally
                                    chose to install only the SDK;
[Section: Introduction] JavaScript  however, there is an option to
has had a long standing monopoly    grab the full Dart integrated
Now we can clearly see that the columns are wrapping correctly. I know most of you might be thinking “Who writes with just plain text? Why is this still a thing?” Well, actually I still write a lot in plain text! Vim is my main editor for everything, not just code, so it makes sense to do all of my writing in plain text. If I need to format the text for publication, it is easy to use tools like LaTeX (pronounced lay-tek) or simply copy and paste it into OpenOffice. In fact, plain text is the only format pretty much guaranteed to exist and be supported for as long as there are computers.
nl
nl
prints out the contents of a file with the lines numbered. cat --number
does the same thing, but nl
has a ton more options. With nl
you can control the style and format of the output. Want to only number lines that match a basic regex? No problem with the -b p<regex>
flag.
# number all lines (including blanks)
$ nl -b a test.txt
     1  Roses are red
     2  Violets are blue
     3  Chocolate is sweet
     4  and so are you
     5
     6  Roses are the prettiest flower
     7  but I like potatoes more
     8  they will grow eyes
     9  even in a dark drawer

# number only those lines with the word Roses
$ nl -b pRoses test.txt
     1  Roses are red
        Violets are blue
        Chocolate is sweet
        and so are you

     2  Roses are the prettiest flower
        but I like potatoes more
        they will grow eyes
        even in a dark drawer
You will have to excuse the bad poetry, as I just needed some simple test content. As you can see the nl
utility has a lot of great options to help number the lines in a file.
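To give a feel for the formatting options, here is a small sketch that assumes GNU nl. It uses -w to set the width of the number column and -s to set the separator printed after the number. The sample text is made up.

```shell
#!/usr/bin/env bash
# Sketch: number every line, including the blank one, using a 2-character
# number column and ". " as the separator.
numbered="$(printf 'Roses are red\n\nViolets are blue\n' | nl -b a -w 2 -s '. ')"
printf '%s\n' "$numbered"

# Number only the lines that match the basic regular expression "Roses".
matching="$(printf 'Roses are red\nViolets are blue\nRoses again\n' | nl -b pRoses -w 2 -s '. ')"
printf '%s\n' "$matching"
```

The first command prints " 1. Roses are red", then " 2. " for the blank line, then " 3. Violets are blue". In the second, only the two lines containing Roses get a number.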
split, csplit
split
can be used to split either a text file into several files based on lines, or a binary file into pieces based on size. This is a great tool for working around the size limitations on email attachments. The two most useful flags for working with binary files are -b (--bytes), which lets you specify the size of each partial file, and -n (--number), which lets you specify how many partial files you want. You use one or the other, not both.
$ du -h VIDEO.ts
107M VIDEO.ts
$ split -b 10M VIDEO.ts VIDEO_part_
$ du -h *
10M VIDEO_part_aa
10M VIDEO_part_ab
10M VIDEO_part_ac
10M VIDEO_part_ad
10M VIDEO_part_ae
10M VIDEO_part_af
10M VIDEO_part_ag
10M VIDEO_part_ah
10M VIDEO_part_ai
10M VIDEO_part_aj
6.3M VIDEO_part_ak
To reassemble the files, simply cat
them back together and redirect the output to the file name you want.
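The round trip can be sketched like this, assuming bash and GNU coreutils. The file names are placeholders, and a small random file stands in for the video so the example runs quickly.

```shell
#!/usr/bin/env bash
# Sketch: split a file into 10 KiB pieces, reassemble it, and verify the copy.
work_dir="$(mktemp -d)" || exit 1
cd "$work_dir" || exit 1

# Make a 35 KiB test file of random bytes.
head -c 35840 /dev/urandom > original.bin

split -b 10K original.bin part_

# The shell expands part_* in sorted order (part_aa, part_ab, ...),
# which is the same order split wrote the pieces in.
cat part_* > rebuilt.bin

if cmp -s original.bin rebuilt.bin; then
    result="identical"
else
    result="different"
fi
echo "$result"
```

Comparing with cmp (or a checksum) is a cheap way to confirm that no piece went missing before you delete the parts.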
csplit works a bit differently: it splits a file based on a “context line.” The context line can be specified as a regular expression pattern.
# This will copy up to but not including the lines with 'Roses'
$ csplit test.txt /Roses/ {*}
0
66
99
$ ls
xx00 xx01 xx02 test.txt
$ cat xx00
$ cat xx01
Roses are red
Violets are blue
Chocolate is sweet
and so are you

$ cat xx02
Roses are the prettiest flower
but I like potatoes more
they will grow eyes
even in a dark drawer
The ‘xx’ prefix is configurable with the -f --prefix=PREFIX
flags. So if you want to give each part a more usable name, it’s not a problem. The suffix (the numbers in this case) is also settable with the -b --suffix-format=FORMAT
flags. Any printf()
format can be passed in. See this page for more details on that.
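As a rough illustration of the prefix and suffix flags, assuming bash and GNU csplit: the stanza_ prefix, the %02d.txt suffix format and the sample text below are arbitrary choices for the example.

```shell
#!/usr/bin/env bash
# Sketch: split a file at every line matching "Roses", naming the pieces
# stanza_00.txt, stanza_01.txt, and so on.
work_dir="$(mktemp -d)" || exit 1
cd "$work_dir" || exit 1

printf '%s\n' \
    'Roses are red' \
    'Violets are blue' \
    '' \
    'Roses are the prettiest flower' \
    'but I like potatoes more' > poem.txt

# -s suppresses the byte counts, -f sets the prefix, -b sets the suffix format.
# The repeat count is quoted so the shell passes {*} through untouched.
csplit -s -f stanza_ -b '%02d.txt' poem.txt '/Roses/' '{*}'

ls stanza_*
```

Because the very first line matches the pattern, stanza_00.txt comes out empty, just like xx00 in the example above.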
seq
seq
will generate a sequence of numbers and print them to standard out. There are three ways to execute this command. The first is with just the final number N, which produces the sequence from 1 to N. The second is to provide a start S and a final number N, which produces the sequence from S to N. The third is to provide a start S, an increment I, and a final number N, in that order (seq S I N). This produces the sequence from S up to at most N, stepping by I each time.
$ seq 5
1
2
3
4
5
$ seq 0 5
0
1
2
3
4
5
$ seq 0 2 5
0
2
4
seq
is rarely used directly and is instead used as part of a command chain or a shell script loop.
# use seq as a counter in a loop to find prime numbers
for i in $(seq 0 15); do
    if [ $(factor $i | cut -d : -f 2 | wc -w) -eq 1 ]; then
        echo "$i is prime"
    fi
done

# Use seq and shuf to generate a pick 6 quick pick
$ seq 1 49 | shuf | head -6 | sort -n | xargs printf "%d,%d,%d,%d,%d,%d\n"
2,18,28,33,43,48
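A few more variations that tend to be useful in scripts, sketched here on the assumption that GNU seq is in use. The variable names are only for illustration.

```shell
#!/usr/bin/env bash
# Sketch: counting down, zero-padding, and custom separators with seq.
countdown="$(seq 5 -1 1 | xargs)"   # a negative increment counts down
padded="$(seq -w 8 10 | xargs)"     # -w pads with leading zeros to equal width
evens="$(seq -s , 0 2 10)"          # -s replaces the newline separator

echo "$countdown"   # 5 4 3 2 1
echo "$padded"      # 08 09 10
echo "$evens"       # 0,2,4,6,8,10
```

The padded form is handy for generating file names that sort correctly.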
timeout
timeout
is an awesome command when you need to put an upper limit on the amount of time a program can run. You can tell if a command timed out by the return value. A return of 124 indicates that the command timed out. 125 is returned if timeout itself fails, 126 if the command is found but cannot be run, and 127 if the command cannot be found. Otherwise timeout returns the command's own exit status. You can check the return status with the command echo $?.
# cat will wait forever for input when invoked without a file
# use timeout to force cat to only run for 5 seconds
$ timeout 5 cat

# This is useful in scripts if you want data from the Internet
# but don't want to hang forever
$ timeout 5 curl http://jamesslocum.com
timeout
also has the -s --signal
flags to allow you to control what signal gets sent to the process. The default signal is 15 or TERM. You can change it to 9 or KILL to guarantee the process dies, since KILL cannot be caught or ignored. Note that when KILL is sent, the exit status is 137 (128+9) rather than 124.
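In a script, the exit status is what lets you react to a timeout. Here is a minimal sketch, assuming bash and GNU timeout. The helper name run_limited is hypothetical and not part of coreutils.

```shell
#!/usr/bin/env bash
# Sketch: run a command under timeout and report what happened,
# based on the exit statuses described above.
run_limited() {
    local limit="$1"; shift
    timeout "$limit" "$@"
    local status=$?
    case "$status" in
        124) echo "timed out after ${limit}s" ;;
        125) echo "timeout itself failed" ;;
        126) echo "command found but could not be run" ;;
        127) echo "command not found" ;;
        *)   echo "finished with status $status" ;;
    esac
    return "$status"
}

run_limited 1 sleep 5   # sleep is cut short after one second
slow_status=$?
run_limited 5 true      # true finishes well inside the limit
fast_status=$?
```

Returning the original status from the helper means callers can still branch on it with if or ||.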
Links and further reading
This concludes my tour of the lesser known coreutils. Of course there are dozens of more “well known” coreutils that you can read up on. I highly recommend reading the man pages and info pages as well.
Wikipedia coreutils page
GNU coreutils Documentation
Bash Cookbook (Amazon link)