

:(){ :|:& } ;:

The Inner Thoughts of a C Developer.



“Testing Memory I/O Bandwidth”

2013-10-16

Very frequently at my company, we find ourselves pushing our hardware to its limit. Usually we are able to dig in and find some optimizations that we may have missed before to squeeze some extra performance out of our products. This time started out a little differently, though. The issue we were seeing did not seem to be caused by CPU speed or memory capacity, but by I/O bandwidth.

In an attempt to quadruple the output of one of our products, we hit a hard wall when running at the peak stress level. Stephen, the developer on the project, and I began brainstorming about what the issue might be and why we were hitting a cap. All of the specs on paper seemed to indicate we had more than enough machine to get the job done.

In order to start diagnosing our problem, we looked to a program called mbw, which is a memory bandwidth benchmark tool. We installed it from the Ubuntu repository using sudo apt-get install mbw. As we found out later, this installed version 1.1.1 of the software (and yes, this is important… keep reading). Running the software is easy. The simplest option is to just pass in an array size (in MiB). For brevity I am only showing the average results instead of all results.

$ mbw 32 | grep AVG
AVG     Method: MEMCPY  Elapsed: 0.00600  MiB: 32.00000   Copy: 5332.889 MiB/s
AVG     Method: DUMB    Elapsed: 0.00422   MiB: 32.00000   Copy: 7589.413 MiB/s
AVG     Method: MCBLOCK Elapsed: 0.00164  MiB: 32.00000   Copy: 19465.904 MiB/s
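
The Copy column is just the array size divided by the elapsed time. Here is a minimal sketch of that arithmetic, assuming a helper I am calling mib_per_sec (a made-up name, not part of mbw) and using awk for the floating point math:

```shell
# mib_per_sec is a hypothetical helper, not part of mbw.
# Usage: mib_per_sec <size in MiB> <elapsed seconds>
mib_per_sec() {
   awk -v mib="$1" -v secs="$2" 'BEGIN { printf "%.3f\n", mib / secs }'
}

# 32 MiB copied in 4 milliseconds
mib_per_sec 32 0.004
```

If you feed it the rounded Elapsed values printed above, the answer will land close to, but not exactly on, the Copy value that mbw reports, presumably because mbw works from the unrounded timings.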

Another useful option is to specify the size of the “block” to use in the MCBLOCK test. To specify this option you can use the -b flag.

$ mbw -b 4096 32 | grep AVG
AVG     Method: MEMCPY  Elapsed: 0.00589  MiB: 32.00000   Copy: 5428.421 MiB/s
AVG     Method: DUMB    Elapsed: 0.00421   MiB: 32.00000   Copy: 7598.062 MiB/s
AVG     Method: MCBLOCK Elapsed: 0.00064  MiB: 32.00000   Copy: 50172.468 MiB/s

Woah! Hold the phone! Do you notice something about these results? Why is the MCBLOCK test a whole order of magnitude faster than the MEMCPY test? This made Stephen's jaw and mine drop. What was being done to get this much throughput? This is where the story really begins.

The first thing we did was grab a copy of the source. The first source that we found was for version 1.2.2 (at this time we didn’t know it was different from the version we had installed). We started digging through the source and found the worker function that performed the three tests.

if(type==1) { /* memcpy test */
  /* timer starts */
  gettimeofday(&starttime, NULL);
  memcpy(b, a, array_bytes);
  /* timer stops */
  gettimeofday(&endtime, NULL);
} 
else if(type==2) { /* memcpy block test */
  gettimeofday(&starttime, NULL);
  for(t=0; t<array_bytes; t+=block_size) {
      b=mempcpy(b, a, block_size);
  }
  if(t>array_bytes) {
      b=mempcpy(b, a, t-array_bytes);
  }
  gettimeofday(&endtime, NULL);
} 
else { /* dumb test */
  gettimeofday(&starttime, NULL);
  for(t=0; t<asize; t++) {
      b[t]=a[t];
  }
  gettimeofday(&endtime, NULL);
}

This is the code snippet from worker() in mbw.c on line 92. The first thing we discovered was that the MCBLOCK test was using the mempcpy() function. I had never used the mempcpy function before, so I was intrigued! Of course the mystery only deepened when we looked at the mempcpy man page.

The mempcpy() function is nearly identical to the memcpy(3) function. It copies n bytes from the object beginning at src into the object pointed to by dest. But instead of returning the value of dest it returns a pointer to the byte following the last written byte.

mempcpy(3) man page

They really weren’t kidding with the “nearly identical” part either. As soon as I dug into the glibc source code, it became very apparent something strange was going on.

void *
__mempcpy (void *dest, const void *src, size_t len)
{
  return memcpy (dest, src, len) + len;
}
libc_hidden_def (__mempcpy)
weak_alias (__mempcpy, mempcpy)
libc_hidden_builtin_def (mempcpy)

So why was the mempcpy() code running so much faster than the memcpy() code if one is simply calling the other? The answer would soon surface! The next thing we did was compile the 1.2.2 source that we downloaded and run it. To our amazement we were getting much lower bandwidth for what seemed like no reason.

$ ./mbw-1.2.2 -b 4096 8 | grep AVG
AVG     Method: MEMCPY  Elapsed: 0.00292  MiB: 8.00000    Copy: 2743.861 MiB/s
AVG     Method: DUMB    Elapsed: 0.00116  MiB: 8.00000    Copy: 6871.081 MiB/s
AVG     Method: MCBLOCK Elapsed: 0.00098  MiB: 8.00000    Copy: 8145.810 MiB/s

We were confused. We hadn’t changed anything; we had simply compiled and run the code. Yet the Ubuntu package was reporting huge bandwidths, and this build was not. I started to suspect that the version in the repo was different somehow, and I was right! We ran apt-get source mbw, and sure enough we got version 1.1.1. Running a diff between the two versions showed that the MCBLOCK test had been updated.

/* in version 1.1.1 */
for(t=0; t<array_bytes; t+=block_size) {
   c=mempcpy(b,a,block_size); 
}

/* in version 1.2.2 */
for(t=0; t<array_bytes; t+=block_size) {
   b=mempcpy(b,a,block_size); 
}

Well, that solves that mystery! In version 1.1.1 (installed by apt-get), the destination pointer never moved, so the program was writing the same block_size chunk of memory over and over. That chunk stayed in the CPU cache, and the resulting cache hits inflated the measured bandwidth. The new version properly advances the destination pointer, which eliminates those cache hits and lowers the bandwidth measurements.

Now, does anything else stand out about the 1.2.2 code? Well if you guessed that the source pointer was not being advanced, you would be correct! So these numbers were still a bit off. After making the correction we got much more consistent measurements.

/* in the corrected version (now 1.3.0) */
char* aa = (char*)a;
char* bb = (char*)b;
gettimeofday(&starttime, NULL);
for (t=array_bytes; t >= block_size; t-=block_size, aa+=block_size){
   bb=mempcpy(bb, aa, block_size);
}
if(t) {
   bb=mempcpy(bb, aa, t);
}
gettimeofday(&endtime, NULL);

$ ./mbw-1.3.0 -b 4096 8 | grep AVG
AVG     Method: MEMCPY  Elapsed: 0.00288  MiB: 8.00000    Copy: 2778.067 MiB/s
AVG     Method: DUMB    Elapsed: 0.00113  MiB: 8.00000    Copy: 7107.952 MiB/s
AVG     Method: MCBLOCK Elapsed: 0.00166  MiB: 8.00000    Copy: 4817.246 MiB/s

I am happy to report that these changes were merged into the mainline release at the raas/mbw GitHub page. So if you are going to use mbw for benchmarking your memory throughput, I highly recommend you use the new 1.3.0 version.

If you have a multi-CPU system and want to see what your total average throughput is, you can use the script below. It will detect how many processors you have and spawn the matching number of mbw instances. It will then sum up the average measurements. Feel free to modify it as needed.

#! /usr/bin/env bash
# This will run an mbw instance for each core on the machine

# Count only the "processor : N" lines, one per logical core
NUMCORES=$(grep -c "^processor" /proc/cpuinfo)
TMP="/tmp/mbw_result_tmp"

echo "Starting test on $NUMCORES cores"
for (( i=0; i<$NUMCORES; i++ )); do
   mbw -b 4096 32 -n 100 > ${TMP}_${i} & 
done

echo "Waiting for tests to finish"
wait

MEMCPY_RESULTS=()
DUMB_RESULTS=()
MCBLOCK_RESULTS=()

for (( i=0; i<$NUMCORES; i++ )); do
   MEMCPY_RESULTS[$i]=`grep -E "AVG.*MEMCPY" ${TMP}_${i} | \
      tr "[:blank:]" " " | cut -d " " -f 9`

   DUMB_RESULTS[$i]=`grep -E "AVG.*DUMB" ${TMP}_${i} | \
      tr "[:blank:]" " " | cut -d " " -f 9`

   MCBLOCK_RESULTS[$i]=`grep -E "AVG.*MCBLOCK" ${TMP}_${i} | \
      tr "[:blank:]" " " | cut -d " " -f 9`
done

MEMCPY_SUM=0
DUMB_SUM=0
MCBLOCK_SUM=0

# Need to use `bc` because of floating point numbers
for (( i=0; i<$NUMCORES; i++ )); do
   MEMCPY_SUM=`echo "$MEMCPY_SUM + ${MEMCPY_RESULTS[$i]}" | bc -q`
   DUMB_SUM=`echo "$DUMB_SUM + ${DUMB_RESULTS[$i]}" | bc -q`
   MCBLOCK_SUM=`echo "$MCBLOCK_SUM + ${MCBLOCK_RESULTS[$i]}" | bc -q`
done

echo "MEMCPY Total AVG: $MEMCPY_SUM MiB/s"
echo "DUMB Total AVG: $DUMB_SUM MiB/s"
echo "MCBLOCK Total AVG: $MCBLOCK_SUM MiB/s"
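
If the cut and bc plumbing feels fragile, the same totals can be had from a single awk pass. This is only a sketch: sum_copy is a name I made up, and it assumes the AVG lines look like the ones shown earlier, with the number you want sitting right after the "Copy:" label.

```shell
# sum_copy is a hypothetical helper. It reads mbw output on stdin and
# adds up the number that follows "Copy:" on every AVG line for one method.
# Usage: cat /tmp/mbw_result_tmp_* | sum_copy MEMCPY
sum_copy() {
   awk -v method="$1" '
      $1 == "AVG" && $3 == method {
         for (f = 1; f < NF; f++) {
            if ($f == "Copy:") { total += $(f + 1) }
         }
      }
      END { printf "%.3f\n", total }'
}
```

Because awk splits on any run of spaces or tabs, this does not depend on the value landing in a fixed field position.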

Using the 1.3.0 version of mbw, as well as some good old-fashioned detective work, we were able to find the perfect combination of software and hardware optimizations to push our product to the next level. It is still in early beta, but hopefully in a few months it will be finalized and I can release more details!


“Graphical user interfaces from a bash script using Zenity - Part 2”

2013-09-22

In Part 1 of using Zenity we covered some pretty cool GUI components that can be used from a shell script. This time we will be finishing up with some examples of how to integrate some more complex gtk+ widgets into your script.

One of the most fundamental outputs any long running script should produce is some form of progress. Everyone loves a well tuned, accurate progress bar! Conversely, everyone hates a progress bar that goes to 99% immediately and sits there for 10 minutes. Getting a progress bar perfect is an art in and of itself, but that’s not really the point of this tutorial. I’m just going to show you how a Zenity progress bar works.

The Zenity progress bar is activated through the --progress flag. The progress bar is different from the other components because it actively listens on standard in for progress and text to display. Any line you want displayed above the progress bar must begin with a ’#’ character. Any line containing only a number is taken as the percentage complete. As you can see in the example below, before I copy a file, I echo the file name with a ’#’ character before it. Then I perform the copy operation and update the percentage. The output of the entire for loop is piped to the progress bar.

#! /usr/bin/env bash

##
# This script will copy all of the specified files to the destination
# usage ./copy.sh file1 file2 ... fileN destination/
#

numberOfArgs=$#

if [ $numberOfArgs -le 0 ]; then
    echo "Usage: ./copy.sh file1 file2 ... fileN destination/" >&2
    exit 1
elif [ $numberOfArgs -le 1 ]; then
    echo "You must specify a destination" >&2
    exit 1
fi

destination=${@: -1}

for (( i=1; i<$numberOfArgs; i++ )); do
    echo "# ${1}"
    cp "${1}" "$destination"
    # The last argument is the destination, so there are numberOfArgs - 1 files
    echo "$(( (i * 100) / (numberOfArgs - 1) ))"
    sleep 0.5  #So that you can see the items being copied
    shift 1
done | zenity --progress --title="Copy files to $destination" --percentage=0

if [ "$?" -eq 1 ]; then
    zenity --error --text="Copy Aborted"
    exit 1
fi

The Zenity progress Bar
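
To make the protocol a little more concrete, here is a small sketch that only generates the stream Zenity expects, so you can look at it in a terminal first. The name progress_lines is my own invention for this example.

```shell
# progress_lines is a hypothetical helper. For each argument it prints the
# label line ("# name") followed by the percentage complete after that item.
progress_lines() {
   local total=$# done_count=0 item
   for item in "$@"; do
      echo "# $item"
      done_count=$(( done_count + 1 ))
      echo "$(( done_count * 100 / total ))"
   done
}

# Preview the stream; pipe it to `zenity --progress` to see the dialog
progress_lines a.txt b.txt c.txt d.txt
```

Each file produces one label line and one number, and the last number is always 100, so the bar finishes full.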

If you are unsure how long an operation will take, but want to display text on a progress bar, you can use the --pulsate option. The pulsate option will just move a bar back and forth until end-of-file is read from standard in. To see its effect on your distribution, run this simple command. (On Unity it just sits empty. How boring!)

$ ( sleep 5; echo "# done" ) | zenity --progress --pulsate \
   --title="Long operation" --text="Doing lots of stuff"

Another more advanced GUI widget that Zenity has to offer is the list dialog. The list dialog is activated by the --list flag and has quite a few options to control its behavior. To specify the names of the columns you can use the --column="name" flag as many times as you need. Each flag will add another column to the list display. Once you have all of the columns set up you can start adding content. To do this just add space-separated values.

$ zenity --list --title="Shopping list" --column="Items" --column="Quantity" \
   "Bread" 1 \
   "Chicken" 2 \
   "Iced tea mix" 1 \
   "Apples" 6

Zenity list dialog

Lists are great for displaying two dimensional data, like a database table. In this example I will read the data from a sqlite3 database called test.db that has a single table called memos. I then use Zenity to display the data in a nice tabular form.

#! /usr/bin/env bash
#Read from a sqlite3 database and build a zenity list
#to represent the table

command='--list --title="Database memos" '
header=$(sqlite3 -header -list test.db 'select * from memos;' | head -n 1)

# Split the header row on '|' to get the column names
IFS='|' read -r -a columns <<< "$header"

for col in "${columns[@]}"; do
   command="$command --column=\"$col\" "
done

# Note: values that contain spaces or commas will be split into extra cells
command="$command $(sqlite3 -csv test.db 'select * from memos;' | tr ',' ' ') "

echo "$command" | tr '\n' ' ' | xargs zenity

sqlite3 zenity list dialog

Each row in the table can also be made editable with the --editable flag. By doing this, the changed rows are sent back to standard out when the okay button is pressed. Although this may seem like a great feature, it is harder to work with in practice. Zenity only prints the first column of the changed row by default after you hit okay. This can make it hard for a script to know what was actually edited. In order to figure out which row is being edited, I suggest using some sort of key in the first column and using the --print-column=ALL flag to get the entire row. However, if the user edits the key column of a row, you’re going to have a bad time!

The final Zenity widget I am going to cover is the scale slider. Scale sliders are great for getting a numeric value from an arbitrary range of values. The scale can be used with the --scale flag. You can set the initial value, minimum value, maximum value, and step size with the --value, --min-value, --max-value, and --step flags respectively. The defaults for those flags are 0, 0, 100, and 1. You can also have the scale write the value to standard out on every change, instead of just the final value, using the --print-partial flag.

$ zenity --scale --min-value=0 --max-value=255 --text="Please select red saturation"

Zenity scale

This completes our tour of Zenity. As you can see Zenity provides a great utility for making bash scripts more approachable to users who might not be aware of, or comfortable with using the command line. Feel free to try, or modify any of the examples I have given and build a nice GUI application with a bash script.


“The Hand of Thief Trojan”

2013-08-29

Recently there was an announcement about a trojan program that targets Linux machines. It is a pretty impressive piece of software! It has the ability to detect if it is running in a virtual environment, or a chroot environment. The main payload of the trojan is a browser form grabber (thread bbb), and a connection to a command and control server (thread aaa).

Another cool feature is the ability to detect if any monitoring software is being used and stop its outgoing internet traffic. It looks for wireshark and tcpdump. All in all this is a pretty nice piece of malware! You can read more about it on this blog.

So the next thought might be, how do I know if I have this? Well, I got you covered! The sha256 sums for each of the components have been released. Also this trojan replaces the kernel flush-8:0 daemon with an infected version that can be detected. So, I have prepared a script that will scan for the various files, and bad flush-8:0 process.

#! /usr/bin/env bash

#Copyright (c) 2013 James Slocum

#Permission is hereby granted, free of charge, to any person obtaining a copy
#of this software and associated documentation files (the "Software"), to deal
#in the Software without restriction, including without limitation the rights
#to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
#copies of the Software, and to permit persons to whom the Software is
#furnished to do so, subject to the following conditions:

#The above copyright notice and this permission notice shall be included in
#all copies or substantial portions of the Software.

#THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
#IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
#FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
#AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
#LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
#OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
#THE SOFTWARE.

# This script is designed to scan all executables on the system and look
# for the hand of thief trojan by checking the published sha256 checksums
# as well as look for the startup file it creates

exec 3>&1
exec 1>output.log
exec 2>error.log

# HOT initial binary, HOT shared object, HOT backdoor executable, HOT formgrabber
# HOT backconnect script
hashes=("BD92CE74844B1DDFDD1B61EAC86ABE7140D38EEDF9C1B06FB7FBF446F6830391"
        "2ACF2BC72A2095A29BB4C02E3CD95D12E3B4F59D2E7391D9BCBBA9F3142B40AE"
        "753DC7CD036BDBAC772A90FB3478B3CCF22BEC70EE4BD2F55DEC2041E9482017"
        "B794CE9E7291FE822B0E1F1804BD5A9A2EFC304A1E2870699C60EF5083C7BAC2"
        "4B0CC15B24E38EC14E6D044583992626DD8C72A4255B9614BE46B1B4EEFA41D7")

badfilecount=0

sha256() {
   sha256sum "$1" | cut -d " " -f 1 | tr '[:lower:]' '[:upper:]'
}

##
# A known version of HOT starts a fake version of flush-8:0 from init instead of
# the kernel process [kthread]. This will detect that.
# https://www.circl.lu/pub/tr-15/
scan_for_hand_of_thief_process() {
   if [[ $(ps -eaf | grep "\[flush" | tr -s " "  | cut -d " " -f 3 | grep ^1$) ]]; then 
      echo "!!! Infection suspected !!!" >&3
   else 
      echo "No infection suspected" >&3
   fi
}

scan_file(){
   local filename="$1"
   local sha256=$(sha256 "$filename")
   
   for hash in ${hashes[@]}; do
      if [ "$sha256" = "$hash" ]; then
         # Print the full path, and keep the file name out of the format string
         printf '%s !!! FAILED !!!\n' "${PWD}/${filename}" >&3
         return 1
      fi
   done
   return 0
}

scan() {
   local dirname="$1"
   local result=0 
   for file in `ls -A1`; do
      if [ -d "$file" -a ! -L "$file" ]; then
         pushd "$file"
         scan "${dirname}${file}/"
         popd
      elif [ -x "$file" -a ! -d "$file" ]; then
         scan_file "$file"
         result=$?
      elif [[ "$file" =~ .*\.so.* ]]; then
         scan_file "$file"
         result=$?
      else
         result=0
      fi

      if [ $result -eq 1 ]; then
         (( badfilecount++ ))
      elif [ $result -eq 2 ]; then
         (( suspectfilecount++ ))
      fi
   done
}

if [ $# -lt 1 ]; then
   echo "Please specify a directory or file to scan." >&3
   exit 1
fi

echo "Scanning for \"Hand of thief\" trojan" >&3
scan_for_hand_of_thief_process
pushd "$1"
echo "Deep scanning files" >&3
scan
popd
echo -e "\nScan complete:" >&3
echo "Infected files: $badfilecount" >&3
echo "" >&3
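
As a side note, the checksum comparison can also be done with a lookup table instead of a loop over every hash. This is just a sketch under a few assumptions: the names known_bad, add_known_bad, and is_known_bad are mine, and associative arrays need bash 4 or newer.

```shell
# A sketch only: known_bad, add_known_bad, and is_known_bad are hypothetical
# names, and associative arrays need bash 4 or newer.
declare -A known_bad=()

# Register a published checksum (any letter case)
add_known_bad() {
   known_bad["${1^^}"]=1
}

# Succeeds when the file's sha256 checksum has been registered
is_known_bad() {
   local digest
   digest="$(sha256sum "$1" | cut -d ' ' -f 1)"
   local key="${digest^^}"
   [[ -n "$key" && -n "${known_bad[$key]:-}" ]]
}

add_known_bad "BD92CE74844B1DDFDD1B61EAC86ABE7140D38EEDF9C1B06FB7FBF446F6830391"
if is_known_bad /bin/sh; then
   echo "/bin/sh !!! FAILED !!!"
fi
```

With only five hashes the loop is perfectly fine; the table mostly pays off if the list of published checksums grows.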

Please feel free to use this script to detect if you have been infected by the trojan. Of course it is important to note that this is a trojan and not a worm. The difference is that you would have had to run the “dropper” application yourself with root permissions. So if you haven’t run any mystery software with the sudo command lately, then chances are you don’t have an infection.

$ ./scanner.sh /usr/bin
Scanning for "Hand of thief" trojan
No infection suspected
Deep scanning files

Scan complete:
Infected files: 0

Please let me know in the comments if you find any issues with the script and I will be happy to update it! Until then, stay safe and don’t run any mystery executables with root permissions.


“Graphical user interfaces from a bash script using Zenity - Part 1”

2013-07-17

As any developer knows, reading from and writing to the terminal (via stdin and stdout) are basic skills required to perform even the most fundamental programming task. Every language has its own functions (or methods) to print to, and read text from, the terminal. In bash, the fundamental ways to do this are with the echo and read commands.

#! /usr/bin/env bash

random="$(( (RANDOM % 16) + 1 ))"
guess="0"

echo "Guess a number between 1 - 16"
while [ $guess -ne $random ]; do
   read -p "Guess > " guess
   
   if [ $guess -lt $random ]; then
      echo "Too low!"
   elif [ $guess -gt $random ]; then
      echo "Too high!"
   fi
done

echo "Correct!"

This simple example shows the use of both the echo and read commands. But what if we want to interact with a user that might not be familiar with the command line? For example, both my wife and her sister have Linux computers, but neither knows what the command line is. Many Apple users are in the same boat. Just because the resource is there does not mean people will use it. So how do we write scripts that non-cli users can access?

In comes zenity! Zenity is a command line tool that will display gtk+ dialogs. There is a huge list of default dialogs available from calendars to progress bars. Zenity is available on Linux, Windows, and BSD Unix. OSX does support the GTK+ library, but I did not find a port of Zenity in the brew repository. Perhaps this would be a good project down the road…

Let’s begin this tour with some simple dialogs that can be used in the place of echo. Zenity has four main dialog types: error, info, question, and warning. Each dialog can take a --text switch to set what the dialog will say, --title to set the title of the window, and --width and --height to control the window size.

zenity --info --title "info dialog" --text='Something needs your attention!'
zenity --error --title "error dialog" --text='Something went wrong!'
zenity --warning --title "warning dialog" --text='Something might be wrong!'
zenity --question --title "question dialog" --text='Are you going to take action?'

The four zenity dialogs

As you can see, it is very simple to get GUI (graphical user interface) dialogs up and running quickly from the command line. What happens if you need a user to select a date? Zenity has you covered with the --calendar flag. If you want to set a specific initial date to show, you can use the --year, --month, and --day flags. When the user has selected the date and hit the okay button, the date is returned on standard out in the form MM/DD/YYYY. If you want to change this format, you can use the --date-format flag and pass in any strftime() style string. See the strftime reference page for details.

$ zenity --calendar --year 1969 --month 12 --day 28 --text "Linus Torvalds birthday"
12/28/1969

$ zenity --calendar --text "Please select your birthday" --date-format="%A, %B %d %Y"
Tuesday, June 25 2013

Zenity calendar prompt
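
If a script is handed the default MM/DD/YYYY form and needs something sortable, a tiny bit of bash can rearrange it. This is a sketch with a made-up helper name, to_iso_date; asking Zenity for --date-format="%Y-%m-%d" in the first place would get you the same result.

```shell
# to_iso_date is a hypothetical helper: MM/DD/YYYY in, YYYY-MM-DD out.
to_iso_date() {
   local month day year
   IFS=/ read -r month day year <<< "$1"
   printf '%s-%s-%s\n' "$year" "$month" "$day"
}

to_iso_date "12/28/1969"
```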

At this point you might be wondering, “How can I tell if a user presses cancel?” It’s easy! You simply need to check the return code (echo $?). A 0 return code means the user selected okay. This is usually accompanied by whatever info you were trying to get from the user. A return code of 1 indicates the user pressed cancel, and a return code of 5 indicates a timeout has occurred. To set a timeout on a dialog you can use the --timeout flag.

zenity --question --text "Hey, you there?" --timeout=5
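
In a script you will usually want to branch on that return code rather than echo it. Here is a minimal sketch of the three codes described above; describe_dialog_status is a name I picked for illustration.

```shell
# describe_dialog_status is a hypothetical helper that turns the exit
# status of a zenity dialog into a word a script can branch on.
describe_dialog_status() {
   case "$1" in
      0) echo "ok" ;;
      1) echo "cancel" ;;
      5) echo "timeout" ;;
      *) echo "unknown" ;;
   esac
}

# Typical use (requires zenity and a display):
#   zenity --question --text "Hey, you there?" --timeout=5
#   describe_dialog_status "$?"
describe_dialog_status 5
```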

Have you ever needed a user to select a color from a shell script? Yeah… me neither, but with Zenity it is easy if the need ever arises!

$ zenity --color-selection --show-palette
#4d4db4b45757

Important: The color code returned from Zenity is a gtk+ six byte color code. Each 2 byte pair represents an intensity value from 0 - 65535. HTML color codes are three bytes and each byte represents an intensity value from 0 - 255. So to convert these numbers you will need to scale the values: dividing each 16-bit value by 257 (65535 / 255) gives the 8-bit value.
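
Here is a rough sketch of that scaling in bash. The helper name gtk_to_html is made up, and it assumes the input is exactly in the "#RRRRGGGGBBBB" form shown above.

```shell
# gtk_to_html is a hypothetical helper: "#RRRRGGGGBBBB" in, "#rrggbb" out.
gtk_to_html() {
   local code="${1:1}"           # drop the leading '#'
   local r=$(( 16#${code:0:4} ))
   local g=$(( 16#${code:4:4} ))
   local b=$(( 16#${code:8:4} ))
   # Scale 0-65535 down to 0-255, rounding to the nearest value
   printf '#%02x%02x%02x\n' \
      $(( (r * 255 + 32767) / 65535 )) \
      $(( (g * 255 + 32767) / 65535 )) \
      $(( (b * 255 + 32767) / 65535 ))
}

gtk_to_html "#4d4db4b45757"   # prints #4db457
```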

Zenity color palette

While a developer may never need the user to select a color from a shell script, they most certainly will need a user to select a file at some point. Zenity has a really nice file selector dialog that can be used.

zenity --file-selection

There are a lot of options to help you control how many and what types of files are selectable. Using the --multiple option will allow the user to select as many files as they want. The --directory option limits the user to only selecting directories. If you only want the user to select a specific kind of file, like an image, you can specify a file filter using --file-filter. This can also be used as a save dialog using the --save flag.

# Save dialog limiting the user to only saving 'gif' files
$ zenity --file-selection --file-filter='*.gif' --save

# Allow selection of multiple image files limited to 'gif', 'jpg',
# and 'jpeg' extensions
$ zenity --file-selection --multiple --file-filter='*.gif *.jpeg *.jpg'

# Allow selection of only one type of file at a time
$ zenity --file-selection --file-filter="*.gif" \
   --file-filter="*.jpg" \
   --file-filter="*.jpeg"

Zenity file chooser

Multi-entry forms are also easy to build using Zenity. A form can be built using text fields, password fields, and calendars. Text fields can be added with the --add-entry flag. Password fields are added with the --add-password flag, and calendars are added with the --add-calendar flag.

$ zenity --forms --title="Create user" --text="Add new user" \
   --add-entry="First Name" \
   --add-entry="Last Name" \
   --add-entry="Username" \
   --add-password="Password" \
   --add-password="Confirm Password" \
   --add-calendar="Expires"

Zenity form

This concludes part 1 of our tour. Next time I will talk about displaying tabular data with the --list option, and showing the progress of a long running task with a GUI progress bar using --progress.


“A tour of the lesser known coreutils - Part 3”

2013-06-14

This is the final stop on our tour of the lesser known coreutils. If you haven’t already done so, you should read part 1 and part 2 first. Once again let’s dive in with some cool commands!

mktemp

mktemp will create a temporary file or directory with a unique name. You can also pass in a template and it will safely create a file with a unique name matching it. The template must contain at least 3 ‘X’ characters, but can contain more. The more X characters you use, the more random characters will appear in the name. If you just run the command with no arguments it will use /tmp as the default location and the template tmp.XXXXXXXXXX. If you specify only a template, the file will be created in the current working directory.

$ mktemp
/tmp/tmp.FDxW98zfqW

$ mktemp sometemp.XXXXXXXXXX
sometemp.nlk4ihtGic

$ mktemp --tmpdir=/tmp mytemp.XXXXXXXX
/tmp/mytemp.ys6mtKKT

As you can see, the command will output the name of the temporary file it has just created. You can easily capture this in a shell script using a variable and a command substitution.

$ TEMPDIR="$(mktemp)"
$ echo $TEMPDIR
/tmp/tmp.eMspLL5dWc

$ echo "Hello" > $TEMPDIR
$ cat $TEMPDIR
Hello

Combining mktemp with unlink gives you a way to write data to a file that no longer has a name for other programs to open. The unlink command will remove the file name from the file system, but if there are still open streams the data on disk remains intact until those streams are closed. The key is to use the exec command to open file descriptors (streams) to the file before unlinking it. (On Linux, root and the owning user can still reach the data through /proc/PID/fd, so treat this as tidiness rather than strong protection.)

$ TEMPDIR="$(mktemp --tmpdir=/tmp mytemp.XXXXXXXX)"
$ exec 5>$TEMPDIR
$ exec 6<$TEMPDIR
$ echo "Hello to file descriptor" >&5
$ cat $TEMPDIR
Hello to file descriptor

$ unlink $TEMPDIR
$ cat $TEMPDIR
cat: /tmp/mytemp.M1kjasnR: No such file or directory

$ echo "Hello to file descriptor after unlink" >&5
$ cat <&6
Hello to file descriptor
Hello to file descriptor after unlink

After you unlink the file, the two descriptors are the only way left to reach the data. Anything you write to fd 5 lands in the now nameless file, and fd 6 reads it back starting from its own position, which is why the cat above prints both lines: fd 6 had not been read from yet. If you try to cat from fd 6 twice in a row, the second read will not return anything, because the first read left fd 6 at the end of the file. This is a handy way to keep scratch data in a script without leaving a file name around for nosy users to peek at.
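
Here is the same idea wrapped up as a small function, mostly to show the read position behavior in one place. It is only a sketch; the function name is made up, and it borrows fds 5 and 6 just like the example above.

```shell
# unlinked_scratch_demo is a hypothetical name for this walk-through.
unlinked_scratch_demo() {
   local scratch
   scratch="$(mktemp)" || return 1
   exec 5>"$scratch" 6<"$scratch"   # open a writer and a reader first
   rm -f "$scratch"                 # the name is gone, the data lives on
   echo "first" >&5
   echo "second" >&5
   cat <&6                          # prints both lines
   cat <&6                          # prints nothing, fd 6 is at end of file
   [ -e "$scratch" ] || echo "name removed"
   exec 5>&- 6<&-                   # closing both fds frees the data
}

unlinked_scratch_demo
```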

touch

touch is a command that I am sure most Linux and Unix users are aware of. When used with a file name, it will create an empty file if the file does not exist. If the file does exist, it will update the last accessed and last modified timestamps.

$ ls -la
-rw-rw-r--  1 james james   897 Dec 17 14:49 dart_blog_part_4.html

$ touch dart_blog_part_4.html
$ ls -la
-rw-rw-r-- 1 james james 897 Jun 14 07:32 dart_blog_part_4.html

The reason I bring this up is because of the lesser known options that let you mess with the timestamps of the file. You can change a file to have any timestamp you want, in the future or the past.

# use -t option to set a specific date
$ touch -t 400006091148.12 dart_blog_part_4.html
-rw-rw-r--  1 james james   897 Jun  9  4000 dart_blog_part_4.html

# use -d or --date to set a specific date with a date string
$ touch -d "next tuesday" dart_blog_part_4.html
-rw-rw-r--  1 james james   897 Jun 18  2013 dart_blog_part_4.html

# use the timestamp of a different file
$ ls -la
-rw-rw-r--  1 james james   897 Jun 18  2013 dart_blog_part_4.html
-rw-rw-r--  1 james james   661 Jan 10 14:00 linux_journal_blog.html

$ touch -r linux_journal_blog.html dart_blog_part_4.html
-rw-rw-r--  1 james james   897 Jan 10 14:00 dart_blog_part_4.html
-rw-rw-r--  1 james james   661 Jan 10 14:00 linux_journal_blog.html
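
Setting timestamps by hand is also handy for testing scripts that compare file ages. Below is a small sketch that uses touch -t to fake two modification times and bash's -nt test to compare them. The is_stale helper and the file names are made up for the example.

```shell
# is_stale is a hypothetical helper: succeeds when the source file has a
# newer modification time than the target, the way make decides to rebuild.
is_stale() {
   local target="$1" src="$2"
   [ "$src" -nt "$target" ]
}

workdir="$(mktemp -d)"
touch -t 201301101400 "$workdir/page.html"      # "built" in January
touch -t 201306140732 "$workdir/page.md"        # "edited" in June
if is_stale "$workdir/page.html" "$workdir/page.md"; then
   echo "page.html needs a rebuild"
fi
rm -r "$workdir"
```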

ptx

ptx is another one of those interesting tools that I would catalog with tsort (see part 2). It is a very specialized command. It will produce a permuted index, including context, of the words in an input file. Of course it is better shown than described.

$ cat test.txt
Roses are red
Violets are blue
Chocolate is sweet
and so are you

$ ptx test.txt
   Roses are red Violets are blue   Chocolate is sweet and so are you
Chocolate is sweet and so/          Roses are red Violets are blue
sweet and so/       Roses are red   Violets are blue Chocolate is
      are blue Chocolate is sweet   and so are you      /are red Violets
is sweet and so are/        Roses   are red Violets are blue Chocolate
are/        Roses are red Violets   are blue Chocolate is sweet and so
   blue Chocolate is sweet and so   are you                 /Violets are
        Roses are red Violets are   blue Chocolate is sweet and so are/
   red Violets are blue Chocolate   is sweet and so are you         /are
sweet and so are/       Roses are   red Violets are blue Chocolate is
  are blue Chocolate is sweet and   so are you              /red Violets
    Violets are blue Chocolate is   sweet and so are you        /are red
    Chocolate is sweet and so are   you                        /are blue

Now it should be a bit more clear. The words to the right of the break in the center are sorted in alphabetical order. Each word is printed the same number of times it appears in the file, and each word is shown with its surrounding context. There are some options to change how the output is displayed, or to isolate specific words.

# ignore case when sorting, and break on end of line
$ ptx --ignore-case -S "\n" test.txt
                                 and so are you
                         Roses   are red
                       Violets   are blue
                        and so   are you
                   Violets are   blue
                                 Chocolate is sweet
                     Chocolate   is sweet
                     Roses are   red
                                 Roses are red
                           and   so are you
                  Chocolate is   sweet
                                 Violets are blue
                    and so are   you

# same as above, and print only those lines with "are" in them
$ ptx --ignore-case -S "\n" -W "are" test.txt
                         Roses   are red
                       Violets   are blue
                        and so   are you

pr

pr can be used to paginate or columnate a text file to prepare it for printing. Basically it will read a plain text file in and add a header, footer, and page breaks. It can also be used to format the plain text file like a magazine article (2 column style). Honestly it does do a nice job, however the 2 column style is a bit quirky.

# Show the first 10 lines of the plain text file
$ head -10 dart.txt
Dart - A new web programming experience
James Slocum

[Section: Introduction]
JavaScript has had a long standing monopoly on client side web programming.
It has a tremendously large user base and countless libraries have been 
written in it.  Surely it is the perfect language with no flaws at all! 
Unfortunately, this is simply not the case. JavaScript is not without its 
problems, and there exists a large number of libraries and "trans-pilers" 
that attempt to work around JavaScripts more quirky behaviors. JQuery, 

# Format it with pr, and show the first 10 lines
$ pr -h "Dart - A new web programming experience" dart.txt | head -10


2013-02-28 10:19     Dart - A new web programming experience      Page 1


Dart - A new web programming experience
James Slocum

[Section: Introduction]
JavaScript has had a long standing monopoly on client side web programming.

As you can see in this example, pr has added the requested header, as well as a page number and a date. There are a ton of flags to control different header, footer, and page formatting options. One strange thing about the columnate flags (-COLUMN, --columns=NUM) is that pr truncates long lines instead of rewrapping them.

# You can see that it just chops off the end of the line
# instead of wrapping it
$ pr -2 -h "Dart - A new web programming experience" dart.txt | head -10


2013-02-28 10:19     Dart - A new web programming experience      Page 1


Dart - A new web programming experi defined with the class keyword. Eve
James Slocum                        and all classes descend from the Ob
                                    single inheritance. The extends key
[Section: Introduction]             other than Object. Abstract classes
JavaScript has had a long standing  some default implementation. They c

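It’s a good thing that we already have a tool to help us with this! We can combine the fmt utility and pr to generate cleaner 2 column text. pr uses a page width of 72 characters by default, which leaves 35 characters for each of the two columns, so we simply need to restrict the original text to 35 characters per line.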

$ fmt -w 35 dart.txt | pr -2 -h "Dart - A new ... " | head -10


2013-06-12 10:14     Dart - A new web programming experience      Page 1


Dart - A new web programming        going to need to grab a copy
experience James Slocum             from dartlang.org. I personally
                                    chose to install only the SDK;
[Section: Introduction] JavaScript  however, there is an option to
has had a long standing monopoly    grab the full Dart integrated

Now we can clearly see that the columns are wrapping correctly. I know most of you might be thinking, “Who writes with just plain text? Why is this still a thing?” Well, I actually still write a lot in plain text! Vim is my main editor for everything, not just code, so it makes sense to do all of my writing in plain text. If I need to format the text for publication, it is easy to use tools like LaTeX (pronounced lay-tek) or simply copy and paste it into OpenOffice. In fact, plain text is the only format pretty much guaranteed to exist and be supported for as long as there are computers.

nl

nl prints out the contents of a file with the lines numbered. cat --number does the same thing, but nl has a ton more options. With nl you can control the style and format of the output. Want to only number lines that match a basic regex? No problem with the -b p<regex> flag.

# number all lines (including blanks)
$ nl -b a test.txt
     1  Roses are red
     2  Violets are blue
     3  Chocolate is sweet
     4  and so are you
     5
     6  Roses are the prettiest flower
     7  but I like potatoes more
     8  they will grow eyes
     9  even in a dark drawer

# number only those lines with the word Roses
$ nl -b pRoses test.txt
     1  Roses are red
       Violets are blue
       Chocolate is sweet
       and so are you

     2  Roses are the prettiest flower
       but I like potatoes more
       they will grow eyes
       even in a dark drawer

You will have to excuse the bad poetry, as I just needed some simple test content. As you can see the nl utility has a lot of great options to help number the lines in a file.

split, csplit

split can be used to split either a text file into several files based on lines, or a binary file into pieces based on size. This is a great tool for working around the size limitations on email attachments. The two most useful flags for working with binary files are -b --bytes, which lets you specify the size of each partial file, and -n --number, which lets you specify how many partial files you want. You only use one or the other.

$ du -h VIDEO.ts
107M    VIDEO.ts

$ split -b 10M VIDEO.ts VIDEO_part_
$ du -h *
10M     VIDEO_part_aa
10M     VIDEO_part_ab
10M     VIDEO_part_ac
10M     VIDEO_part_ad
10M     VIDEO_part_ae
10M     VIDEO_part_af
10M     VIDEO_part_ag
10M     VIDEO_part_ah
10M     VIDEO_part_ai
10M     VIDEO_part_aj
6.3M    VIDEO_part_ak

To reassemble the files, simply cat them back together and redirect the output to the file name you want.
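
As a minimal sketch of that round trip, the snippet below splits a throwaway file, glues the pieces back together with cat, and checks the result with cmp. The names sample.bin, rebuilt.bin, and the sample_part_ prefix are made up for illustration, and the random data simply stands in for a real file like VIDEO.ts.

```shell
# Work in a scratch directory so nothing real gets overwritten
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Create 25 KiB of random data to stand in for the video file
head -c 25600 /dev/urandom > sample.bin

# Split into 10 KiB pieces: sample_part_aa, sample_part_ab, sample_part_ac
split -b 10K sample.bin sample_part_

# The shell expands the glob in sorted order, so the pieces are
# concatenated in the same order that split wrote them
cat sample_part_* > rebuilt.bin

# cmp exits 0 only when the two files are byte-for-byte identical
cmp sample.bin rebuilt.bin && echo "files match"
```

The reassembly relies on the suffixes (aa, ab, ac, ...) sorting in creation order, which is why a plain glob is enough here.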

csplit works a bit differently. It splits a file based on a “context line,” which can be specified as a regular expression pattern.

# Each piece runs up to, but does not include, the next line matching
# 'Roses'. {*} repeats the pattern as many times as possible.
$ csplit test.txt /Roses/ '{*}'
0
66
99

$ ls
xx00  
xx01 
xx02
test.txt

$ cat xx00

$ cat xx01
Roses are red
Violets are blue
Chocolate is sweet
and so are you

$ cat xx02
Roses are the prettiest flower
but I like potatoes more
they will grow eyes
even in a dark drawer

The 'xx' prefix is configurable with the -f --prefix=PREFIX flags, so if you want to give each part a more usable name, it’s not a problem. The suffix (the numbers in this case) can also be set with the -b --suffix-format=FORMAT flags. It takes a printf() style integer format in place of the default %02d. See this page for more details on that.
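
Here is a small sketch of both flags together. The file name poem.txt, the stanza_ prefix, and the %02d.txt suffix are just illustrative choices, not anything csplit requires.

```shell
# Work in a scratch directory so the pieces are easy to find
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# A shortened version of the test poem
printf 'Roses are red\nViolets are blue\n\nRoses are the prettiest flower\nbut I like potatoes more\n' > poem.txt

# -f sets the prefix and -b sets the printf-style suffix, so the pieces
# are named stanza_00.txt, stanza_01.txt, ... instead of xx00, xx01, ...
csplit -f stanza_ -b '%02d.txt' poem.txt '/Roses/' '{*}'

ls stanza_*
```

Just like the xx00 file above, stanza_00.txt comes out empty because the very first line of the file already matches the pattern.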

seq

seq will generate a sequence of numbers and print them to standard out. There are three ways to execute this command. The first is with just the final number N, which produces the sequence 1 to N. The second is to provide a start S and a final N, which produces the sequence S to N. The last is to provide a start S, an increment I, and a final N, in that order (note that the increment goes in the middle). This produces the sequence S to N, incrementing by I each time.

$ seq 5
1
2
3
4
5

$ seq 0 5
0
1
2
3
4
5

$ seq 0 2 5
0
2
4

seq is rarely used directly and is instead used as part of a command chain or a shell script loop.

# use seq as a counter in a loop to find prime numbers
for i in $(seq 0 15); do
   if [ $(factor $i | cut -d : -f 2 | wc -w) -eq 1 ]; then
      echo "$i is prime"
   fi
done

# Use seq and shuf to generate a pick 6 quick pick
$ seq 1 49 | shuf | head -6 | sort -n | xargs printf "%d,%d,%d,%d,%d,%d\n"
2,18,28,33,43,48

timeout

timeout is an awesome command when you need to put an upper limit on the amount of time a program can run. You can tell if a command timed out by the return value. A return of 124 indicates that the command timed out, 125 that timeout itself failed, 126 that the command was found but could not be run, and 127 that the command could not be found. If the command finishes in time, timeout returns the command's own exit status. You can check the return status with the command echo $?.

# cat will wait forever for input when invoked without a file
# use timeout to force cat to only run for 5 seconds
$ timeout 5 cat

# This is useful in scripts if you want data from the Internet
# but don't want to hang forever
$ timeout 5 curl http://jamesslocum.com
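
In a script you would typically branch on that return value. This is only a rough sketch: the 1 second limit and the sleep command are stand-ins for whatever long running program you actually care about.

```shell
# sleep 5 cannot finish inside a 1 second limit, so timeout stops it
timeout 1 sleep 5
status=$?

# Map the exit statuses described above to readable messages
case "$status" in
   0)   echo "finished in time" ;;
   124) echo "timed out" ;;
   125) echo "timeout itself failed" ;;
   126) echo "command found but could not be run" ;;
   127) echo "command not found" ;;
   *)   echo "command exited with status $status" ;;
esac
```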

timeout also has the -s --signal flags to let you control which signal gets sent to the process. The default signal is 15 (TERM). You can change it to 9 (KILL) to guarantee that the process dies, since KILL cannot be caught or ignored.
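
One thing to watch for, shown in the sketch below, is that the return value changes when you use KILL. With GNU coreutils the status should be 137 (128 + 9) rather than 124. Again, sleep is just a placeholder for a real command.

```shell
# Send KILL instead of the default TERM once the 1 second limit is hit
timeout -s KILL 1 sleep 5
status=$?

# With KILL the status is 128 + 9 instead of the usual 124
echo "exit status: $status"
```

If you would rather give the process a chance to clean up first, timeout also has the -k --kill-after=DURATION flags, which send the normal signal and only follow up with KILL if the command is still running after the extra duration.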

Links and further reading

This concludes my tour of the lesser known coreutils. Of course, there are dozens more “well known” coreutils that you can read up on. I highly recommend reading the man pages and info pages as well.

Wikipedia coreutils page
GNU coreutils Documentation
Bash Cookbook (Amazon link)