

:(){ :|:& } ;:

The Inner Thoughts of a C Developer.



“A tour of the lesser known coreutils - part 2”

2013-06-06

In part 1 I introduced six of the lesser known core utilities. One thing I forgot to mention is that on OSX, the coreutils commands installed by brew are all prefixed with the letter 'g'. So head is ghead on a Mac. With that out of the way, let's keep the momentum going and dive right in with more commands!

factor

factor performs a prime factorization of the input number. You can either specify the number you want to factor on the command line after the factor command, or just execute the command and keep typing numbers, one per line. To exit, hit Ctrl+D.

$ factor 5234
5234: 2 2617

$ factor
52893
52893: 3 3 3 3 653
1234567891234567891234567898
factor: `1234567891234567891234567898' is too large

factor can be built with or without the gmp library. gmp is the GNU Multiple Precision arithmetic library. It is a "big number" library that can handle precisions greater than what the native processor supports (in my case 64 bits). The default version that comes with Ubuntu Linux is not built with gmp, and is therefore limited in the size of the numbers it can decompose.

To get around this limitation, you can install the gmp library (either from the apt archive, or from source) and recompile the coreutils package from source. Once you have re-compiled it you can replace the old factor with the new one, or name it something else and put it side by side with the old one.

$ echo "2^128-1" | bc | factor
factor: `340282366920938463463374607431768211455' is too large

# I recompiled the coreutils with gmp, and created big-factor
$ echo "2^128-1" | bc | big-factor
340282366920938463463374607431768211455: 3 5 17 257 641 65537 274177 6700417 67280421310721

On FreeBSD, when you install the coreutils you will be prompted to link against the gmp libraries. If you choose that option it will install the library for you, then build the coreutils with the proper linkage.

Fedora Linux users are lucky, as the default version of factor is built with gmp support.

If you are using OSX, remember that all of the coreutils commands installed by brew are prefixed with the letter 'g'. You can either run gfactor, or create an alias.

alias factor=/usr/local/bin/gfactor

To make this permanent you should add this alias to your .bash_profile. Also, the default brew script builds coreutils without gmp support. To remedy this you must edit the script and change the line that says --without-gmp to --with-gmp. You must also have gmp installed. Just follow these steps.

$ brew install gmp
$ brew edit coreutils

#edit the line that says --without-gmp to --with-gmp (line 14 for me)
$ brew install coreutils

#alternatively you could use 'brew reinstall coreutils'

#confirm that gfactor is linked to gmp
$ otool -L /usr/local/bin/gfactor
/usr/local/bin/gfactor:
   /usr/local/lib/libgmp.10.dylib
   /usr/lib/libiconv.2.dylib
   /usr/lib/libSystem.B.dylib

base64

base64 is a simple, but very useful program that allows you to encode files into base64 text, and decode base64 text back into files. By default, it will print the encoded base64 text to standard out, so you will need to redirect it to a file if you want to save it.

$ base64 hyphen.jpg
/9j/4AAQSkZJRgABAgEBkAGQAAD/4QBoRXhpZgAATU0AKgAAAAgABQESAAMAAAABAAEAAAEaAAUA
AAABAAAASgEbAAUAAAABAAAAUgEoAAMAAAABAAIAAIdpAAQAAAABAAAAWgAAAAAAAAGQAAAAAQAA
AZAAAAABAAAAAAAA/+0UylBob3Rvc2hvcCAzLjAAOEJJTQPtClJlc29sdXRpb24AAAAAEAGQAAAA
AQABAZAAAAABAAE4QklNBA0YRlggR2xvYmFsIExpZ2h0aW5nIEFuZ2xlAAAAAAQAAAAeOEJJTQQZ
EkZYIEdsb2JhbCBBbHRpdHVkZQAAAAAEAAAAHjhCSU0D8wtQcmludCBGbGFncwAAAAkAAAAAAAAA
AAEAOEJJTQQKDkNvcHlyaWdodCBGbGFnAAAAAAEAADhCSU0nEBRKYXBhbmVzZSBQcmludCBGbGFn
cwAAAAAKAAEAAAAAAAAAAjhCSU0D9RdDb2xvciBIYWxmdG9uZSBTZXR0aW5ncwAAAEgAL2ZmAAEA
...
...
kzT4Htaf89P+acVU4Y/yb/SlqYJYxfiRfqwQT7vX4eicOuKvTcVf/9k=

Using base64 can be a clever way to get around attachment limitations placed on email, or to embed a program or other files into a script. I have used this method in my packup.sh script that can be found on my github page.
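As a quick sketch of that embedding trick (the file names here are just examples, not from packup.sh), you can round-trip any file through base64 and get back the exact bytes you started with:

```shell
# Round-trip a file through base64; the decoded copy is byte-identical
printf 'some example payload' > payload.bin
base64 payload.bin > payload.b64      # encode to plain text
base64 -d payload.b64 > restored.bin  # decode back to the original bytes
cmp payload.bin restored.bin && echo "files match"
```

A script can carry the encoded text inline and decode it at runtime the same way.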

On FreeBSD you must install base64 separately from the coreutils. The base64 package is in /usr/ports/converters/base64. You can install the base64 package from there by running make install as root.

On OSX the base64 program comes packaged with the openssl utilities. Although the binary is different from the GNU version, it still functions mostly the same. The difference is in the flags the command takes. The GNU version uses the '-d' flag for decoding, while the openssl version uses '-D'. The GNU version also has a '-i' flag to ignore newline characters; the openssl version does not. To use the GNU version of base64 you can run the gbase64 command.

truncate

truncate can be used to adjust the size of a file. If you make a file smaller, it chops the data off the end, and if you make it bigger, it produces a hole. A hole is a null-byte-filled section of a file that does not actually get stored on the disk; the file system transparently records it as metadata to save space.

# Create and list size of empty file
$ touch empty.file
$ ls -las empty.file
0 -rw-rw-r-- 1 james james 0 Apr 19 21:03 empty.file

# set the size to 100 Megs and list it again
$ truncate -s 100M empty.file
$ ls -las empty.file
0 -rw-rw-r-- 1 james james 104857600 Apr 19 21:04 empty.file

Notice above that the reported size of the file has increased to 104857600 bytes, but the number of blocks (the first number) is still 0. That's because this file still takes no space on the physical storage medium, despite reporting its size as 100 Megs. The file is just one big hole.
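Shrinking works the same way in reverse. Here is a minimal sketch (the file name is just an example) showing that bytes past the new size are simply discarded:

```shell
# Truncating below the current size chops the tail off the file
printf 'abcdefgh' > sample.txt
truncate -s 4 sample.txt
cat sample.txt   # the file now contains only: abcd
```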

If you are using OSX, you can run the truncate program by running the gtruncate command.

tsort

tsort is one of the stranger core utilities because it is so specialized. tsort performs a topological sort (or topsort) on its input. A topological sort is used on directed graphs: for every directed edge uv from vertex u to vertex v, u comes before v in the ordering. One use of this algorithm is to determine the order in which tasks should be performed to avoid any conflicts. Let's take a look at a simple example.

[image: daily tasks graph]

Now we can enter each uv directed edge into tsort, and it will output an order in which to perform these tasks.

$ tsort <<-EOF
> take_a_shower make_shopping_list
> take_a_shower go_to_bank
> take_a_shower get_hair_cut
> go_to_bank go_to_store
> go_to_store buy_food
> buy_food cook_dinner
> go_to_bank get_car_wash
> go_to_bank get_hair_cut
> make_shopping_list buy_food
> EOF
take_a_shower
go_to_bank
make_shopping_list
get_hair_cut
get_car_wash
go_to_store
buy_food
cook_dinner
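One thing to keep in mind: a topological ordering only exists when the graph has no cycles. As a small sketch (with GNU tsort; other versions may report this differently), feeding it edges that form a loop produces a warning and a nonzero exit status:

```shell
# a -> b -> c -> a forms a cycle, so no valid ordering exists;
# GNU tsort warns that the input contains a loop and exits nonzero
printf '%s\n' "a b" "b c" "c a" | tsort

# An acyclic chain sorts cleanly, one vertex per line
printf '%s\n' "a b" "b c" | tsort
```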

OSX comes with its own version of tsort. The GNU version and the default version behave identically so you can use either. If you want to use the GNU version from the coreutils package, use the gtsort command.

Next time I will wrap up the series with a few more useful commands. Ever need to make it look like a file was created in the year 4000? Check back for part 3 to find out how!


“A tour of the lesser known coreutils - part 1”

2013-05-18

The GNU coreutils are a set of tools that come with every flavor of Linux. They are a useful and historic part of the Unix and Linux operating systems, and you use them every day without realizing it. Commands like ls, cat, and rm are all part of the coreutils package. However, there are a large number of lesser known and lesser used tools in the coreutils. I will go over some of them that I have found very useful.

The GNU coreutils are not just available for Linux. You can install them on OSX, BSD Unix, and Windows as well. On OSX, you can use brew install coreutils to install them (you must have homebrew installed). On Windows you can install either cygwin or gnuwin32. On FreeBSD you will need to have the ports collection installed; the coreutils port is in /usr/ports/sysutils/coreutils. Of course, for the really brave, you can always compile it from source.

Once you have the coreutils installed you can begin experimenting with some of the new commands and see what they do. The complete documentation of all the coreutils can be found at the GNU documentation page. Average Linux and Unix users will find they know most of the commands and use them on a daily basis. However, there are quite a few gems that most users probably didn't know were there. Let's take a look at a few.

Note: These examples are geared toward the GNU versions available on Linux. Other versions of these commands might have different options or slight incompatibilities.

expand, unexpand

expand and unexpand are a fantastic pair of tools if you ever work on code with another developer who uses different spacing in their editor than you (like tabs instead of 2 spaces). expand will read a file and convert tab characters to the requested number of spaces. This is great for a language like Python that depends on consistent spacing, or it won't run. unexpand does the opposite and converts the specified number of spaces into a single tab character.

# you can see this text has 6 spaces in the front and back
# of the text.

$ xxd test.txt
0000000: 2020 2020 2020 5468 6973 2069 7320 736f        This is so
0000010: 6d65 2074 6578 742e 2020 2020 2020 0a0a  me text.      ..

# convert 3 spaces to a tab
$ unexpand -t 3 test.txt | xxd
0000000: 0909 5468 6973 2069 7320 736f 6d65 2074  ..This is some t
0000010: 6578 742e 0909 0a0a                      ext.....
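The example above only shows unexpand; here is a sketch going the other way with expand, turning each tab back into spaces at the tab stops you specify:

```shell
# Convert tabs back into spaces, with tab stops every 3 columns
printf '\t\tThis is some text.\n' | expand -t 3
# the two leading tabs become 6 spaces
```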

fold, fmt

fold and fmt are both text formatters that will help you keep text to a specific number of columns. fold is a bit more of a hammer than fmt because it will hard wrap lines at the specified number of columns. By hard wrap I mean it will cut through whole words (unless the -s flag is specified). fmt is more intelligent, and gentler with your text. Not only does it wrap lines on whole words, but it also re-spaces the lines so they don't have a ragged edge appearance.

$ cat test.txt
This is line 1, it is 41 characters long.
This is line 2, it's shorter.
Line 3.
This is another long line that will need to be wrapped.

# Notice the ragged edges, as the original newlines are kept
$ fold -s -w 20 test.txt
This is line 1, it
is 41 characters
long.
This is line 2,
it's shorter.
Line 3.
This is another
long line that will
need to be wrapped.

# This command ignores the original newlines and wraps the text nicely
$ fmt -w 20 test.txt
This is line 1,
it is 41 characters
long.  This is line
2, it's shorter.
Line 3.  This is
another long line
that will need to
be wrapped.

shuf

shuf is a handy tool that will produce random permutations of whatever input you give it.

# Play a shuffled mp3 playlist from the current directory
ls -1 *.mp3 | shuf | while read -r i; do mpg123 "$i"; done
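shuf does not need a file or a pipe; it can also generate its own input. A couple of quick sketches:

```shell
# Pick 3 distinct numbers from 1-49 (a quick lottery draw)
shuf -i 1-49 -n 3

# Shuffle explicit arguments instead of lines from a file
shuf -e red green blue
```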

tac

tac is cat spelled backward. And that is exactly what it does. Running this command will print the lines of a file in reverse order.

$ cat test.txt
This is line 1
This is line 2
This is line 3
This is line 4

$ tac test.txt
This is line 4
This is line 3
This is line 2
This is line 1

This has been just a small glimpse into what the coreutils has to offer! In part 2 I will present more useful, but mostly unknown commands that can be used to solve interesting problems. Did you know you can perform a prime number factorization from the command line with a single command? Read part 2 to find out how.


“Spared no expense”

2013-04-18

One of my favorite movies that I will watch any time it is playing is Jurassic Park. I don’t know what it is about that movie but every time I watch it I see or realize something new.

One of the characters, John Hammond, the billionaire founder of Jurassic Park, is always stating how he "spared no expense" while designing and building the park. Every aspect of the park was cutting edge, automated, and centrally controllable. All of this automation was provided by 8 networked Thinking Machines CM-5 supercomputers.

Now, while John Hammond’s boast might be true when it comes to the computer equipment, gourmet food, cars, and the DNA wet lab, he was actually quite cheap when it came to hiring trustworthy staff. In the end, this would prove to be his undoing.

[image]

In the famous line delivered by his developer Dennis Nedry:

I am totally unappreciated in my time. You can run this whole park from this room with minimal staff for up to 3 days. Do you think that type of automation is easy… or cheap? Do you know anyone who can network 8 connection machines and debug 2 million lines of code for what I bid for this job? Because if he can I’d like to see him try.

Dennis sums up a large and pervasive problem with how John Hammond deals with spending his money. He spends millions of dollars on supercomputers, and hires the lowest-bidding single developer to do a job that would take a team of five to get done right. At one point Hammond asks Nedry if he has "debugged the phones yet." Why didn't they use an off-the-shelf PBX system instead of writing their own? I have no idea!

So let's pretend for a moment that Nedry was disgruntled, but not a thief, and would have just continued to finish out his contract at Jurassic Park. This still would have led to the degradation, and eventual failure, of the park's automated systems. Since Nedry was the only developer, there was no one else there to check his work or take over any of his responsibilities. He was also under tremendous time pressure to get the park open, which leads even the best developers to take shortcuts and just hack things together. In real dev work we call this "technical debt" because at some point you will pay the price for these bad decisions. Finally, once his contract was up he would have left on bad terms with Hammond, forcing Hammond to hire a new developer to continue Nedry's work. The new developer would have to learn all of the code (over 2 million lines) and try to make fixes. More than likely the new developer would have discovered the immense technical debt that had built up and have to re-write many parts of the system, which is a waste of time and money.

Although this was just a movie, I realized that it demonstrates an actual problem: the disconnect between stakeholders (like John Hammond) and developers (like Dennis Nedry). I believe it is safe to assume that Hammond did not have a true grasp of the level of complexity of the system he was trying to implement. He is just a businessman with a groundbreaking idea and a lot of money. Because of this he had no way of knowing what a bad bid for this development contract looked like. He may have had a few $800,000 - $1.1 million bids from teams of qualified developers, and saw Nedry at the bottom, who presumably had all of the necessary qualifications, bidding $150,000, and thought "well, that's a much better price to get this work done." Instead he should have considered that it was a bad bid, and that no one would ever be able to do this work alone for so cheap.

Hammond does come to realize his mistake after the park systems go down. In a conversation in the cafeteria with Dr. Sattler he says that next time there will be less automation and more people. Of course this is a knee-jerk reaction to the immediate problem. The automation was not to blame; it was how he implemented the automation. He still does not have a grasp on the real issues.

Of course I am ignoring the other pressures that were on Hammond in this analysis. It could be argued that he wanted minimal staff to keep maximum security and secrecy. But he still dropped the ball by hiring only one developer. Had he hired 3 developers to work on the project, Nedry would have had a hard time slipping his bad code into the system. The two other developers also would have been familiar enough with the system, and had the appropriate access levels, to undo the Nedry hack.

So in conclusion, do I think Hammond was the “bad guy” in all of this, or that it was all his fault? No, Nedry ultimately decided to betray everyone and got himself and most of the staff of the Jurassic Park killed in the process. I am saying that Hammond should have invested more in his staff, and not just in the hardware. All the electric fences and automation in the world did not stop a fat man at a keyboard from tearing everything down.


“A busy month, a shiny new product!”

2013-04-03

This has been a hectic month at work! We are releasing several new products and unveiling them at NAB on April 8th. One of the products, the AdCaster is near and dear to my heart! I have been working on this for the last 3 months. My part was to create the C library to handle all SCTE-30 transactions.

So you might be wondering, what does this product do? Well it allows local broadcasters to inject (splice) local advertisements over the top of national advertisements. Have you ever been watching a cable channel like Comedy Central, and it goes to commercial and you see a split second of one ad, then it jumps to another ad? That was a local ad splice done by the local cable provider. This is how local businesses can get their commercials on channels like Fox or NBC. Of course you will only notice it when the splice point is slightly off, otherwise it is completely seamless.

So how does it all work? Actually that is the cool part (in my opinion). The national feed carries embedded markers in the multiplex (MPEG-2 transport stream). Those embedded markers are defined by SCTE-35.

[image]

A splicer is listening for those embedded messages and when it finds one it sends a connected ad server (the AdCaster) a message over SCTE-30 that it is coming up on an ad that can be replaced. The AdCaster responds and begins sending the new advertisement video to replace the national advertisement feed.

[image]

As long as the SCTE-35 markers are correct, and the two servers are time-synced, the downstream user should never even know a replacement was made. When the ad is complete the splicer returns to the national feed and the AdCaster logs a successful transaction.

Even though it was a scramble to get this ready for the 8th, I feel that we really have a great product here. We are still making tweaks and bug fixes to the ad delivery code and such but we have put our product through its paces and it’s ready for a production environment!

If you find yourself at NAB this year, please stop by the TelVue booth and check out a demo of the AdCaster in action.


“Updated examples for LJ Dart article”

2013-03-01

When I wrote the article for Linux Journal about Dart (back in December) I was using Dart version 0.2.9.9_r16323, which at the time was the newest version. Now, 3 months later when the article is published, Dart has advanced to version 0.4.0.0_r18915. This normally would not be an issue, but changes in the standard library (released Feb. 20th 2013) have broken the 3-month-old example code provided in print. If that's not bad timing, I don't know what is! Since I cannot update the print version, I will instead give the updated example code here.

The first example, writing to a file, was affected by the addition of the IOSink class to the standard library and the subsequent removal of the openOutputStream() method on the File class. To fix it I used the new File.openWrite() and IOSink.addString() methods.

   import 'dart:io';

   void main(){
     String fileName = './test.txt';
     File file = new File(fileName);

     var out = file.openWrite();
     out.addString("This is my first file output in Dart!\n");
     out.close();
   }
   

The wunder.dart example was by far the most impacted. The changes made to the streaming API, combined with the removal of the HttpClientConnection and InputStream classes, have made this program all but useless. It had to be re-written to support the new API. The HttpClient.getUrl() method now returns a Future<HttpClientRequest>. This future completes with a request object, which in turn can be used to obtain the response object when it is closed. I was able to re-use the JSON code that I wrote in the previous example, but I needed to change how I imported the dart:json library.

   import 'dart:io';
   import 'dart:uri';
   import 'dart:async';
   import 'dart:json' as JSON;

   void main(){
     String apiKey = "";
     String zipcode = "";
     
     //Read the user supplied data form the options object
     try {
       apiKey = new Options().arguments[0];
       zipcode = new Options().arguments[1];
     } on RangeError {
       print("Please supply an API key and a zipcode!");
       print("dart wunder.dart <apiKey> <zipCode>");
       exit(1);
     }

     //Build the URI we are going to request data from
     Uri uri = new Uri("http://api.wunderground.com/"
         "api/${apiKey}/conditions/q/${zipcode}.json");

     HttpClient client = new HttpClient();
     client.getUrl(uri).then((request) {
       var response = request.close();
       response.then((response) => handleResponse(response));
     },
     onError: (AsyncError e) {
       print(e);
       exit(3);
     });

     client.close();
   }

   void handleResponse(HttpClientResponse response){
     List jsonData = [];
     response.toList().then((list) {
       jsonData.addAll(list.first);
       
       //response and print the location and temp.
       try {
         Map jsonDocument = JSON.parse(new String.fromCharCodes(jsonData));
         if (jsonDocument["response"].containsKey("error")){
           throw jsonDocument["response"]["error"]["description"];
         }
         String temp = jsonDocument["current_observation"]
            ["temperature_string"];
         String location = jsonDocument["current_observation"]
            ["display_location"]["full"];
         
         print('The temperature for $location is $temp');
       } catch(e) {
         print("Error: $e");
         exit(2);
       }
     },
     onError: (AsyncError e){
       print(e);
       exit(4);
     });
   }
   

This version will work just like the old one, but for some reason it hangs if you use the wrong Wunderground API key. I am not sure why, but I am working on a better version of this using the HttpClientResponse.listen() method. I will post that when I have something working but for now this will do.

The fingerpaint.dart example wasn't too badly affected. Dart has changed the semantics of how it deals with events. Now it uses the Stream class to make event streams that you "listen" to. So instead of registering with an event like _canvas.on.mouseDown.add((Event e) => _onMouseDown()); you would add a listener to the event stream _canvas.onMouseDown.listen((Event e) => _onMouseDown());.

Here is the new example code that will work with the newest Dartium.

   library fingerpaint;

   import 'dart:html';

   class DrawSurface {
     String _color = "black";
     int _lineThickness = 1;
     CanvasElement _canvas;
     bool _drawing = false;
     var _context;
     
     DrawSurface(CanvasElement canvas) {
       _canvas = canvas;
       _context = _canvas.context2d;
       _canvas.onMouseDown.listen((Event e) => _onMouseDown(e));
       _canvas.onMouseUp.listen((Event e) => _onMouseUp(e));
       _canvas.onMouseMove.listen((Event e) => _onMouseMove(e));
       _canvas.onMouseOut.listen((Event e) => _onMouseUp(e));
     }

     set lineThickness(int lineThickness) {
       _lineThickness = lineThickness;
       _context.lineWidth = _lineThickness;
     }

     set color(String color) {
       _color = color;
       _context.fillStyle = _color;
       _context.strokeStyle = _color;
     }

     int get lineThickness => _lineThickness;

     String get color => _color;
     
     void incrementLineThickness(int amount){
       _lineThickness += amount;
       _context.lineWidth = _lineThickness;
     }

     String getPNGImageUrl() {
       return _canvas.toDataUrl('image/png', 1.0);
     }

     _onMouseDown(Event e){
       _context.beginPath();
       _context.moveTo(e.offsetX, e.offsetY);
       _drawing = true;
     }

     _onMouseUp(Event e){
       _context.closePath();
       _drawing = false;
     }

     _onMouseMove(Event e){
       if (_drawing == true){
         _drawOnCanvas(e.offsetX, e.offsetY);
       }
     }

     _drawOnCanvas(int x, int y){
       _context.lineTo(x, y);
       _context.stroke();
     }

   }

   void main() {
     CanvasElement canvas = query("#draw-surface");
     DrawSurface ds = new DrawSurface(canvas);
     
     List buttons = queryAll("#colors input");
     for (Element e in buttons){
       e.onClick.listen((Event eve) {
         ds.color = e.id;
       });
     }

     var sizeDisplay = query("#currentsize");
     sizeDisplay.text = ds.lineThickness.toString();

     query("#sizeup").onClick.listen((Event e) {
       ds.incrementLineThickness(1);
       sizeDisplay.text = ds.lineThickness.toString();
     });

     query("#sizedown").onClick.listen((Event e) {
       ds.incrementLineThickness(-1);
       sizeDisplay.text = ds.lineThickness.toString();
     });

     query("#save").onClick.listen((Event e) {
       String url = ds.getPNGImageUrl();
       window.open(url, "save");
     });
   }
   

The fingerpaint webapp is affected by another breaking change as well. Google will no longer host the dart.js bootstrap file. Now you must use the pub program that comes with Dart. Pub is the Dart package manager and uses YAML files for web application configuration. You must create a file called "pubspec.yaml" in the same directory as your project that contains the following:

   name: fingerpaint
   dependencies:
      browser: any
   

Once that has been created you run the command:

   $ pub install
   Resolving dependencies...
   Dependencies installed!
   

Now you will have a new directory called packages. Remember that really long link to the Dart bootstrap file? Well, now you need to change that link from:

<script src="http://dart.googlecode.com/svn/trunk/dart/client/dart.js"></script>
   

to:

<script src="packages/browser/dart.js"></script>
   

Now you should have no problems running this code through Dartium.

While it is embarrassing to have my first Linux Journal article meet the world DOA, I have learned a great deal from this experience! First, always include the version of whatever application or language you are using in the article. Had I simply mentioned that I used Dart 0.2.9.9, this whole situation would have gone better. Second, maybe I should wait until a project is a bit more mature before giving it a full write-up. I am not upset about this situation, and I am glad Dart is moving forward so rapidly with development! I am just a bit angry at the timing of it all. Oh well, just some bad luck! The guys and girls at Linux Journal were very understanding and made this whole situation go more smoothly. If you are a Linux Journal reader I hope you will understand as well. This is the price we must sometimes pay for bleeding edge technology.