Benjamin Schieder


2005 November 05

So, you have a problem. There are different ways to deal with a problem. Writing a program that solves it is only one of them. Here are your options:

  1. Write a program that solves the problem
  2. Ask someone to write a program that solves the problem
  3. Work around the problem
  4. Give up and do something else

We already talked about 1., doing it yourself.
Option 2. usually involves money changing hands (from your pocket to their pocket, yes this can also mean my pocket if you want :) ) which you might want to avoid.
Option 3. would translate into taking a detour with the car: you reach your destination later than you want to, but without the hassle of solving the problem.
Option 4. should really be avoided. No one likes being called a quitter.

So, let's recap: You don't want to quit, so 4. is ruled out.
You don't want to take a detour. No one wants to live in the slow lane these days, so 3. is ruled out.
Also, you can't or don't want to afford hiring someone else to do it, so 2. is ruled out.

That leaves you with solving the problem yourself, which will also give you the most respect from your peers.

How to solve a problem

To solve a problem you have to first understand it. Otherwise you just blindly try and usually miss the point totally.
To understand the problem it is easiest to just write down what you want to achieve.

  1. First: What do we want to do
  2. and more importantly: What do we not want to do

Practical Example

Let's take our first example, which just recently popped up on the newsgroup:
You have a directory hierarchy with lots of files whose names contain the space character and you want to replace these spaces with the underscore character.

On a difficulty scale from 0 to 100 I'd rate this an 8. It's not much to do but there's a lot to do wrong here.

First, stuff we want to do: We want to replace spaces in filenames with underscores. We want to do this for all files in a directory and all files in all sub-directories. We want to do this again and again because - for example - people are uploading images to our server and they add spaces to their filenames.

Second, stuff we do not want to do: We do not want to rename directories (just to add some spice). We do not want to do this manually. We do not want to have to reproduce the solution again and again.

Okay, that's a nice list of stuff to do, but don't let it frighten you. It sounds worse than it actually is.
Let's start off with the first thing we want to do: Change all spaces in a filename to underscores. That's pretty easy.
Given a filename 'Picture of my house 1.jpg' you would call the following command:

mv 'Picture of my house 1.jpg' 'Picture_of_my_house_1.jpg'

That's how you rename files, by moving them to their new filename. Too bad we can't know what the files are called when users upload them to our server and we sure as hell don't want to redo the script for every possible filename again (see Stuff we don't want to do).
For that same reason we will start putting the code into a file now. Let's call it 'removespaces.bash'. This file should now look like this:

#!/bin/bash
mv 'Picture of my house 1.jpg' 'Picture_of_my_house_1.jpg'

The first line means 'run this file as a parameter to the program /bin/bash'. If the program 'bash' is somewhere else on your system you should change the path here. To find out where it is run the command 'which bash'.
Now, we have only solved one item from our Stuff we want to do list, and that only half-assed. Let's look at the next one: We want to rename all files in all subdirectories that contain spaces.
Not surprisingly, there is a command that helps us find all such files and it's called.... find! Almost too easy, isn't it?
How does find work? That's best displayed using a small example that actually fits our problem: We want all files below the current directory whose names contain a space:
find . -type f -a -name '* *'

Explanation: The '.' stands for the current directory. -type f means 'match only regular files, no directories or special files'. -a means a logical and, so both the previous and the following filter must match. -name '* *' means to check the filename (without the directory part) against the pattern '* *' which in turn stands for 'any character any number of times followed by a space followed by any character any number of times'. In short, this command will find all files with a space in their name in the current directory and all subdirectories and list them one per line:
find . -type f -name '* *'
./subdirectory/sub directory/file 1.jpg
./subdirectory/sub directory/file 2.jpg
./subdirectory/file 1.jpg
./subdirectory/file 2.jpg
./file 1.jpg
./file 2.jpg
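
If you want to try the find command without touching real data, here is a minimal sketch that builds a throwaway directory tree first. The scratch directory and the file 'nospace.jpg' are made up for this illustration; the other names mirror the listing above. The output is piped through sort because find does not promise any particular order.

```shell
#!/bin/bash
# Build a throwaway tree (hypothetical test data) so nothing real is touched.
scratch="$(mktemp -d)"
mkdir -p "${scratch}/subdirectory/sub directory"
touch "${scratch}/file 1.jpg" "${scratch}/file 2.jpg" "${scratch}/nospace.jpg"
touch "${scratch}/subdirectory/file 1.jpg" "${scratch}/subdirectory/file 2.jpg"
touch "${scratch}/subdirectory/sub directory/file 1.jpg"
touch "${scratch}/subdirectory/sub directory/file 2.jpg"

# List regular files whose name contains a space.
# 'nospace.jpg' and the directory 'sub directory' itself are not listed.
matches="$( cd "${scratch}" && find . -type f -a -name '* *' | sort )"
printf '%s\n' "${matches}"

# Clean up the scratch tree again.
rm -rf "${scratch}"
```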

To get the output from the find command we use some magic called a 'pipe'. A pipe means to connect the output from one command to the input of another. In practice, this will look like this:

find . -type f -a -name '* *' | while IFS= read -r oldfilename ; do
        # here we do something with the file
        :
done

Here we have a while loop which iterates once for every line it gets from the find command. The 'IFS=' and '-r' tell read to keep each line exactly as find printed it, and ':' is a command that does nothing (bash does not accept an empty loop body). Where currently the comment 'here we do something' is we can now rename each file. To replace each space with an underscore we will use a bash builtin feature which is displayed in the next example:

find . -type f -a -name '* *' | while IFS= read -r oldfilename ; do
        newfilename="${oldfilename// /_}"
        mv "${oldfilename}" "${newfilename}"
done

What happens here? The magic happens in the line 'newfilename="${oldfilename// /_}"'. Here all spaces in the value of the variable 'oldfilename' are replaced by underscores and the result is stored in the variable 'newfilename'.
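
To see the replacement on its own, here is a small sketch you can paste into a bash shell. The variable names 'all' and 'first' are only chosen for this illustration. Note the difference between two slashes (replace every match) and one slash (replace only the first match).

```shell
#!/bin/bash
oldfilename='Picture of my house 1.jpg'

# Two slashes: replace every space with an underscore.
all="${oldfilename// /_}"
# One slash: replace only the first space.
first="${oldfilename/ /_}"

printf '%s\n' "${all}"    # prints: Picture_of_my_house_1.jpg
printf '%s\n' "${first}"  # prints: Picture_of my house 1.jpg
```

This is a bash feature; a plain POSIX sh may not understand it, which is one more reason to name bash in the first line of the script.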
With this script we now have all the things from our Stuff we want to do list finished. You think we're finished? Well, think again... we also have the Stuff we don't want to do list and that one states that we do not want to replace spaces in directory names. To achieve this I will introduce you to two more programs and another nifty feature of bash. Let's start with the programs.
The first one is 'dirname'. 'dirname' tells us the directory part of a path as 'find' returns it. The second one is called 'basename' and is complementary to 'dirname' because it tells us the actual filename. Have a look at this example:

file='./subdirectory/sub directory/file 1.jpg'
dirname "${file}"
basename "${file}"

This code snippet will produce a two line output where the first line contains './subdirectory/sub directory' and the second line contains 'file 1.jpg'.
To use this output we can use a feature of bash that lets us use the output of a command as part of our script. Consider this code:

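find . -type f -a -name '* *' | while IFS= read -r file ; do
        directory="$( dirname "${file}" )"
        oldfilename="$( basename "${file}" )"
        newfilename="${oldfilename// /_}"
        mv "${directory}/${oldfilename}" "${directory}/${newfilename}"
done

Here, the variable 'directory' will contain the output of the command 'dirname "${file}"' and the variable 'oldfilename' will contain the output of the command 'basename "${file}"'. The quotes around "${file}" matter: without them bash would split the path at every space and hand several separate arguments to dirname and basename. We can then use these variables to build the old and the new path for mv. Since only 'oldfilename' goes through the replacement, spaces in directory names are left alone.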
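
Since we do not want to reproduce the solution again and again, it can help to wrap the loop in a function that takes the top directory as a parameter. What follows is only a sketch under a few assumptions, not the one true way: the function name 'remove_spaces' is made up here, the NUL-separated hand-over ('-print0' together with "read -d ''") is swapped in so that unusual filenames survive and needs a find that supports '-print0' (GNU and BSD find do), and the check that skips existing targets is an extra precaution the text above does not ask for.

```shell
#!/bin/bash
# Sketch: replace spaces with underscores in file names (not directory names).
# 'remove_spaces' is a hypothetical helper name chosen for this example.
remove_spaces() {
        local topdir="${1:-.}"   # directory to start in, default: current directory
        local file directory oldfilename newfilename

        # -print0 and read -d '' pass each path NUL-terminated instead of
        # one per line, so even names containing newlines are handled.
        find "${topdir}" -type f -a -name '* *' -print0 |
        while IFS= read -r -d '' file ; do
                directory="$( dirname "${file}" )"
                oldfilename="$( basename "${file}" )"
                newfilename="${oldfilename// /_}"

                # Extra precaution: never overwrite a file that already exists.
                if [ -e "${directory}/${newfilename}" ] ; then
                        echo "skipping '${file}': target already exists" >&2
                        continue
                fi
                mv -- "${directory}/${oldfilename}" "${directory}/${newfilename}"
        done
}
```

With the function in 'removespaces.bash', a last line such as 'remove_spaces "$1"' would let you run the script against any directory you pass on the command line.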


The conclusion here is that it's actually quite easy to solve problems. But you have to understand them first. If you don't understand a problem you can never fix it accurately. Also, you must know what you want to do and what you don't want to do. The answers to both of these things are essential because if you don't know where you're going you will almost always end up somewhere else.


Category: blog

Tags: Solutions