Chapter 9 – Advanced Data Processing

Processing data on Linux is straightforward. Many commands are available for all kinds of text-processing tasks. We’ve seen some of them in the previous chapter, but there are too many of them, covering too wide a range, to fit in a single chapter.

For that reason, I’ve dedicated another chapter to the commands that we haven’t yet had the chance to cover.

Joining Files

Using Paste

“paste” is one of the most useful commands on Linux. Depending on the provided arguments, there are two ways you can use it:

  1. To join files horizontally.
  2. To join lines in a single file.

Let’s start with the first usage method.

Joining Files Horizontally

When you provide paste with two files or more as its arguments, it will join the lines of these files and send them to its standard output. Fields from these files will be separated by a tab.

To illustrate this with an example, let us consider the following three files: names.txt, age.txt, and city.txt.

  • names.txt :
James
Mary
Patricia
Robert
John
  • age.txt :
34
21
54
49
18
  • city.txt :
London
New York
Liverpool
Sydney
Glasgow

If we run “paste” against these three files without any additional options, we get an output in which each line combines the corresponding lines of the three files. Here is the result:

$ paste names.txt age.txt city.txt 
James   34      London
Mary    21      New York
Patricia        54      Liverpool
Robert  49      Sydney
John    18      Glasgow

By default, the command separates the fields from each file with a tab. If you want, you can use another delimiter by specifying it after the “-d” flag.

$ paste -d ":" names.txt age.txt city.txt
James:34:London
Mary:21:New York
Patricia:54:Liverpool
Robert:49:Sydney
John:18:Glasgow
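To try this yourself, the sketch below recreates the three sample files in a temporary directory and joins them with a colon. The temporary directory and the variable names are only there for illustration; the file contents are taken from the paste output shown earlier.

```shell
# Work in a throwaway directory so nothing in the current one is overwritten.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Recreate the sample files, one entry per line.
printf '%s\n' James Mary Patricia Robert John > names.txt
printf '%s\n' 34 21 54 49 18 > age.txt
printf '%s\n' London 'New York' Liverpool Sydney Glasgow > city.txt

# Join the files line by line, using ":" instead of the default tab.
joined=$(paste -d ":" names.txt age.txt city.txt)
printf '%s\n' "$joined"
```

A visible delimiter such as a colon or a comma is often easier to check by eye than a tab, especially when a field contains a space, as “New York” does.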

Joining Lines in a Single File

When you provide “paste” with a single file and use the “-s” flag, it combines all the lines of that file into a single line, separated by tabs.

Here is an example to make things clear.

$ paste -s names.txt
James Mary Patricia Robert John

You can use a delimiter other than the default tab. To do so, just type in “-d” followed by the delimiter, just like we did before.

$ paste -s -d "," names.txt
James,Mary,Patricia,Robert,John
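The gaps in the default “paste -s” output are tab characters, even though they can look like spaces on screen. Here is a small sketch that runs both variants side by side; the temporary directory and the variable names “tabbed” and “commas” are only for illustration.

```shell
# Work in a throwaway directory and recreate names.txt.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf '%s\n' James Mary Patricia Robert John > names.txt

# Default: the names end up on one line, separated by tab characters.
tabbed=$(paste -s names.txt)
# With -d ",": the same single line, separated by commas instead.
commas=$(paste -s -d "," names.txt)

printf '%s\n' "$tabbed"
printf '%s\n' "$commas"   # James,Mary,Patricia,Robert,John
```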

Using Join

The “join” command allows you to join files based on a common field.

Let us consider the following two files:

  • name-with-city.txt
James London
Mary New York
Patricia Liverpool
Robert Sydney
John Glasgow
  • name-with-age.txt
James 34
Mary 21
Patricia 54
Robert 49
John 18

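We can use “join” to combine these two files. It pairs up the lines from both files that start with the same first field. In general, “join” expects both files to be sorted on that field. Our two sample files list the same names in the same order, so they pair up line by line.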

$ join name-with-city.txt name-with-age.txt
James London 34
Mary New York 21
Patricia Liverpool 54
Robert Sydney 49
John Glasgow 18
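When the two files do not list their entries in the same order, the usual approach is to sort both on the join field first. The sketch below shows one way to do that with the sample data; the sorted file names (“city-sorted.txt” and “age-sorted.txt”) and the temporary directory are assumptions made for this example.

```shell
# Work in a throwaway directory and recreate the two sample files.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

printf '%s\n' 'James London' 'Mary New York' 'Patricia Liverpool' \
  'Robert Sydney' 'John Glasgow' > name-with-city.txt
printf '%s\n' 'James 34' 'Mary 21' 'Patricia 54' 'Robert 49' 'John 18' \
  > name-with-age.txt

# Sort both files on the first field (the name), which is the join field.
sort -k 1,1 name-with-city.txt > city-sorted.txt
sort -k 1,1 name-with-age.txt > age-sorted.txt

# Join the sorted copies; the output follows the sorted order.
joined=$(join city-sorted.txt age-sorted.txt)
printf '%s\n' "$joined"
```

Note that the result now comes out in alphabetical order, so John appears right after James instead of at the end.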

Text Transformation

Replacing Characters

The “tr” command is easy to use and very practical when you need to replace certain characters with others.

For example, the command below will convert everything we type into uppercase.

$ tr a-z A-Z
Typing random words!
TYPING RANDOM WORDS!

In this example, we have applied the “tr” command to the standard input. So everything we type on the terminal will get uppercased. But what if we want to uppercase the content of an existing file?

Well, that’s also possible.

To do that, we can read the file using “cat” and then pass the output to “tr” using a pipe, just as the next example will demonstrate.

Remember our file “names.txt” from the first example? All names were written in lowercase except for their first letters. Well, I personally prefer names to be in uppercase.

So, to make this file more to my taste, I’m going to use “tr” to replace all lowercase letters with their uppercase counterparts, and then save the result to a new file “namesUp.txt”.

$ cat names.txt | tr a-z A-Z > namesUp.txt

This command will not output anything to the terminal. This is because we have redirected the standard output to the file “namesUp.txt” using the “greater-than” (>) symbol.

If you’re having difficulties understanding this command, then it is probably because you have skipped the chapter about piping and redirection. If that’s the case, then I invite you to go back, read that chapter, and make sure you fully understand the concepts discussed there before moving on.

Now, when we read the new file “namesUp.txt”, we can see that the names are in uppercase letters.

$ cat namesUp.txt
JAMES
MARY
PATRICIA
ROBERT
JOHN
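As an alternative to “cat” and a pipe, the file can be fed to “tr” directly with input redirection. The sketch below does that, and it also uses the POSIX character classes [:lower:] and [:upper:] in place of the a-z and A-Z ranges. The temporary directory is just for illustration.

```shell
# Work in a throwaway directory and recreate names.txt.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf '%s\n' James Mary Patricia Robert John > names.txt

# tr reads only from standard input, so hand it the file with "<".
# Quoting the classes keeps the shell from treating [ ] as a glob pattern.
tr '[:lower:]' '[:upper:]' < names.txt > namesUp.txt

cat namesUp.txt
```

Both forms produce the same file here. The character classes are generally considered the safer choice when the text may contain letters outside plain a-z.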

Advanced Text Transformation

There is another, more powerful, command that allows you to perform advanced text manipulation operations. This command is ‘sed’, which stands for Stream EDitor.

The ‘sed’ command allows you to automatically find and replace certain strings in a given file without having to open it in an editor. By default, it prints the result to its standard output and leaves the file itself unchanged. Let’s see how we can do this.

For this section, we’ll consider a file named “input.txt” with the following content:

This is John.
John is currently learning Linux, and he's constantly improving.
With enough time and dedication, John will eventually become a Linux expert.

Now, let’s say we need to substitute all occurrences of “John” with “Robert” (Feel free to use your own name if you wish).

Using ‘sed’, we can easily perform this operation.

$ sed 's/John/Robert/g' input.txt

Here, we have executed the sed command followed by a string pattern placed within single quotes, and then the path to a file.

The string pattern (s/John/Robert/g) starts with the ‘s’ character, which means that we will be using substitution. After the first slash, we type the string that we want to replace (in this case, John), and then, after the second slash, the string that we’ll be replacing it with (Robert). Finally, after the third and last slash, we have added the ‘g’ character, which stands for global. By default, sed replaces only the first occurrence of the pattern on each line. By adding ‘g’, we make sure that all occurrences on every line are replaced.

Here is the result we get from this command.

$ sed 's/John/Robert/g' input.txt
This is Robert.
Robert is currently learning Linux, and he's constantly improving.
With enough time and dedication, Robert will eventually become a Linux expert.
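In “input.txt”, each line contains “John” only once, so the effect of ‘g’ is not visible there. The sketch below uses a made-up line with two occurrences to show the difference, and then saves a substituted copy of a file while the original stays untouched. The sample line, the file name “output.txt”, and the temporary directory are assumptions for this example.

```shell
# Work in a throwaway directory.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# A hypothetical line with two occurrences of "John".
line='John met John.'

# Without g: only the first match on the line is replaced.
first_only=$(printf '%s\n' "$line" | sed 's/John/Robert/')
# With g: every match on the line is replaced.
all_matches=$(printf '%s\n' "$line" | sed 's/John/Robert/g')

printf '%s\n' "$first_only"    # Robert met John.
printf '%s\n' "$all_matches"   # Robert met Robert.

# sed writes to standard output, so redirect it to keep the result.
printf '%s\n' 'This is John.' > input.txt
sed 's/John/Robert/g' input.txt > output.txt
cat input.txt    # This is John.   (the original is unchanged)
cat output.txt   # This is Robert.
```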

We have reached the end of this chapter. We have covered most text processing commands that you can use in Linux.

In the next chapter, we are going to learn about regular expressions.

