Files in Bash

Bash scripts are very useful for processing files and automating tasks. Here are some reasons why Bash is good for file processing:

- Bash is installed by default on most Linux systems, so it's widely available.
- Bash gives you direct access to the standard command-line utilities for working with files. These are separate programs rather than Bash built-ins, but they are present on virtually every Linux system:
  - cat - Concatenate files
  - cp - Copy files
  - mv - Move/rename files
  - rm - Remove files
  - grep - Search files for patterns
  - sort - Sort lines of text files
  - head/tail - View first/last lines of files
  - cut - Extract fields or character ranges from each line
- Bash has variables to store file names, paths, and contents. You can manipulate these variables to process multiple files.
- Bash has string manipulation through parameter expansion, covering substring extraction, length, prefix and suffix removal, and (in Bash 4 and later) uppercase and lowercase conversion. This is useful when processing text and file names.
- Bash has conditional statements (if/else), loops (for/while) and functions to script complex file processing tasks.
- Bash scripts are text files, so they're human readable, editable and portable.
- Bash scripts can take command line arguments to make them more flexible and reusable.
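
As a small illustration of the variables and string manipulation mentioned above, the sketch below splits a path into its parts using parameter expansion. The path is made up for the example, and the variable names are arbitrary.

```shell
# Hypothetical path, used only for illustration.
path="/var/log/app/report.final.txt"

name="${path##*/}"   # remove the longest prefix ending in "/": the file name
dir="${path%/*}"     # remove the shortest suffix starting with "/": the directory
stem="${name%.*}"    # remove the shortest suffix starting with ".": name without extension
ext="${name##*.}"    # remove the longest prefix ending in ".": the extension
length="${#name}"    # number of characters in the file name

printf 'dir=%s name=%s stem=%s ext=%s length=%s\n' \
    "$dir" "$name" "$stem" "$ext" "$length"
```

Bash also offers ${name:0:6} for substrings and, from version 4, ${name^^} and ${name,,} for case conversion.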

For example, you can write a Bash script to:

- Copy all .txt files from one directory to another
- Rename multiple files in a folder
- Remove lines containing a certain pattern from text files
- Sort contents of multiple text files
- Concatenate multiple files into one
- Extract certain columns from CSV/TSV files
- And much more...
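
The first task in the list could be sketched roughly as follows. The function name copy_txt and the demo file names are assumptions made for this example, and the demo runs inside a throwaway directory so that no real files are touched.

```shell
# copy_txt SRC DST: copy every .txt file from SRC into DST (hypothetical helper).
copy_txt() {
    local src="$1" dst="$2"
    mkdir -p "$dst" || return 1
    for file in "$src"/*.txt; do
        # If nothing matched, the pattern is left unexpanded; skip it.
        [ -e "$file" ] || continue
        cp -- "$file" "$dst"/
    done
}

# Demo data in a temporary directory.
demo_dir=$(mktemp -d)
mkdir "$demo_dir/in"
printf 'alpha\n' > "$demo_dir/in/a.txt"
printf 'beta\n'  > "$demo_dir/in/notes b.txt"   # name with a space
printf 'skip\n'  > "$demo_dir/in/c.log"         # not a .txt file

copy_txt "$demo_dir/in" "$demo_dir/out"
copied=$(ls "$demo_dir/out")
printf '%s\n' "$copied"

rm -rf "$demo_dir"
```

Quoting "$file" and "$dst" is what keeps the file name containing a space intact.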

So in summary, Bash is good for processing text files and automating file tasks because:

- It gives you direct access to the standard file-handling utilities
- It has variables, string manipulation and flow control to script complex tasks
- Bash scripts are human-readable, editable and portable text files
- Bash is widely available on Linux systems by default


Overview

Here is an overview of Bash file handling:

Reading files:

Use the cat command to read an entire file at once:

cat filename

Use the more command to read a file page by page:

more filename

Use the head command to read the first 10 lines of a file:

head filename

Use the tail command to read the last 10 lines of a file:

tail filename
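
Both head and tail accept -n to choose how many lines to show. The sketch below tries this on a small scratch file; the file contents and variable names are made up for the example.

```shell
sample=$(mktemp)
printf '%s\n' one two three four five > "$sample"

first_two=$(head -n 2 "$sample")     # lines 1-2
last_two=$(tail -n 2 "$sample")      # lines 4-5
from_third=$(tail -n +3 "$sample")   # line 3 through the end of the file

printf 'head -n 2:\n%s\n\n'  "$first_two"
printf 'tail -n 2:\n%s\n\n'  "$last_two"
printf 'tail -n +3:\n%s\n'   "$from_third"

rm -f "$sample"
```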

Writing to files:

Use the > redirect operator to overwrite a file:

command > filename

Use the >> redirect operator to append to a file:

command >> filename
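
The difference between the two operators is easiest to see side by side. This is a minimal sketch using a scratch file created with mktemp; the messages are arbitrary.

```shell
log=$(mktemp)                  # scratch file for the demo

echo "first run"  > "$log"     # creates or truncates the file, then writes
echo "second run" > "$log"     # truncates again, so "first run" is gone
echo "third run" >> "$log"     # appends after the existing content

contents=$(cat "$log")
printf '%s\n' "$contents"      # prints "second run" then "third run"

rm -f "$log"
```

If accidental overwrites are a concern, set -o noclobber makes > refuse to overwrite an existing file.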

Processing files:

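Use utilities like grep, sed, awk etc. to search, filter and transform file contents. By default these print their result to standard output and leave the file itself unchanged. For example:

grep "pattern" filename    # Print the lines that contain pattern
sed "s/old/new/" filename  # Print the file with the first "old" on each line replaced by "new"
awk '{print $1}' filename  # Print the first whitespace-separated field of each line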
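
These tools are usually combined with pipes. The sketch below runs them on a small made-up data file (name, age, role); the file contents and variable names are assumptions for the example.

```shell
data=$(mktemp)
cat > "$data" <<'EOF'
alice 30 admin
bob 25 user
carol 41 admin
EOF

# Names of all admins: filter lines with grep, then keep field 1 with awk.
admins=$(grep 'admin' "$data" | awk '{print $1}')

# Rename the role in the output only, then print name and new role.
renamed=$(sed 's/user/member/' "$data" | awk '$3 == "member" {print $1, $3}')

# sed wrote to standard output, so the file still contains "user" once.
still_user=$(grep -c 'user' "$data")

printf 'admins:\n%s\nrenamed: %s\nlines still containing user: %s\n' \
    "$admins" "$renamed" "$still_user"

rm -f "$data"
```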

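Use xargs to build command lines from the output of another command. For example, to delete all .txt files in the current directory:

find . -maxdepth 1 -type f -name '*.txt' -print0 | xargs -0 rm --

Avoid the common ls *.txt | xargs rm form: it breaks on file names that contain spaces or newlines. The -print0 and -0 options, which are extensions supported by the GNU and BSD tools, separate the names with NUL bytes instead. For a case this simple, rm -- *.txt does the same job without xargs.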

Reading a file

Here is an example Bash script that reads a text file line by line:

#!/bin/bash

filename="file.txt"

# Bash has no open or close commands: redirecting the file into the
# loop opens it, and it is closed automatically when the loop ends.
while IFS= read -r line
do
    # Print the line
    printf '%s\n' "$line"
done < "$filename"

Let's break it down:

- We define the filename variable with the file path
- We redirect the file into the loop with done < "$filename", which opens it for reading
- We read each line using read -r line inside a while loop
- We print each line using printf '%s\n' "$line", which is safer than echo for lines that begin with a dash or contain backslashes
- We do not close the file ourselves: the redirection ends with the loop

So this script will:

- Open the file.txt file
- Read each line using the read command
- Print that line to the console using printf
- Close the file when the loop finishes

A sample file.txt file could contain:

Line 1
Line 2
Line 3

When you run the script, the output will be:

Line 1  
Line 2
Line 3

The while IFS= read -r line notation is the standard way to read a file line by line in Bash. It can be difficult to understand for a beginner.

Let's break it down:

- IFS= sets the Internal Field Separator to an empty string, for the duration of the read command only. With the default IFS, read strips leading and trailing spaces and tabs from the line; with an empty IFS the line is stored exactly as written, including indentation.
- read -r runs the read command with the -r option, which tells read to treat backslashes literally. Without -r, a backslash escapes the next character, and a backslash at the end of a line joins it to the following line.
- line is the variable that each line of the file is read into. You can use any variable name here.

So in short, this notation does 3 things:

- It empties IFS so that leading and trailing whitespace is preserved
- It uses the -r option with read to not interpret backslashes
- It reads each line into a variable (line in this case)

This notation is used inside a while loop, so the read command will be executed on each line of the file:

while IFS= read -r line
do
    # Process $line 
done < filename

This will read each line of the filename file and store it in the $line variable, which you can then process inside the loop.

So in summary, while IFS= read -r line is a Bash notation to read a file line by line into the $line variable, preserving leading and trailing whitespace and leaving backslashes untouched.
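
To see what IFS= and -r actually change, the sketch below reads the same two lines twice, once with a plain read and once with IFS= read -r. The sample lines are made up: the first has spaces at both ends, the second contains a literal backslash.

```shell
sample=$(mktemp)
printf '%s\n' '  padded  ' 'a\tb' > "$sample"

# Plain read: edge whitespace is trimmed and the backslash is consumed.
plain=""
while read line; do
    plain="${plain}[${line}]"
done < "$sample"

# IFS= read -r: each line is kept exactly as written.
exact=""
while IFS= read -r line; do
    exact="${exact}[${line}]"
done < "$sample"

printf 'plain: %s\n' "$plain"   # plain: [padded][atb]
printf 'exact: %s\n' "$exact"   # exact: [  padded  ][a\tb]

rm -f "$sample"
```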

File descriptor

File descriptors are numbers that the Linux kernel uses to keep track of files opened by a process. Every process, including a Bash script, starts with three standard file descriptors:

- Standard Input (FD 0) - By default, it reads from the keyboard.
- Standard Output (FD 1) - By default, it writes to the screen.
- Standard Error (FD 2) - By default, it writes error messages to the screen.

You can also open additional files. In Bash you choose the descriptor number yourself, conventionally starting at FD 3; recent Bash versions can also pick a free descriptor for you with the exec {name}< file syntax.

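File descriptors allow you to perform operations on files like the following, shown here with FD 3:

- Open a file - Using exec 3< file (read), exec 3> file (write) or exec 3>> file (append)
- Read from a file - Using read -r var <&3
- Write to a file - Using echo "text" >&3
- Append to a file - Using echo "text" >&3 on a descriptor opened with exec 3>> file
- Close a file - Using exec 3<&- (input) or exec 3>&- (output); Bash has no close command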

For example:

exec 3< file.txt  # Open file.txt for reading on FD 3

while IFS= read -r line <&3
do
    echo "$line"
done

exec 3<&-  # Close FD 3

Here we:

- Open file.txt for reading on FD 3 using exec 3< file.txt
- Read each line using read -r line <&3 where <&3 means read from FD 3
- Echo each line
- Close FD 3 using exec 3<&-, since Bash has no close command
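
Writing through a descriptor works the same way in the other direction. The sketch below is a minimal example that opens a scratch file on FD 3 for writing, closes it with exec 3>&- (Bash has no close command), and then reopens it in append mode; the file and its contents are made up for the example.

```shell
out=$(mktemp)

exec 3> "$out"       # open for writing on FD 3 (truncates the file)
echo "header" >&3
echo "row 1"  >&3    # same open descriptor, so this continues after "header"
exec 3>&-            # close FD 3

exec 3>> "$out"      # reopen the same file in append mode
echo "row 2" >&3
exec 3>&-

written=$(cat "$out")
printf '%s\n' "$written"

rm -f "$out"
```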

So in summary, file descriptors allow your Bash script to manipulate multiple files simultaneously by assigning them unique numbers.

The default file descriptors (0, 1 and 2) represent stdin, stdout and stderr respectively, but you can open additional files on descriptor numbers of your choosing, typically 3 and above.
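
A case where separate descriptors are genuinely needed is reading two files in lockstep. The following sketch pairs up the lines of two scratch files through FD 3 and FD 4; the file contents and variable names are assumptions for the example.

```shell
names=$(mktemp)
scores=$(mktemp)
printf '%s\n' alice bob carol > "$names"
printf '%s\n' 90 85 77        > "$scores"

exec 3< "$names"     # FD 3 reads the names file
exec 4< "$scores"    # FD 4 reads the scores file

paired=""
# Read one line from each descriptor per iteration; stop when either runs out.
while IFS= read -r name <&3 && IFS= read -r score <&4; do
    paired="${paired}${name}=${score} "
done

exec 3<&-            # close both descriptors
exec 4<&-
rm -f "$names" "$scores"

printf '%s\n' "$paired"
```

A single done < file redirection could not do this, because both read commands would share standard input.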

Multiple files

Here is an example Bash script that finds all .txt files in the current directory and displays the first 3 lines of each file:

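#!/bin/bash

for file in *.txt
do
    # If no .txt files exist, the pattern stays as the literal "*.txt"; skip it
    [ -e "$file" ] || continue
    echo "File: $file"
    head -n 3 "$file"
done

Breaking it down:

- #!/bin/bash - Specifies this is a Bash script
- for file in *.txt - Loops through all files ending in .txt
- [ -e "$file" ] || continue - Skips the iteration if the file does not exist, which happens when nothing matches *.txt
- echo "File: $file" - Prints the file name
- head -n 3 "$file" - Uses the head command to display the first 3 lines (-n 3) of the current file; the quotes keep file names with spaces intact
- done - Ends the for loop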

When you run this script, it will:

- Find all files ending in .txt in the current directory
- For each .txt file:
  - Print the file name
  - Use head to display the first 3 lines of that file

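So if you had three files, each containing at least three lines:

file1.txt
file2.txt
file3.txt

The output would be:

File: file1.txt
Line 1 of file1.txt
Line 2 of file1.txt
Line 3 of file1.txt
File: file2.txt
Line 1 of file2.txt
Line 2 of file2.txt
Line 3 of file2.txt
File: file3.txt
Line 1 of file3.txt
Line 2 of file3.txt
Line 3 of file3.txt

The script prints no blank line between files; add a bare echo at the end of the loop body if you want one.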
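
The same loop can be tried safely in a temporary directory. The sketch below is a self-contained variant with made-up files, one of which has a space in its name and fewer than three lines, to show that quoting and short files are handled.

```shell
demo_dir=$(mktemp -d)
printf '%s\n' a1 a2 a3 a4 > "$demo_dir/file1.txt"
printf '%s\n' b1 b2       > "$demo_dir/my notes.txt"

output=$(
    cd "$demo_dir" || exit 1
    for file in *.txt; do
        [ -e "$file" ] || continue   # no matches: skip the literal pattern
        echo "File: $file"
        head -n 3 "$file"            # quoted, so "my notes.txt" stays one argument
    done
)
printf '%s\n' "$output"

rm -rf "$demo_dir"
```

The loop runs in a subshell, so the cd does not affect the rest of the script.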

Disclaimer

This article was created using AI (Rix). You can explore other topics yourself. Here are some other interesting topics about file handling in Bash that you could research:

- File permissions - Understanding file permissions and how to change them in Bash scripts using chmod. This allows your script to manipulate files with different access levels.
- File ownership - Using chown to change the owner and group of files. This is important for file security and access.
- File timestamps - Using touch to update the access and modification times of files. This can be useful for tracking when files were last accessed or modified.
- File comparisons - Using diff to compare the contents of two files and see what has changed. This is useful for version control and auditing files.
- Temporary files - Creating and using temporary files in your Bash scripts, for example with mktemp. This is important to avoid clashes with existing files.
- File searching - Using find to search for files based on name, type, size, permissions, owner, group, mtime etc. This allows you to locate specific files programmatically.
- File archiving - Using tar to bundle groups of files into a single archive, optionally compressed, for easy distribution and installation.
- Here documents - Using here documents to pass large amounts of text as input to commands. This is a simple way to pass configuration data to programs from Bash scripts.
- Pipelines - Connecting multiple commands together using pipelines (|) to perform data transformations on files. This allows complex file handling operations.
- Redirection - Using >, >>, < and 2>&1 to redirect file descriptors to and from files. This allows your Bash scripts to manipulate input/output easily.

Those are some further topics related to file handling in Bash that you could explore. You can study them as the need arises, or sponsor me to research a particular topic. Comment below to tell me which topic you would like expanded in a future article.