Files in Bash

Bash file processing and file handling.

Bash scripts are well suited to processing files and automating tasks. Here are some reasons why:

  • Bash is installed by default on most Linux systems, so it's widely available.

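  • Bash gives you direct access to the standard command-line utilities for working with files. These are separate programs rather than Bash built-ins, but they are installed on virtually every Linux system: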

    • cat - Concatenate files

    • cp - Copy files

    • mv - Move/rename files

    • rm - Remove files

    • grep - Search files for patterns

    • sort - Sort lines of text files

    • head/tail - View first/last lines of files

    • cut - Cut parts of text files

    • etc.

  • Bash has variables to store file names, paths, and contents. You can manipulate these variables to process multiple files.

  • Bash supports string manipulation through parameter expansion: substring extraction, string length and, in Bash 4 and later, conversion to uppercase or lowercase. This is useful when processing file names and text.

  • Bash has conditional statements (if/else), loops (for/while) and functions to script complex file processing tasks.

  • Bash scripts are text files, so they're human readable, editable and portable.

  • Bash scripts can take command line arguments to make them more flexible and reusable.
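To make the points about variables and string manipulation concrete, here is a minimal sketch of parameter expansion applied to a file path. The path is a made-up value used only for illustration, and the case-conversion forms assume Bash 4.0 or later.

```shell
#!/bin/bash
# Hypothetical path, used only for illustration.
path="/var/log/Report.TXT"

name="${path##*/}"   # strip everything up to the last slash -> Report.TXT
ext="${name##*.}"    # keep the text after the last dot      -> TXT
base="${name%.*}"    # drop the extension                    -> Report

echo "length:    ${#name}"     # prints 10
echo "substring: ${name:0:3}"  # prints Rep
echo "lowercase: ${name,,}"    # prints report.txt (Bash 4.0+)
echo "uppercase: ${base^^}"    # prints REPORT (Bash 4.0+)
echo "extension: $ext"         # prints TXT
```

None of these expansions start an external program, so they stay fast even inside a loop over many files.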

For example, you can write a Bash script to:

  • Copy all .txt files from one directory to another

  • Rename multiple files in a folder

  • Remove lines containing a certain pattern from text files

  • Sort contents of multiple text files

  • Concatenate multiple files into one

  • Extract certain columns from CSV/TSV files

  • And much more...
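As an illustration of the first task in the list above, copying all .txt files from one directory to another could be sketched like this. The function name copy_txt and the demo file names are assumptions made up for this example.

```shell
#!/bin/bash
# copy_txt SRC DEST: copy every .txt file in SRC into DEST.
# The function name and its two arguments are illustrative assumptions.
copy_txt() {
    local src=$1 dest=$2 file
    mkdir -p -- "$dest" || return 1
    for file in "$src"/*.txt; do
        # If nothing matches, the pattern stays literal, so skip it.
        [ -e "$file" ] || continue
        cp -- "$file" "$dest"/
    done
}

# Demo in a throwaway directory so no real files are touched.
work=$(mktemp -d)
mkdir "$work/in"
printf 'a\n' > "$work/in/notes.txt"
printf 'b\n' > "$work/in/my file.txt"
printf 'c\n' > "$work/in/image.png"

copy_txt "$work/in" "$work/out"
ls "$work/out"   # lists the two .txt files, not image.png
```

Quoting "$file" and "$dest" is what keeps the file name with a space in it in one piece.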

So in summary, Bash is good for processing text files and automating file tasks because:

  • It gives you direct access to the standard file-handling utilities

  • It has variables, string functions and flow control to script complex tasks

  • Bash scripts are human-readable, editable and portable text files

  • Bash is widely available on Linux systems by default

Overview

Here is an overview of Bash file handling:

Reading files:

  • Use the cat command to read an entire file at once:
cat filename
  • Use the more command to read a file page by page:
more filename
  • Use the head command to read the first 10 lines of a file:
head filename
  • Use the tail command to read the last 10 lines of a file:
tail filename

Writing to files:

  • Use the > redirect operator to overwrite a file:
command > filename
  • Use the >> redirect operator to append to a file:
command >> filename
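The difference between the two operators is easiest to see side by side. This small sketch writes to a temporary file created with mktemp, so nothing real gets overwritten.

```shell
#!/bin/bash
log=$(mktemp)            # temporary file for the demo

echo "first"  > "$log"   # > truncates the file, then writes
echo "second" > "$log"   # the earlier line is gone
echo "third" >> "$log"   # >> keeps the existing content and appends

cat "$log"
# prints:
# second
# third
```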

Processing files:

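  • Use utilities like grep, sed and awk to search, filter and transform file contents. For example:
grep "pattern" filename   # Print the lines that match pattern
sed "s/old/new/" filename # Print the file with the first "old" on each line replaced by "new" (the file itself is not changed)
awk '{print $1}' filename # Print the first whitespace-separated field of each line
  • Use xargs to pass the output of one command as arguments to another command. For example, to delete all .txt files in the current directory, including names that contain spaces:
find . -maxdepth 1 -name '*.txt' -print0 | xargs -0 rm --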
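These utilities are most useful when chained together. The sketch below runs them on a small sample file; the file contents and the column layout (name, age, role) are invented for the example.

```shell
#!/bin/bash
data=$(mktemp)
# Made-up sample data: name, age, role.
cat > "$data" <<'EOF'
alice 30 admin
bob 25 user
carol 41 admin
EOF

grep "admin" "$data"          # the two lines containing "admin"
sed "s/admin/root/" "$data"   # edited text goes to stdout; $data is unchanged
awk '{print $1}' "$data"      # first field of every line

# Chained: names of all admins, in reverse alphabetical order.
admins=$(grep "admin" "$data" | awk '{print $1}' | sort -r)
echo "$admins"
# prints:
# carol
# alice
```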

Reading a file

Here is an example Bash script that reads a text file line by line:

#!/bin/bash

filename="file.txt"

# Bash has no open or close commands. Redirecting the file into the
# loop opens it, and it is closed automatically when the loop ends.
while IFS= read -r line
do
    # Print the line
    echo "$line"
done < "$filename"

Let's break it down:

  • We define the filename variable with the file path

  • We open the file for reading by redirecting it into the loop with done < "$filename"

  • We read each line using read -r line inside a while loop

  • We print each line using echo "$line"

  • We do not close the file explicitly; Bash closes it when the loop finishes

So this script will:

  1. Open the file.txt file

  2. Read each line using the read command

  3. Print that line to the console using echo

  4. Close the file when the loop is done

A sample file.txt file could contain:

Line 1
Line 2
Line 3

When you run the script, the output will be:

Line 1
Line 2
Line 3

The while IFS= read -r line idiom reads a file line by line in Bash. It can be hard to understand for a beginner.

Let's break it down:

  • IFS= sets the Internal Field Separator to an empty string, for the read command only. With a single variable, read already stores the whole line, but it strips leading and trailing whitespace. An empty IFS keeps that whitespace intact.

  • read -r runs the read command with the -r option, which tells read to treat backslashes literally. Without it, a backslash escapes the next character and a backslash at the end of a line joins it with the following line. This matters for input such as Windows paths or regular expressions.

  • line is a variable name that each line of the file will be read into. You can use any variable name here.

So in short, this notation does 3 things:

  1. It clears IFS so that leading and trailing whitespace on each line is preserved

  2. It uses the -r option with read to not interpret backslashes

  3. It reads each line into a variable (line in this case)

This notation is used as the condition of a while loop, so read runs once per line and the loop ends when read reaches the end of the file:

while IFS= read -r line; 
do
    # Process $line 
done < filename

This will read each line of the filename file and store it in the $line variable, which you can then process inside the loop.

So in summary, while IFS= read -r line reads a file one line at a time into the $line variable, keeping whitespace and backslashes exactly as they appear in the file.
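To see what IFS= and -r actually change, the following sketch reads the same line twice, once with a plain read and once with the full idiom. The sample line, with its leading spaces and backslashes, is invented for the demonstration.

```shell
#!/bin/bash
sample=$(mktemp)
# One line with leading spaces, literal backslashes and trailing spaces.
printf '%s\n' '   path\to\file  ' > "$sample"

read line_plain < "$sample"           # default IFS, no -r
IFS= read -r line_exact < "$sample"   # whitespace and backslashes preserved

printf '[%s]\n' "$line_plain"   # prints [pathtofile]
printf '[%s]\n' "$line_exact"   # prints [   path\to\file  ]
```

The plain read loses both the surrounding spaces and the backslashes, which is why the longer form is the safer default.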


File descriptor

File descriptors are small non-negative integers that the kernel uses to keep track of the files a process has open. Every process, including a Bash script, starts with 3 standard file descriptors:

  1. Standard Input (FD 0) - By default, it reads from the terminal (keyboard).

  2. Standard Output (FD 1) - By default, it writes to the terminal (screen).

  3. Standard Error (FD 2) - By default, it writes error messages to the terminal (screen).

You can also open additional files on file descriptors of your choice, such as FD 3, FD 4, and so on.

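File descriptors allow you to perform operations on files like:

  • Open a file - Using exec FD< file (read), exec FD> file (overwrite) or exec FD>> file (append)

  • Read from an open file - Using read var <&FD

  • Write to an open file - Using echo "text" >&FD

  • Close a file - Using exec FD<&- (for input) or exec FD>&- (for output)

Whether a write overwrites or appends is decided when the file is opened, not on each write. Bash has no close command; closing is also done with exec.

For example:

exec 3< file.txt  # Opens file.txt for reading on FD 3

while IFS= read -r line <&3
do
    echo "$line"
done

exec 3<&-  # Closes FD 3

Here we:

  • Open file.txt for reading on FD 3 using exec 3< file.txt

  • Read each line using read -r line <&3 where <&3 means read from FD 3

  • Echo each line

  • Close FD 3 using exec 3<&-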

So in summary, file descriptors allow your Bash script to manipulate multiple files simultaneously by assigning them unique numbers.

The default file descriptors (0, 1 and 2) represent stdin, stdout and stderr respectively. For additional files you choose the number yourself (3 or higher). In Bash 4.1 and later you can also write exec {fd}< file to let Bash pick a free descriptor and store its number in the variable fd.
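Working with two files at the same time is where extra file descriptors pay off. The sketch below reads two files in lockstep, one on FD 3 and one on FD 4. The function name pair_lines and the sample data are assumptions made up for this example.

```shell
#!/bin/bash
# pair_lines FILE_A FILE_B: print line N of FILE_A next to line N of FILE_B.
# The function name is a hypothetical helper for this illustration.
pair_lines() {
    local a b
    exec 3< "$1" 4< "$2"   # open both files for reading
    # The loop stops as soon as either file runs out of lines.
    while IFS= read -r a <&3 && IFS= read -r b <&4; do
        printf '%s is %s\n' "$a" "$b"
    done
    exec 3<&- 4<&-         # close both descriptors
}

names=$(mktemp)
ages=$(mktemp)
printf 'alice\nbob\n' > "$names"
printf '30\n25\n'     > "$ages"

pair_lines "$names" "$ages"
# prints:
# alice is 30
# bob is 25
```

A single done < file redirection cannot do this, because each read needs its own input source.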


Multiple files

Here is an example Bash script that finds all .txt files in the current directory and displays the first 3 lines of each file:

#!/bin/bash

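for file in *.txt
do
    # Skip the unexpanded pattern when there are no .txt files
    [ -e "$file" ] || continue
    echo "File: $file"
    head -n 3 -- "$file"
done

Breaking it down:

  • #!/bin/bash - Specifies this is a Bash script

  • for file in *.txt - Loops through all files ending in .txt

  • [ -e "$file" ] || continue - Skips the iteration if the file does not exist, which happens when no .txt files match and the pattern is left as the literal text *.txt

  • echo "File: $file" - Prints the file name

  • head -n 3 -- "$file" - Uses the head command to display the first 3 lines (-n 3) of the current file. The quotes keep names with spaces intact, and -- stops a name that starts with a dash from being read as an option

  • done - Ends the for loop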

When you run this script, it will:

  1. Find all files ending in .txt in the current directory

  2. For each .txt file:

     • Print the file name

     • Use head to display the first 3 lines of that file

  3. Repeat for all .txt files

So if you had:

  • file1.txt

  • file2.txt

  • file3.txt

The output would be:

File: file1.txt
Line 1 of file1.txt
Line 2 of file1.txt
Line 3 of file1.txt
File: file2.txt
Line 1 of file2.txt
Line 2 of file2.txt
Line 3 of file2.txt
File: file3.txt
Line 1 of file3.txt
Line 2 of file3.txt
Line 3 of file3.txt
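One thing to watch for with a *.txt loop is the case where nothing matches: by default Bash leaves the pattern as literal text. A possible variation, sketched below, turns on the nullglob shell option so the loop simply runs zero times. The function name preview_txt and its optional line-count argument are assumptions made up for this example.

```shell
#!/bin/bash
# preview_txt DIR [N]: print the first N lines (default 3) of each .txt file in DIR.
# The function name and its arguments are illustrative assumptions.
preview_txt() {
    local dir=$1 count=${2:-3} file
    (
        # Subshell, so the option does not leak into the rest of the script.
        shopt -s nullglob   # unmatched patterns expand to nothing
        for file in "$dir"/*.txt; do
            echo "File: ${file##*/}"
            head -n "$count" "$file"
        done
    )
}

work=$(mktemp -d)
mkdir "$work/empty"
printf '1\n2\n3\n4\n' > "$work/a.txt"

preview_txt "$work" 2       # prints "File: a.txt", then lines 1 and 2
preview_txt "$work/empty"   # prints nothing
```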

Disclaimer

This article was created using AI (Rix). You can explore other topics yourself. Here are some other interesting topics about file handling in Bash that you could research:

  1. File permissions - Understanding file permissions and how to change them in Bash scripts using chmod. This allows your script to manipulate files with different access levels.

  2. File ownership - Using chown to change the owner and group of files. This is important for file security and access.

  3. File timestamps - Using touch to update access, modification and change times of files. This can be useful for tracking when files were last accessed or modified.

  4. File comparisons - Using diff to compare the contents of two files and see what has changed. This is useful for version control and auditing files.

  5. Temporary files - Creating and using temporary files in your Bash scripts. This is important to avoid clashes with existing files.

  6. File searching - Using find to search for files based on name, type, size, permissions, owner, group, mtime etc. This allows you to locate specific files programmatically.

  7. File archiving - Using tar to archive groups of files into compressed files for easy distribution and installation.

  8. Here documents - Using here documents to pass large amounts of text as input to commands. This is a simple way to pass configuration data to programs from Bash scripts.

  9. Pipelines - Connecting multiple commands together using pipelines (|) to perform data transformations on files. This allows complex file handling operations.

  10. Redirection - Using <, >, >> and 2>&1 to redirect file descriptors to and from files, and |& to pipe both stdout and stderr into another command. This allows your Bash scripts to manipulate input/output easily.

Those are some advanced topics related to file handling in Bash that you could explore further. You can study them as the need arises, or sponsor me to research a particular topic. Comment below with the topics you would like me to cover in future articles.