Chapter 7: Unix Power Tools
7.1 Intro
When working through this section, it's good to have some files of dummy data to practice on. To get some interesting dummy files to work on you can use these services.
7.2 GREP
grep stands for "global regular expression print". It is a powerful command-line search program.
grep string file.txt
It searches for the string you give it, inside the file you give it, and prints to the screen the lines that contain a match.
The search is case sensitive by default. Use the -i option to ignore case:
grep -i apples fruit.txt
options | |
---|---|
-i | Sets the search to case insensitive |
-w | Matches on whole words |
-v | Returns lines that don't match (inverse match) |
-n | Shows line numbers for the returned results |
-c | Shows the number of matching lines |
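The options in the table can be tried out on a small throwaway file. The sketch below is only an illustration: the file contents are hypothetical and the file is created in a temporary directory so nothing else is touched.

```shell
# Create a small sample file in a temporary directory (hypothetical data).
workdir=$(mktemp -d)
printf 'Apples are red\npineapple\napples and pears\nbananas\n' > "$workdir/fruit.txt"

# -i: ignore case, so "Apples", "pineapple" and "apples" all match.
insensitive=$(grep -i apple "$workdir/fruit.txt")

# -w: match whole words only, so "pineapple" is not a match for "apples".
whole_word=$(grep -w apples "$workdir/fruit.txt")

# -v: print the lines that do NOT contain a (case-sensitive) match.
inverted=$(grep -v apple "$workdir/fruit.txt")

# -n: prefix each matching line with its line number.
numbered=$(grep -n apples "$workdir/fruit.txt")

# -c: print only the number of matching lines.
count=$(grep -c apple "$workdir/fruit.txt")

printf '%s\n' "$insensitive" "=====" "$whole_word" "=====" "$inverted" "=====" "$numbered" "=====" "$count"
rm -r "$workdir"
```

Note that -c counts matching lines, not individual occurrences, so a line that contains the string twice is still counted once.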
7.2.1 Input multiple files
To search multiple files, give grep a path to a folder full of files:
grep -R apple Users/smerth/some_interesting_files
options | |
---|---|
-R | is used to direct the search recursively into the files in folders |
-n | applies line numbers to the match results |
-h | suppresses the file name and prints only the matching lines |
-l | prints only the names of files that contain a match, not the matching lines |
-L | prints only the names of files that contain no match |
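To see how -l, -L and -h differ, it helps to run them over a tiny directory tree. This is a sketch under stated assumptions: the folder layout and file contents are made up for the illustration and live in a temporary directory.

```shell
# Build a small directory tree to search (hypothetical files).
workdir=$(mktemp -d)
mkdir -p "$workdir/notes/sub"
printf 'apple pie\n'     > "$workdir/notes/a.txt"
printf 'banana bread\n'  > "$workdir/notes/sub/b.txt"
printf 'apple crumble\n' > "$workdir/notes/sub/c.txt"

# -R -l: names of the files that contain a match.
with_match=$(grep -R -l apple "$workdir/notes" | sort)

# -R -L: names of the files that contain no match.
without_match=$(grep -R -L apple "$workdir/notes")

# -R -h: the matching lines only, without the file name prefix.
lines_only=$(grep -R -h apple "$workdir/notes" | sort)

printf '%s\n' "$with_match" "=====" "$without_match" "=====" "$lines_only"
rm -r "$workdir"
```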
7.2.2 Wildcards
Use wildcards to narrow down the folders and files that are searched.
ls *.txt
lists only the .txt files, so
grep apple Users/smerth/unix_files/*.txt
will look for the string apple in only the .txt files.
7.2.3 Pipe output from other sources into grep
You can pipe results from a command into grep
ps aux | grep terminal
ps returns the active processes, and grep searches these for lines containing the string terminal.
or
ps aux | grep Applications
or
history | grep docker
This will search through your terminal history and return any lines (commands you have run) containing the string docker.
7.2.4 Pipe output into grep and grep results into another command
history | grep drupal | less
This pipes the grep results (the history lines containing the string drupal, such as commands run with the Drupal CLI) into less to give a paginated result list.
7.2.5 Coloring matched text
grep --color lorem lorem_ipsum.txt
highlights the search results in a different color
grep --color=auto lorem lorem_ipsum.txt
Note that --color=auto colors results only when they are written to a terminal window. You don't want color markup in results that you pipe to another process.
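The effect of auto can be checked without looking at colors at all, because output captured by the shell is not a terminal. A minimal sketch, assuming a grep that also accepts --color=always (GNU and BSD grep both do); the variable names are only for illustration.

```shell
# Command substitution captures the output, so grep is not writing to a terminal.
auto_output=$(echo 'lorem ipsum' | grep --color=auto lorem)
always_output=$(echo 'lorem ipsum' | grep --color=always lorem)

# With auto the captured text is plain.
printf 'auto:   %s\n' "$auto_output"

# With always the text contains escape sequences; cat -v makes them visible.
printf 'always: %s\n' "$always_output" | cat -v
```

This is why auto is the safer default: the color codes that always inserts would end up in any file or program the results are piped to.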
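In .bashrc you can set and export a shell variable to set the default colors you want to use, as well as other options you want as defaults for grep.
export GREP_OPTIONS="--color=auto"
--color=auto means color is on whenever the output goes to a terminal.
To make searches case insensitive by default, use:
export GREP_OPTIONS="-i"
Note that GNU grep has declared GREP_OPTIONS obsolete and newer releases ignore it. There, a shell alias such as alias grep='grep --color=auto' is the usual substitute.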
7.2.6 Grep examples
7.2.6.1 Ex. 1 - Look in all files for a string and export lines containing that string to a new file
grep -n "YOUR SEARCH STRING" * > output-file
-n (include line numbers)
* (look in all files in the current directory)
7.2.6.2 Ex. 2 - To look in a specific file
grep -n "YOUR SEARCH STRING" input-file > output-file
7.2.6.3 Ex. 3 - Find all lines in the input-file that contain one string but not another, and redirect the result set to the output-file
grep -n "test" input-file | grep -v "modern" > output-file
-v (inverse switch, returns lines that don't match the condition)
7.2.6.4 Ex. 4 - Grep from a set of input files
Given a set of input files: demo_1, demo_2, demo_3, demo_4, demo_5
grep "this" demo_*
7.3 REGEX
regex is short for regular expression. A regex is a sequence of characters that defines a search pattern.
You can use regular expressions with grep.
7.3.0.1 Using regular expressions with grep
It's a good idea to put quotes around the regex because some regex symbols are also special characters in the Unix shell.
When using character classes such as [:alpha:] with grep, you need to put them inside a bracket expression, [[:alpha:]], so grep knows you are trying to match the character class and not the set of individual characters ':', 'a', 'l', 'p' and 'h'.
There are basic and extended regular expressions. When using a text editor, make sure you know which one it supports. In the terminal you can set an option so that grep uses the extended regex set by default:
export GREP_OPTIONS="-E"
7.3.0.2 example: grep with regex
grep "foo.*bar" demo_file
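matches lines that contain "foo" followed, anywhere later on the same line, by "bar", with any characters (or none) in between. To match only lines that begin with "foo" and end with "bar", anchor the pattern: "^foo.*bar$".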
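Whether a pattern is anchored changes which lines match, and this is easy to check on a few lines of text. The following sketch uses a hypothetical demo_file, created in a temporary directory, and counts the matching lines for each pattern.

```shell
# Four sample lines (hypothetical contents for demo_file).
workdir=$(mktemp -d)
printf 'foo and bar\nxx foo then bar yy\nbar before foo\nfoobar\n' > "$workdir/demo_file"

# Unanchored: "foo" followed later by "bar", anywhere in the line.
unanchored=$(grep -c 'foo.*bar' "$workdir/demo_file")

# Anchored: the line must start with "foo" and end with "bar".
anchored=$(grep -c '^foo.*bar$' "$workdir/demo_file")

echo "unanchored: $unanchored, anchored: $anchored"   # prints "unanchored: 3, anchored: 2"
rm -r "$workdir"
```

The patterns are in single quotes so that the shell leaves characters such as $ and * alone.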
7.3.0.3 Resources
N.B. Regex is a whole world unto itself; consult another document for more detail.
Regular expressions in Javascript
7.4 TR
Translating characters. tr translates (or deletes) individual characters; it does not search for whole strings.
The first argument to tr is the set of characters to search for; the second argument is the set of replacement characters.
echo "a,b,c" | tr ',' '-'
Here it looks like tr searches the input for , and replaces each one it finds with a -, but what it actually does is map each character in the first set to the character at the same position in the second set.
tr '123456' 'EBGDAE'
For example, all instances of 1 are replaced by E, all instances of 2 are replaced by B, etc...
Run this example to see it at work.
echo "The first argument is the search string, for the second is the replacment string" | tr 'ag' '12345'
If the replacement set is shorter than the search set, GNU and BSD tr repeat the last character of the replacement set until the two sets are the same length. If it is longer, as in the example above, the extra characters are ignored.
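The positional mapping, and what happens when the second set is too short, can be seen with two one-liners. This is a small sketch rather than a definitive reference: the padding shown in the second command is what GNU and BSD tr do, while POSIX leaves that case unspecified.

```shell
# One-to-one mapping by position: a->1, b->2, c->3.
mapped=$(echo 'aabbcc' | tr 'abc' '123')
echo "$mapped"    # prints 112233

# The second set is shorter: GNU and BSD tr reuse its last character,
# so a->x and b, c, d, e all map to y.
padded=$(echo 'abcde' | tr 'abcde' 'xy')
echo "$padded"    # prints xyyyy with GNU tr
```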
7.4.1 TR Examples
7.4.1.1 Ex. 1 - take a file as input
tr 'A-Z' 'a-z' < people.txt
takes the file as input and swaps uppercase alphabet for lowercase alphabet
7.4.1.2 Ex. 2 - take a file as input and output to another file
tr ',' '\t' < people.csv > people.tsv
replaces each comma with a tab in a comma-delimited data file, then writes the output to a tab-delimited file.
7.4.1.3 Ex. 3 - delete characters
-d (delete characters in a listed set)
echo 'abc1233deee567f' | tr -d '[:digit:]'
returns: abcdeeef
7.4.1.4 Ex. 4 - squeeze characters
-s (squeeze replaces each run of a repeated character from the listed set with a single occurrence)
echo 'abc1233deee567f' | tr -s '[:digit:]'
returns: abc123deee567f
7.4.1.5 Ex. 5 - use the complementary set
-c (use the complement of the listed set, that is, every character not in it)
7.4.1.6 Ex. 6 - delete characters not in a listed set
echo 'abc1233deee567f' | tr -dc '[:digit:]'
returns: 1233567
7.4.1.7 Ex. 7 - squeeze characters not in a listed set
echo 'abc1233deee567f' | tr -sc '[:digit:]'
returns: abc1233de567f
7.4.1.8 Ex. 8 - specify an argument for each option
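echo 'abc1233deee567f' | tr -ds '[:alpha:]' '[:digit:]'
When -d and -s are used together, the first set is deleted and the second set is squeezed, whatever order the options are written in. So this means: delete the alpha characters and squeeze the digits.
returns: 123567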
7.4.1.9 Ex. 9 - remove all non-printable characters from a file
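tr -dc '[:print:]\n' < file1 > file2
The \n in the set keeps the newlines, which are not printable characters and would otherwise be deleted too.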
7.4.1.10 Ex. 10 - remove all surplus carriage return and end of file characters (cleaning windows documents)
tr -d '\015\032' < windows.file > unix.file
7.4.1.11 Ex. 11 - remove double spaces from file
tr -s ' ' < file1 > file2
7.5 SED
The stream editor, sed, takes a stream of input and edits it.
A sed substitution command follows this pattern:
sed 's/a/b/'
s: substitution, a: search string, b: replacement string
echo 'upstream' | sed 's/up/down/'
returns: downstream
By default it only changes the first occurrence of the search string on each line.
7.5.1 sed: global replacement
Adding a 'g' for global changes the action to substituting all occurrences.
echo 'upstream and upward' | sed 's/up/down/g'
returns: downstream and downward
The delimiter can be almost any single character you choose:
sed 's/up/down/'
or
sed 's:up:down:'
or
sed 's|up|down|'
You can switch to another delimiter when the search string you are using contains the default delimiter.
When feeding a file into sed, each line in the file is processed in turn, so the command is applied to every line.
sed 's/pear/mango/' fruit.txt
Notice that sed takes a file name as an argument, without the '<' character (although you can use it).
You can chain multiple sed commands using the -e option:
sed -e 's/pear/mango/' -e 's/apple/mango/' fruit.txt
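Two of the points above, choosing another delimiter and chaining commands with -e, are sketched here. The path and the contents of fruit.txt are hypothetical and only serve the illustration.

```shell
# The search string contains "/", so "|" is used as the delimiter instead.
new_path=$(echo '/usr/local/bin' | sed 's|/usr/local|/opt|')
echo "$new_path"    # prints /opt/bin

# Several -e commands are applied to every line, in the order given.
workdir=$(mktemp -d)
printf 'pear\napple\nplum\n' > "$workdir/fruit.txt"
swapped=$(sed -e 's/pear/mango/' -e 's/apple/mango/' "$workdir/fruit.txt")
printf '%s\n' "$swapped"
rm -r "$workdir"
```

Without the alternative delimiter, each "/" in the path would have to be escaped as "\/".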
7.5.2 sed: regex
Regex works with sed in the same way that it works with grep. As with grep, you need to watch out for the difference between basic and extended regular expressions (or set sed to use the extended regex set in .bashrc).
In the following examples we tell sed to use the extended regex set with the -E option.
7.5.3 sed: regex back-references
Back-references are part of regular expressions, and sed makes good use of them.
echo 'daytime' | sed -E 's/(...)time/\1light/'
returns: daylight
You can define up to 9 back-references in the search string, that is, up to 9 groups in parentheses, each capturing part of the match. In the replacement string the captured text is referred to by the numbers \1 through \9, in the order in which the groups open.
echo 'Dan Stevens' | sed -E 's/([A-Za-z]+) ([A-Za-z]+)/\2, \1/'
returns: Stevens, Dan
You could run that over an entire file, since sed applies the command to every line.
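Running the same substitution over a file might look like the sketch below. The file name and the names in it are hypothetical, and the file is written to a temporary directory.

```shell
# A short list of "First Last" names (hypothetical data).
workdir=$(mktemp -d)
printf 'Dan Stevens\nAda Lovelace\nGrace Hopper\n' > "$workdir/names.txt"

# \1 is the first captured word, \2 the second; the replacement swaps them.
reordered=$(sed -E 's/([A-Za-z]+) ([A-Za-z]+)/\2, \1/' "$workdir/names.txt")
printf '%s\n' "$reordered"
rm -r "$workdir"
```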
7.6 CUT
Cutting selected portions of text. cut lets you select text in one of three ways:
options | |
---|---|
-c | characters |
-b | bytes |
-f | fields |
7.6.0.1 Non-delimited files
cut -c 1-10 < mock_lorem_ipsum.txt
You can cut multiple columns:
cut -c 1-10,34-37,47- < mock_lorem_ipsum.txt
When working with non-delimited files you define the columns you want to cut by giving a range of character positions (for example 5-19). An open-ended range such as 47- runs to the end of the line.
7.6.0.2 Delimited files
But if the file is delimited, all you need to do is give the column numbers with the -f (fields) option. The default delimiter is a tab.
cut -f 2,3 < mock_tab_data.txt
You can specify a different delimiter with the -d option:
cut -f 2,6 -d "," < presidents.csv
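The three ways of selecting text can be compared on a few lines of made-up data. In this sketch sample.csv and its columns are hypothetical, chosen only to keep the output short.

```shell
# A small comma-delimited file (hypothetical data).
workdir=$(mktemp -d)
printf 'id,first,last,city\n1,Ann,Lee,Oslo\n2,Bob,Ray,Lima\n' > "$workdir/sample.csv"

# -d sets the delimiter and -f picks fields 2 and 3.
names=$(cut -d ',' -f 2,3 "$workdir/sample.csv")
printf '%s\n' "$names"

# Tab is the default delimiter, so -d is not needed for tab-separated input.
second=$(printf 'a\tb\tc\n' | cut -f 2)
echo "$second"    # prints b

# -c selects by character position: characters 1 to 3, then 8 to the end.
pieces=$(echo 'abcdefghij' | cut -c 1-3,8-)
echo "$pieces"    # prints abchij
rm -r "$workdir"
```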
macOS Note!
When using Terminal on macOS there is a trick to entering a literal tab character on the command line.
Press Ctrl + V and then quickly press Tab. It takes a little practice...
7.7 DIFF
Comparing files
diff mock_lorem_start.txt mock_lorem_edited.txt
> indicates an insertion (a line found only in the second file).
< indicates a deletion (a line found only in the first file).
options | |
---|---|
-i | Case insensitive |
-b | Ignore changes in the amount of whitespace |
-w | Ignore all whitespace |
-B | Ignore blank lines |
-r | recursively compare directories |
-s | show identical files |
7.7.1 diff: Alternative formats
options | |
---|---|
-c | Copied context |
-u | Unified context |
-y | side-by-side |
-q | Only whether files differ |
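A pair of tiny files is enough to see the default output and the effect of a couple of options. This is a minimal sketch: start.txt and edited.txt are hypothetical stand-ins for the mock files used above.

```shell
# Two small files that differ in one changed line and one added line.
workdir=$(mktemp -d)
printf 'one\ntwo\nthree\n'       > "$workdir/start.txt"
printf 'one\nTWO\nthree\nfour\n' > "$workdir/edited.txt"

# diff exits with status 1 when the files differ, so "|| true" keeps a
# strict shell (set -e) from stopping here.
normal=$(diff "$workdir/start.txt" "$workdir/edited.txt" || true)

# -i ignores the case difference, leaving only the added line.
ignore_case=$(diff -i "$workdir/start.txt" "$workdir/edited.txt" || true)

# -u prints the same changes in unified format.
unified=$(diff -u "$workdir/start.txt" "$workdir/edited.txt" || true)

printf '%s\n' "$normal" "=====" "$ignore_case" "=====" "$unified"
rm -r "$workdir"
```

In the default output, "2c2" means line 2 was changed and "3a4" means a line was added after line 3 of the first file, where it becomes line 4 of the second.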
7.7.2 Using a text editor to view diffs
Do a search on your Mac for FileMerge. It comes with Apple's Developer Tools.
It's not pretty but gets the job done.
7.8 XARGS
Passing an argument list to commands
xargs is short for "execute as arguments".
xargs parses an input stream into items and passes those items as arguments to a command, running the command as many times as needed.
Here's an example
wc mock_lorem_ipsum.txt
3077 37859 256409 mock_lorem_ipsum.txt
3077 - newline count
37859 - word count
256409 - byte count
echo 'mock_lorem_ipsum.txt' | wc
Word count returns
1 1 21
because the string 'mock_lorem_ipsum.txt' is being passed to wc on standard input, instead of the contents of the file.
To pass the file name to wc as an argument, use xargs.
echo 'mock_lorem_ipsum.txt' | xargs wc
This returns the word count stats for the file (i.e. the file name is passed to the command wc as an argument).
You can use the -t option to see what xargs does:
echo 'mock_lorem_ipsum.txt' | xargs -t wc
-t prints each command before it is run.
Now try to pass multiple arguments into wc
echo 'lorem_1.txt lorem_2.txt' | xargs -t wc
You can see that both file names were passed as arguments to a single run of wc.
The results for each file are listed one after another. Rather conveniently, wc also returns totals.
If you want xargs to pass at most a fixed number of arguments to each run of the command, use the -n option:
echo 'lorem_1.txt lorem_2.txt' | xargs -t -n1 wc
Here -n is set to 1. So wc runs with the first argument, then runs again with the second argument, etc...
To see this clearly run:
echo 1 2 3 4 | xargs -t -n2
When no command is given, xargs runs echo. Here echo is run first with arguments 1 and 2, and is then run again with arguments 3 and 4.
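The grouping that -n controls can also be read directly off the output, without -t. A short sketch, with echo named explicitly as the command:

```shell
# Without -n, every item is passed to a single run of echo.
all_at_once=$(echo 1 2 3 4 | xargs echo)
echo "$all_at_once"    # prints "1 2 3 4" on one line

# With -n2, a new run of echo starts for every two items: two lines of output.
in_pairs=$(echo 1 2 3 4 | xargs -n2 echo)
printf '%s\n' "$in_pairs"

# With -n1, echo runs once per item, so the output has four lines.
runs=$(echo 1 2 3 4 | xargs -n1 echo | wc -l | tr -d ' ')
echo "$runs"           # prints 4
```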
Here's another example:
cat mock_companies.csv | xargs -I {} echo "Buy stock in: {}"
-I specifies a placeholder that is replaced by each line of the output from cat, so the command runs once per line. The result is something like
Buy stock in: Trilia
Buy stock in: Cogilith
Buy stock in: Centidel
etc...
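When passing file names as arguments there is an issue with names containing a space: xargs may see each word in the name as a separate argument. Use the -0 (zero) option, which makes xargs split its input on NUL characters instead of whitespace. The input then has to be NUL-separated, which tr can take care of:
ls ~/Library/ | grep 'A.*' | tr '\n' '\0' | xargs -0 -n1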
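The difference that -0 makes can be counted. The sketch below assumes two hypothetical files, one with a space in its name, and uses find with -print0 as another common way to produce NUL-separated names.

```shell
# Two files, one with a space in its name (hypothetical names).
workdir=$(mktemp -d)
touch "$workdir/Address Book.txt" "$workdir/Audio.txt"

# Plain xargs splits on whitespace, so "Address Book.txt" becomes two items.
split_count=$(ls "$workdir" | xargs -n1 echo | wc -l | tr -d ' ')

# find -print0 separates names with NUL and xargs -0 splits only on NUL.
safe_count=$(find "$workdir" -type f -print0 | xargs -0 -n1 echo | wc -l | tr -d ' ')

echo "without -0: $split_count items, with -0: $safe_count items"
rm -r "$workdir"
```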
7.8.1 xargs Examples
7.8.1.1 Example 1 - Take a list of company names and make a directory called "Companies" on the desktop containing a sub-directory for each company in the list
cat mock_companies.csv | sort | uniq | xargs -I {} mkdir -p ~/Desktop/Companies/{}
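To try this without touching the desktop, the same pipeline can be pointed at a temporary directory. This is a sketch under stated assumptions: the company list is a hypothetical four-line stand-in for mock_companies.csv, with one duplicate to show what uniq is for.

```shell
# A stand-in company list with one duplicate entry (hypothetical data).
workdir=$(mktemp -d)
printf 'Trilia\nCogilith\nTrilia\nCentidel\n' > "$workdir/mock_companies.csv"

# sort | uniq removes the duplicate; xargs -I {} runs mkdir once per line.
sort "$workdir/mock_companies.csv" | uniq | xargs -I {} mkdir -p "$workdir/Companies/{}"

# List the directories that were created.
created=$(ls "$workdir/Companies")
printf '%s\n' "$created"
rm -r "$workdir"
```

Because -I passes each input line as a single item, a company name containing spaces still ends up as one directory.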