Linux Hack of the Week #7: SED and AWK

If you deal with data, at some point you will need to transform and modify your inputs. Sometimes you will get data that is tab- or pipe-delimited when you need a .csv. Other times you will get a wonky data source with 12 leading spaces on every line. SED and AWK are great tools for making the transformation of inputs less of a headache. I mostly use SED to replace text, and AWK to format text.

SED

SED is a stream editor: it takes input from a file or a pipe and writes the result to standard output. Many new SED users run a series of edits against a file, only to look at it later and see no evidence of changes. It’s important to remember that you must either use -i to make changes “in place” or redirect the output with > to write a new file (>> appends to an existing file instead).
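Here is a minimal sketch of that behavior. The scratch directory and the file names demo.txt and demo_new.txt are made up for illustration; they are not part of the Hamlet example.

```shell
# Scratch directory so no real files are touched (demo.txt is a made-up name).
tmpdir=$(mktemp -d)
printf 'hello world\n' > "$tmpdir/demo.txt"

# Without -i, sed prints the edited text and leaves demo.txt alone.
sed -e 's/world/there/' "$tmpdir/demo.txt"
cat "$tmpdir/demo.txt"

# Redirect with > to keep the result in a new file.
sed -e 's/world/there/' "$tmpdir/demo.txt" > "$tmpdir/demo_new.txt"
cat "$tmpdir/demo_new.txt"
```

The first cat still shows the original text; only demo_new.txt contains the substitution.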

As a good first example, let’s replace “or” in Hamlet’s “to be or not to be” speech with “maybe”. To do that, we will use ‘s’ for substitute in SED. Without any flags, ‘s’ replaces only the first match on each line.

joe$ head -1 2b.txt
To be, or not to be, that is the question:
joe$ sed -e 's/or/maybe/' 2b.txt | head -1
To be, maybe not to be, that is the question:
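One thing to watch for: the pattern matches anywhere, including inside longer words such as “fortune”. The sketch below shows one possible workaround using \b (a word boundary), which is a GNU sed extension and may not work in other sed implementations. The sample line is made up for illustration.

```shell
# Made-up sample line containing "or" both inside a word and on its own.
line='fortune or fame'

# Plain pattern: the first match is the "or" inside "fortune".
plain=$(printf '%s\n' "$line" | sed -e 's/or/maybe/')

# \b is a GNU sed extension; it restricts the match to the whole word "or".
bounded=$(printf '%s\n' "$line" | sed -e 's/\bor\b/maybe/')

printf '%s\n' "$plain"    # fmaybetune or fame
printf '%s\n' "$bounded"  # fortune maybe fame
```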
 

What if we wanted to change “be” to “exist”? Since it is on the line more than once, we need to add a flag. We would use the ‘g’ option for global replacement.

joe$ sed -e 's/be/exist/g' 2b.txt | head -1
To exist, or not to exist, that is the question:
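To see what the ‘g’ flag changes, here is a small side-by-side sketch that feeds the first line of the speech to sed through a pipe instead of reading 2b.txt. The variable names are only for illustration.

```shell
line='To be, or not to be, that is the question:'

# No flag: only the first match on the line is replaced.
first_only=$(printf '%s\n' "$line" | sed -e 's/be/exist/')

# g flag: every match on the line is replaced.
all_matches=$(printf '%s\n' "$line" | sed -e 's/be/exist/g')

printf '%s\n' "$first_only"   # To exist, or not to be, that is the question:
printf '%s\n' "$all_matches"  # To exist, or not to exist, that is the question:
```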
 

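What if we wanted to replace commas with pipes? You can use the same kind of substitution as above. Add the -i option to edit the file in place. Since we are using -i, there will be no output, but we can use head on the file to see the changes. Note that the command below has no ‘g’ flag, so only the first comma on each line is replaced; use 's/,/|/g' to replace all of them.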

joe$ sed -i -e 's/,/|/' 2b.txt
joe$ head 2b.txt
To be| or not to be, that is the question:
Whether 'tis nobler in the mind to suffer
The slings and arrows of outrageous fortune|
Or to take Arms against a Sea of troubles|
And by opposing end them: to die| to sleep
No more; and by a sleep| to say we end
the heart-ache| and the thousand natural shocks
that Flesh is heir to? 'Tis a consummation
devoutly to be wished. To die| to sleep,
To sleep| perchance to Dream; aye, there's the rub
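The same building blocks cover the two cases from the introduction: pipe-delimited data that needs to become a .csv, and lines with leading whitespace. The following is a rough sketch on a made-up two-line sample. It assumes no field contains a comma or needs quoting, which real data may not guarantee.

```shell
# Made-up sample: pipe-delimited rows with stray leading whitespace.
raw=$(printf '            Alabama|4849377\n   Alaska|736732\n')

# First expression strips leading whitespace;
# second turns every pipe into a comma (note the g flag).
cleaned=$(printf '%s\n' "$raw" | sed -e 's/^[[:space:]]*//' -e 's/|/,/g')

printf '%s\n' "$cleaned"
```

Multiple -e expressions are applied in order to each line, so one sed call can handle both cleanups.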


AWK

AWK is a text manipulation tool that can be used to modify, slice, dice and chop text. It is great for transforming text and printing it in the format you want. Let’s say you want to change the format of this state population .csv:

https://raw.githubusercontent.com/BuzzFeedNews/2015-11-refugees-in-the-united-states/master/data/census-state-populations.csv

state,pop_est_2014
Alabama,4849377
Alaska,736732
Arizona,6731484
Arkansas,2966369
California,38802500
Colorado,5355866

Your challenge is to print each row with the text “State Name: stateName | State Population: StatePop”. Since AWK separates fields by whitespace by default, you have to be a bit tricky to account for states with spaces in their names: we tell AWK to use a comma as the field separator with -F. Then, you’ll notice the file has a header, which we don’t want to print. For that we use a RegEx matching capital letters, [A-Z], as the header is all lowercase while every state name starts with a capital. Finally, we print using $1 and $2 (the two fields in the .csv).

joe$ awk -F, ' /[A-Z]/ { print "State Name: " $1 " | State Population: " $2  } ' census-state-populations.csv
State Name: Alabama | State Population: 4849377
State Name: Alaska | State Population: 736732
State Name: Arizona | State Population: 6731484
State Name: Arkansas | State Population: 2966369
State Name: California | State Population: 38802500
State Name: Colorado | State Population: 5355866
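Matching on [A-Z] works here because the header happens to be lowercase. Another way to skip a header, sketched below, is to test AWK's built-in record counter NR, which does not depend on letter case. The sample is fed in through a pipe; the header and Alabama rows come from the file above, while “North Example,12345” is a made-up row to show that a name with a space stays in one field.

```shell
# Header and Alabama are from the article's CSV;
# "North Example,12345" is a made-up row with a space in the name.
csv=$(printf 'state,pop_est_2014\nAlabama,4849377\nNorth Example,12345\n')

# NR > 1 skips the first record (the header) regardless of its letter case.
report=$(printf '%s\n' "$csv" | awk -F, 'NR > 1 { print "State Name: " $1 " | State Population: " $2 }')

printf '%s\n' "$report"
```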
 

This is just the tip of the iceberg! Want to learn more about AWK? Check out this manual from GNU: https://www.gnu.org/software/gawk/manual/gawk.html.

I hope this helps spark your interest in leveraging SED and AWK. As always, if you have any questions feel free to reach out: mcmanus@automox.com.  
