While extracting data from large numbers of files over the years, I have compiled some commands that really help a lot.
Most of the time, we use head, tail and grep. However, these commands are only good at extracting wholesale (the first or last few lines) or by keyword. For more complex extraction, we can use sed, perl or awk instead.
Using myfile as an example:
myserver:/tmp/:>head -10 myfile
# IBM_PROLOG_BEGIN_TAG
# This is an automatically generated prolog.
#
# bos61D src/bos/usr/sbin/netstart/hosts 1.2
#
# Licensed Materials - Property of IBM
#
# COPYRIGHT International Business Machines Corp. 1985,1989
# All Rights Reserved
#
myserver:/tmp:>tail -2 myfile
10.1.1.123 host1
10.2.1.124 host2
myserver:/tmp/:>grep host1 myfile
10.1.1.123 host1
For something more complicated, such as extracting the 2nd line PLUS the 5th to 7th lines, I find it tough to do with the above commands.
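For what it is worth, both sed and awk can combine several line selections in a single pass. A minimal sketch, assuming a throwaway sample file whose contents are made up for illustration (it is not the myfile above):

```shell
# Build a small sample file: line N contains the text "line N".
# (Hypothetical data, used only so the example is self-contained.)
sample=$(mktemp)
awk 'BEGIN { for (i = 1; i <= 10; i++) print "line " i }' > "$sample"

# sed: print line 2 and lines 5 to 7; -n suppresses the default output.
sed -n -e '2p' -e '5,7p' "$sample"

# awk: the same selection written as a condition on the line number NR.
awk 'NR == 2 || (NR >= 5 && NR <= 7)' "$sample"
```

Both commands should print line 2 followed by lines 5, 6 and 7 of the sample file.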
h2. sed, perl or awk?
Do note that sed will traverse the entire file unless told to quit, so on a very large file this might take some time.
Say we want to extract the 2nd line. We can use sed or awk:
myserver:/tmp/:>sed 2p myfile
# IBM_PROLOG_BEGIN_TAG
# This is an automatically generated prolog.
# This is an automatically generated prolog.
#
# bos61D src/bos/usr/sbin/netstart/hosts 1.2
...
...
myserver:/tmp/:>awk 'NR==2' myfile
# This is an automatically generated prolog.
If you have tried it out, you will see that sed does print the 2nd line, but the rest of the file is printed as well, so the 2nd line appears twice! Use the -n option to suppress the default output so that only the requested line is printed.
myserver:/tmp/:>sed -n 2p myfile
# This is an automatically generated prolog.
Alternatively, you can 'delete' everything that you don't want by using the '!d' command, which deletes every line that does not match the address.
myserver:/tmp/:>sed '2!d' myfile
# This is an automatically generated prolog.
I wouldn't normally use this method, as I have difficulty converting the line to use variables. Do give me suggestions or advice if you think otherwise. I don't claim to be an expert in writing scripts. :)
IMPORTANT: Note that the single quotes are required. Otherwise, in a shell with history expansion (such as bash or csh), '!d' will be replaced by the last command you executed that starts with the letter 'd'.
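On the variables question, one approach that seems to work is to let the shell expand the variable inside double quotes while keeping '!d' in single quotes, or to pass the value to awk with -v. A sketch, where n and the sample file are made-up names for illustration:

```shell
# Hypothetical sample file: line N contains the text "line N".
sample=$(mktemp)
awk 'BEGIN { for (i = 1; i <= 10; i++) print "line " i }' > "$sample"

n=2   # line number to extract (hypothetical variable name)

# Double quotes let the shell expand ${n}; the braces stop "p" from
# being read as part of the variable name.
sed -n "${n}p" "$sample"

# For the '!d' form, only the variable goes in double quotes and '!d'
# stays in single quotes; the shell joins the two pieces into 2!d.
sed "${n}"'!d' "$sample"

# awk receives the value through -v, so its program stays single-quoted.
awk -v n="$n" 'NR == n' "$sample"
```

All three commands should print the 2nd line of the sample file.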
If you want one and only one line from the file, you can get awk to exit after printing that line; otherwise awk will traverse the whole file.
myserver:/tmp/:>awk 'NR==6 {print; exit}' myfile
# Licensed Materials - Property of IBM
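sed has a similar early exit in its q (quit) command. A minimal sketch on a made-up sample file (not the myfile above):

```shell
# Hypothetical sample file: line N contains the text "line N".
sample=$(mktemp)
awk 'BEGIN { for (i = 1; i <= 10; i++) print "line " i }' > "$sample"

# Print line 6, then quit so the remaining lines are never read.
# With -n in effect, q does not print anything extra on its way out.
sed -n -e '6p' -e '6q' "$sample"
```

On a very large file this should return as soon as line 6 has been printed.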
To extract lines 5 to 7 using sed or awk:
myserver:/tmp/:>sed -n 5,7p myfile
#
# Licensed Materials - Property of IBM
#
myserver:/tmp/:>awk 'NR==5,NR==7' myfile
#
# Licensed Materials - Property of IBM
#
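For a range on a very large file, it may help to stop reading once the last wanted line has been printed. A sketch on a made-up sample file (not the myfile above):

```shell
# Hypothetical sample file: line N contains the text "line N".
sample=$(mktemp)
awk 'BEGIN { for (i = 1; i <= 10; i++) print "line " i }' > "$sample"

# sed: print lines 5 to 7, and quit right after line 7 is handled.
sed -n -e '5,7p' -e '7q' "$sample"

# awk: exit as soon as we are past line 7; before that, print from line 5 on.
awk 'NR > 7 { exit } NR >= 5' "$sample"
```

Both commands should print lines 5, 6 and 7 and then stop.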
Here's another trick that I picked up from Mr Google. If you want to extract every 5th line of a file, counting from the top of the file, perl or awk does the job easily.
myserver:/tmp/:>perl -ne 'print unless (0 != $. % 5)' myfile
#
#
# IBM_PROLOG_END_TAG
#
# Licensed Materials - Property of IBM
# /etc/hosts
#
#
...
...
myserver:/tmp/:>awk '0 == NR % 5' myfile
#
#
# IBM_PROLOG_END_TAG
#
# Licensed Materials - Property of IBM
# /etc/hosts
#
#
...
...
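Tip: If you don't want to count from the top of the file, you can shift the line number. Putting (NR + 1) in place of NR moves each match one line earlier (lines 4, 9, 14 and so on); to start from line 1 and get lines 1, 6, 11 and so on, use NR % 5 == 1.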
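To make the offset concrete, here is a small awk sketch. The start and step variable names and the sample file are assumptions made for illustration:

```shell
# Hypothetical sample file: line N contains the text "line N".
sample=$(mktemp)
awk 'BEGIN { for (i = 1; i <= 12; i++) print "line " i }' > "$sample"

# Every 5th line starting from line 1: prints lines 1, 6 and 11.
awk 'NR % 5 == 1' "$sample"

# General form: every step-th line starting at line start.
# With start=3 and step=4 this prints lines 3, 7 and 11.
awk -v start=3 -v step=4 'NR >= start && (NR - start) % step == 0' "$sample"
```

Passing the numbers with -v keeps the awk program in single quotes, so the same one-liner can be reused with shell variables.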
That's all, folks.
Monday, October 01, 2012
My one liner to extracting lines using sed, perl or awk
Labels: AIX, LinuxAdmin, Scripting, SolarisAdmin