Combine multiple line segments into one when a certain pattern is found - awk/sed/perl

Published: Sun, 18 May 2008 23:45:00 GMT

Dear Group,

I have a file with the content in the following format:

Junk...

Junk...

Heading P01

column1 column2 multiline text

CA1001 10 This is a multiline

text spanning two lines

CA1005 12 This is a multiline

text spanning three

lines

CA1008 11 This is a single line text

Heading P02

column1 column2

CA2001 10

CA2003 11

CA2005 12

Heading P03

Junk..

Junk..

I would like to list all the values under "Heading P01" for the same

column1 on a single line:

CA1001 10 This is a multiline text spanning two lines

CA1005 12 This is a multiline text spanning three lines

CA1008 11 This is a single line text

Note: The column1 values will always have "CA" as the starting

character.

Appreciate your help in finding a solution using awk or perl or

sed ...

Thank you!!!!

All Comments

  • 9 Comments
    • On Oct 30, 3:19 pm, Michael Tosch <eed....unix-linux.todaysummary.com.NO.eed.SPAM.ericsson.PLS.se>

      wrote:

      > da. Ram wrote:

      > awk '/^Heading P01/{x=1} /^Heading P02/{x=0} x==0{next}

      > /^CA/,/^$/{printf "%s",$0}/^$/{print}' file

      > --

      > Michael Tosch .unix-linux.todaysummary.com. hp : com

      Thanks so much for the neat solution. Would it be possible to add the

      heading ID to the combined line?

      I tried the following, but the heading is added to every segment of

      the broken line, not just at the beginning.

      I am trying to figure out a way to get the heading ID added once per

      combined line:

      awk '/^Heading P01/{x=1;p=$2} /^Heading P02/{x=0} x==0{next} /^CA/,/^$/{printf " %s %s",p,$0} /^$/{print}' file

      P01 CA1001 10 This is a multiline P01 text spanning

      two lines P01

      P01 CA1005 12 This is a multiline P01 text spanning

      three P01 lines P01

      P01 CA1008 11 This is a single line text P01

      Desired output

      P01 CA1001 10 This is a multiline text spanning two

      lines

      P01 CA1005 12 This is a multiline text spanning

      three lines

      P01 CA1008 11 This is a single line text

      BTW, what does the "print" at the end of the command do?

      #1; Sun, 18 May 2008 23:46:00 GMT
    • On Oct 30, 4:42 pm, Michael Tosch <eed....unix-linux.todaysummary.com.NO.eed.SPAM.ericsson.PLS.se>

      wrote:

      > da. Ram wrote:

      > awk '/^Heading P01/{x=1;p=$2} /^Heading P02/{x=0} x==0{next}

      > /^CA/{printf " %s ",p} /^CA/,/^$/{printf "%s",$0} /^$/{print}' file

      > The print at the end prints a newline character.

      > (More precisely: it prints the current line followed by a newline,

      > but the current line is empty.)

      > printf "%s" prints without a newline.

      > --

      > Michael Tosch .unix-linux.todaysummary.com. hp : com

      Thanks so much! The solution works great.

      #2; Sun, 18 May 2008 23:47:00 GMT
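
The one-liner quoted above relies on an empty line to end each record. If the real file has no blank lines between records, a variant that starts a new record at every line beginning with "CA" may be more robust. What follows is only a minimal sketch under that assumption: the sample data is rebuilt from the original post, and the names emit, active, id and rec are illustrative. User-defined functions need nawk, gawk or /usr/xpg4/bin/awk rather than the old Solaris awk.

```shell
# Sample input rebuilt from the original post (assumption: no blank lines between records).
tmp=$(mktemp)
printf '%s\n' \
  'Junk...' \
  'Heading P01' \
  'column1 column2 multiline text' \
  'CA1001 10 This is a multiline' \
  'text spanning two lines' \
  'CA1005 12 This is a multiline' \
  'text spanning three' \
  'lines' \
  'CA1008 11 This is a single line text' \
  'Heading P02' \
  'column1 column2' \
  'CA2001 10' > "$tmp"

out=$(awk '
  # Print the record collected so far, prefixed once with the heading ID.
  function emit() { if (rec != "") print id, rec; rec = "" }

  /^Heading/ { emit(); active = ($2 == "P01"); id = $2; next }  # section boundary
  !active    { next }                                           # outside P01: ignore
  /^CA/      { emit(); rec = $0; next }                         # a new record starts
  rec != ""  { rec = rec " " $0 }                               # continuation line
  END        { emit() }
' "$tmp")

printf '%s\n' "$out"
rm -f "$tmp"
```

Because the heading ID is added only when a record is emitted, it appears once per combined line, and the column header line is skipped because no record is open when it is read.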
    • I'm looking to do something similar, but I'm not up to snuff enough on

      awk or perl to figure it out on my own yet.

      Anyway, I'm looking to match a particular string in a line, then grab

      this line and the next 5 after it so I can parse out some other

      parameters. Here's a sample of what I need to pull out of the file

      (matching on 'GHLR665', then pulling this line plus the next 5):

      1193616363 XXXXXX46D00 CM GHLR665 OCT28 18:59:54 7090 INFO

      Table GHLRVLR Resource Limitation

      1193616363 Operation: Update Location

      1193616363 VLR number: 551178313920

      1193616363 Description: Table GHLRVLR is about to

      reach its 6000 Maximum Capacity.

      1193616363 Space Left: 0 (6000)

      1193616363 Action: Use QVLRACT in HLRADMIN to

      identify the inactive VLRs.

      Every line in this log file starts with a 10-digit number (e.g.

      1193616363), which may or may not be the same value on each line.

      The lines before and after the block I'm trying to capture and write

      into a single line / record contain just the 10-digit number,

      followed by some whitespace and a line terminator (UNIX

      style, I think).

      Any suggestions would be very much appreciated!

      Mike

      #3; Sun, 18 May 2008 23:48:00 GMT
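
One way to approach the "matching line plus the next 5" request is a countdown: set a counter when the pattern matches, then keep appending lines until it reaches zero. The following is a rough sketch, not a tested solution for the real log. The sample lines are abbreviated from the post, each log entry is assumed to sit on one physical line, and the variable names pat, extra, left and rec are made up for illustration.

```shell
# Abbreviated sample based on the log excerpt in the post (assumed one entry per line).
tmp=$(mktemp)
printf '%s\n' \
  '1193616300 ' \
  '1193616363 XXXXXX46D00 CM GHLR665 OCT28 18:59:54 7090 INFO' \
  '1193616363 Operation: Update Location' \
  '1193616363 VLR number: 551178313920' \
  '1193616363 Description: Table GHLRVLR is about to' \
  '1193616363 Space Left: 0 (6000)' \
  '1193616363 Action: Use QVLRACT in HLRADMIN' \
  '1193616399 ' > "$tmp"

out=$(awk -v pat='GHLR665' -v extra=5 '
  # Start a countdown on a match: the matching line plus "extra" following lines.
  left == 0 && $0 ~ pat { left = extra + 1 }

  left > 0 {
    rec = (rec == "" ? $0 : (rec " " $0))      # append to the current record
    if (--left == 0) { print rec; rec = "" }   # record complete: print on one line
  }

  END { if (rec != "") print rec }             # input ended mid-record
' "$tmp")

printf '%s\n' "$out"
rm -f "$tmp"
```

Each match yields one output line, which can then be split further with a second awk pass to pull out the individual parameters.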
    • Ed,

      Thanks! I'm running into some silly syntax errors, but thanks for the

      explanation of the logic, that should get me going the right

      direction.

      Thanks Again,

      Mike

      #4; Sun, 18 May 2008 23:49:00 GMT
    • Michael,

      Thanks, you saved me some time and a little banging my head against

      the cubicle wall. Apparently the version of awk in Solaris 10 is an

      'old' awk -- the last of your examples is the one that did the trick.

      Regards and Thank You Again,

      Mike

      #5; Sun, 18 May 2008 23:50:00 GMT
    • On Nov 1, 2:59 am, Ed Morton <mor....unix-linux.todaysummary.com.lsupcaemnt.com> wrote:

      > On 10/31/2007 8:54 PM, Miguel Lobos wrote:

      > Absolutely do not do anything to accommodate old, broken awk on Solaris.

      > Use GNU awk (gawk), New awk (nawk), or /usr/xpg4/bin/awk instead.

      > By the way, this is netnews, not a web forum, so you should leave enough

      > context in each post so it stands alone.

      > Ed

      Ed,

      Thank you again for the advice, and all points taken! Now that I've

      managed to finish my report, I'll work on getting a more modern awk on

      my Ultra 45.

      Mike

      #6; Sun, 18 May 2008 23:51:00 GMT
    • On Nov 2, 10:41 am, Michael Tosch <eed....unix-linux.todaysummary.com.NO.eed.SPAM.ericsson.PLS.se>

      wrote:

      > Miguel Lobos wrote:

      > cd /usr/bin

      > ls -li awk nawk oawk

      > shows that awk linked to oawk.

      > rm awk

      > ln nawk awk

      > and it will be linked to nawk.

      > This is the path that AT&T had prepared

      > but Sun has never dared to go.

      > We all should open service cases with Sun and urge them for an RFE.

      > --

      > Michael Tosch .unix-linux.todaysummary.com. hp : com

      Michael,

      Excellent! I'll be updating my system on Monday morning, though I've

      considered going and grabbing gawk off of sunfreeware.com. Just to

      have something to fall back on, I'm probably going to rename the

      original awk rather than remove it. It's not that I'm

      afraid, but I want to have a safety net if something else I was doing

      with the original awk breaks, until I get time to figure out how to

      make it work with nawk or gawk.

      Thanks again for all the wonderful suggestions, and helping me get on

      the right track with this.

      Regards,

      Mike

      #7; Sun, 18 May 2008 23:52:00 GMT
    • On Oct 30, 4:42 pm, Michael Tosch <eed....unix-linux.todaysummary.com.NO.eed.SPAM.ericsson.PLS.se>

      wrote:

      > da. Ram wrote:

      > awk '/^Heading P01/{x=1;p=$2} /^Heading P02/{x=0} x==0{next}

      > /^CA/{printf " %s ",p} /^CA/,/^$/{printf "%s",$0} /^$/{print}' file

      > The print at the end prints a newline character.

      > (More precise: it prints the current line with a newline, but the

      > current line is empty).

      > printf "%s" prints without a newline.

      > --

      > Michael Tosch .unix-linux.todaysummary.com. hp : com

      Thanks for all your help. I have an additional requirement now. Is it

      possible to print only the first and last (text field) columns in

      addition to the heading ID?

      P01 CA1001 This is a multiline text spanning two lines

      P01 CA1005 This is a multiline text spanning three lines P01

      P01 CA1008 This is a single line text P01

      #8; Sun, 18 May 2008 23:53:00 GMT
    • On Nov 6, 6:57 am, Janis Papanagnou <Janis_Papanag....unix-linux.todaysummary.com.hotmail.com>

      wrote:

      > da. Ram wrote:

      > [snip]

      > awk '{print $1, $NF}'

      > will print the first and last field.

      > But how is the last field defined? From your example you seem to want

      > multiple fields that you want to extract. Is the larger space a <TAB>

      > delimiter? Do the last fields start from a certain column number? In

      > the latter case use

      > awk '{print $1,substr($0,54)}'

      > Janis

      Sorry, I missed the post earlier. Thanks for the suggestion and I will

      try the substr option.

      The last field is in fact a large text with spaces and tabs. It

      always starts from a certain position and could span multiple lines

      until the next record is found. The initial challenge was to get the

      multiple lines combined into one.

      Kind regards

      #9; Sun, 18 May 2008 23:54:00 GMT
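
The substr($0,54) suggestion depends on the text starting at a fixed column. If the column position is uncertain, a field-based alternative is to strip the leading fields with a regular expression and keep the rest of the line untouched. The sketch below assumes the input is the already combined output (heading ID, column1, column2, then free text) and that the first three fields contain no embedded whitespace; the variable names are illustrative.

```shell
# Input: combined lines in the form "<heading> <column1> <column2> <free text>" (assumed).
out=$(printf '%s\n' \
  'P01 CA1001 10 This is a multiline text spanning two lines' \
  'P01 CA1008 11 This is a single line text' |
awk '{
  id = $1; key = $2          # heading ID and column1
  text = $0
  # Remove the first three whitespace-separated fields; the remaining text keeps its spacing.
  sub(/^[ \t]*[^ \t]+[ \t]+[^ \t]+[ \t]+[^ \t]+[ \t]*/, "", text)
  print id, key, text
}')

printf '%s\n' "$out"
```

Assigning an empty string to $3 would also drop the column, but awk then rebuilds the line with single separators, which collapses any tabs or runs of spaces inside the text field.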