How to Remove Pipe character from a data filed in a pipe delimited file

Experts, i have a simple pipe delimited file from source system which has a free flow text field and for one of the records, i see that "|" character is coming in as part of data. This is breaking my file unevenly and not getting parsed in to correct number of fields. I want to replace the "|" in the data field with a "#".

Record coming in from source system. There are total 9 fields in the file.

OutboundManualCall|H|RTYEHLA HTREDFST|Free"flow|Text|20191029|X|X|X|3456 

If you Notice the 4th field – Free"flow|Text , this is complete value from source which has a pipe in it. i want to change it to- Free"flow#Text and then read the file with a pipe delimiter.

Desired Outcome-

OutboundManualCall|H|RTYEHLA HTREDFST|Free"flow#Text|20191029|X|X|X|3456 

I tried few awk/sed combinations, but didn’t get the desired output.

Thanks

Add Comment
2 Answer(s)

Since you know there are 9 fields, and the 4th is a problem: take the first 3 fields and the last 5 fields and whatever is left over is the 4th field.

You did tag , so here’s some bash: I’m sure the python equivalent is close:

line='OutboundManualCall|H|RTYEHLA HTREDFST|Free"flow|Text|20191029|X|X|X|3456' IFS='|'  read -ra fields <<<"$line" first3=( "${fields[@]:0:3}" ) last5=( "${fields[@]: -5}" )  tmp=${line#"${first3[*]}$IFS"}   # remove the first 3 joined with pipe field4=${tmp%"$IFS${last5[*]}"}  # remove the last 5 joined with pipe data=( "${first3[@]}" "$field4" "${last5[@]}" )  newline="${first3[*]}$IFS${field4//$IFS/#}$IFS${last5[*]}" # .......^^^^^^^^^^^^....^^^^^^^^^^^^^^^^^....^^^^^^^^^^^ printf "%s\n" "$line" "$newline" 
OutboundManualCall|H|RTYEHLA HTREDFST|Free"flow|Text|20191029|X|X|X|3456 OutboundManualCall|H|RTYEHLA HTREDFST|Free"flow#Text|20191029|X|X|X|3456 

with awk, it’s simpler: If there are 10 fields, join fields 4 and 5, and shift the rest down one.

echo "$line" | awk '     BEGIN { FS = OFS = "|" }     NF == 10 {         $4 = $4 "#" $5         for (i=5; i<NF; i++)             $i = $(i+1)         NF--     }     1 ' 
OutboundManualCall|H|RTYEHLA HTREDFST|Free"flow#Text|20191029|X|X|X|3456 
Add Comment

You tagged your question with Python so I assume a Python-based answer is acceptable. I assume not all records in your file have the additional "|" in it, but only some records have the "|" in the free text column. For a more realistic example, I create an input with some correct records and some erroneous records.

I use StringIO to simulate the file, in your environment read the real file with ‘open’.

from io import StringIO  sample = 'OutboundManualCall|H|RTYEHLA HTREDFST|Free"flow|Text|20191029|X|X|X|3456\nOutboundManualCall|J|LALALA HTREDFST|FreeHalalText|20191029|X|X|X|3456\nOutboundManualCall|J|LALALA HTREDFST|FrulaalText|20191029|X|X|X|3456\nOutboundManualCall|H|RTYEHLA HTREDFST|Free"flow|Text|20191029|X|X|X|3456'      infile = StringIO(sample) outfile = StringIO()  for line in infile.readlines():     cols = line.split("|")     if len(cols) > 9:         print(f"bad colum {cols[3:5]}")         line = "|".join(cols[:3]) + "#".join(cols[3:5]) + "|".join(cols[5:])     outfile.write(line) print("Corrected file:")  print(outfile.getvalue()) 

Results in:

> bad colum ['Free"flow', 'Text'] > bad colum ['Free"flow', 'Text'] > Corrected file: > OutboundManualCall|H|RTYEHLA HTREDFSTFree"flow#Text20191029|X|X|X|3456 > OutboundManualCall|J|LALALA HTREDFST|FreeHalalText|20191029|X|X|X|3456 > OutboundManualCall|J|LALALA HTREDFST|FrulaalText|20191029|X|X|X|3456 > OutboundManualCall|H|RTYEHLA HTREDFSTFree"flow#Text20191029|X|X|X|3456 
Add Comment

Your Answer

By posting your answer, you agree to the privacy policy and terms of service.