How to convert a tab-delimited file with commas in its values to CSV, with
the comma-containing values enclosed in double quotes?
I have a .CSV file (let's say tab_delimited_file.csv) that I download from
a vendor's web portal. When I moved the file to one of my Linux
directories, I noticed that it is actually a tab-delimited file that has
merely been named .CSV. A few sample records from the file are shown below.
"""column1""" """column2""" """column3""" """column4""" """column5""" """column6""" """column7"""
12 455 string with quotes, and with a comma in between 4432 6787 890 88
4432 6787 another, string with quotes, and with two comma in between 890 88 12 455
11 22 simple string 77 777 333 22
The fields in the sample records above are separated by tabs. I know the
header looks odd, with each column name wrapped in triple double quotes,
but that is how the file arrives. I tried using the tr command to replace
the tabs with commas, but the result is unusable because the commas inside
the record values become indistinguishable from the field separators. I
need the record values that contain commas to be enclosed in double
quotes. The command I used is shown below.
tr '\t' ',' < tab_delimited_file.csv > comma_separated_file.csv
This converts the file into the following format.
"""column1""","""column2""","""column3""","""column4""","""column5""","""column6""","""column7"""
12,455,string with quotes, and with a comma in between,4432,6787,890,88
4432,6787,another, string with quotes, and with two comma in between,890,88,12,455
11,22,simple string,77,777,333,22
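The damage can be made visible by counting the comma-separated fields on each line after the tr step. The following is only a small sketch: sample.tsv is a hypothetical file name, and the two records are recreated with printf so that the tabs are explicit.

```shell
# Recreate two of the sample records with real tabs (sample.tsv is a made-up name).
printf '12\t455\tstring with quotes, and with a comma in between\t4432\t6787\t890\t88\n' > sample.tsv
printf '11\t22\tsimple string\t77\t777\t333\t22\n' >> sample.tsv

# Replace tabs with commas, then print the number of comma-separated fields per line.
# The first record reports 8 fields instead of 7 because of its embedded comma.
tr '\t' ',' < sample.tsv | awk -F',' '{ print NF }'
```

A record whose field count differs from the header's 7 columns is one that a CSV reader would misparse.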
I need help converting the sample file into the following format.
column1,column2,column3,column4,column5,column6,column7
12,455,"string with quotes, and with a comma in between",4432,6787,890,88
4432,6787,"another, string with quotes, and with two comma in between",890,88,12,455
11,22,"simple string",77,777,333,22
A solution using either sed or awk would be very useful.
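One possible approach, sketched here with awk, is to split each line on tabs and decide field by field whether quoting is needed. The quoting rule is an assumption inferred from the desired output: because "simple string" is quoted there even though it has no comma, the sketch quotes any field containing a comma, a space, or a double quote, and doubles embedded double quotes in the usual CSV manner. The header is handled separately by stripping all of its double quotes. The input is recreated with printf only so the example can be run on its own.

```shell
# Recreate the sample input with real tabs.
printf '"""column1"""\t"""column2"""\t"""column3"""\t"""column4"""\t"""column5"""\t"""column6"""\t"""column7"""\n' > tab_delimited_file.csv
printf '12\t455\tstring with quotes, and with a comma in between\t4432\t6787\t890\t88\n' >> tab_delimited_file.csv
printf '4432\t6787\tanother, string with quotes, and with two comma in between\t890\t88\t12\t455\n' >> tab_delimited_file.csv
printf '11\t22\tsimple string\t77\t777\t333\t22\n' >> tab_delimited_file.csv

awk 'BEGIN { FS = "\t"; OFS = "," }
# Header: drop every double quote, then rebuild the line with commas.
NR == 1 { gsub(/"/, ""); $1 = $1; print; next }
{
    for (i = 1; i <= NF; i++) {
        # Assumed rule: quote fields containing a comma, a space, or a double quote.
        if ($i ~ /[", ]/) {
            gsub(/"/, "\"\"", $i)   # double any embedded double quotes
            $i = "\"" $i "\""
        }
    }
    $1 = $1   # force the record to be rebuilt with OFS even if no field changed
    print
}' tab_delimited_file.csv > comma_separated_file.csv

cat comma_separated_file.csv
# column1,column2,column3,column4,column5,column6,column7
# 12,455,"string with quotes, and with a comma in between",4432,6787,890,88
# 4432,6787,"another, string with quotes, and with two comma in between",890,88,12,455
# 11,22,"simple string",77,777,333,22
```

If only comma-containing values should be quoted, as the question's wording suggests, the test could be narrowed to /[",]/, in which case "simple string" would be left unquoted.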