1) Reordering one csv file based on another file header and 2) Merging one column of one csv file to another and remove duplicate

Question

I have two csv file. Both files might have same or different data. File2 has only few columns from file 1. Some column in file 2 may have different header. eg File 2 has Name in place of First Name

Username, Identifier,One-time password,Recovery code,First name,Last name,Department,Location
booker12,9012,12se74,rb9012,Rachel,Booker,Sales,Manchester
grey07,2070,04ap67,lg2070,Laura,Grey,Depot,London
johnson81,4081,30no86,cj4081,Craig,Johnson,Depot,London
jenkins46,9346,14ju73,mj9346,Mary,Jenkins,Engineering,Manchester
smith79,5079,09ja61,js5079,Jamie,Smith,Engineering,Manchester

File 2

Department,First name,Last name,One-time password
Sales,Rachel,Booker,12se74
Depot,Laura,Grey,04ap67
Depot,Craig,Johnson,30no86
Engineering,Mary,Jenkins,14ju73
Engineering,Jamie,Smith,09ja61

Problem 1: I wanted to reorder columns in csv file 2 to match order in csv file 1 based on headers.

Desired output header sorted based on file 1

One-time password,Name,Last name,Department
12se74,Dash,Bok,Sales
04ap67,Claire,Trans,Accounts
30no86,Shane,Walter,Depot
14ju73,Leon,Jenkins,Engineering
09ja61,Oliver,Den,Engineering

Problem 2: Merge File2 columns to File 1 based on header removing duplicates Eg if First Name , Last Name, Department columns are same then those are duplicates remove those dupes. Other columns may or may not be same. Hence wanting to achive condition based deduplication of record

Problem 3: Convert file 2 as file 1 template add missing columns in order. Finally compare and delete duplicate columns based on certain headers. eg. If First Name, Last Name and password are same then they are duplicates other columns may be same or different.

Problem 4: Copy specific column from file 2 to file 1 preserving order Eg. File 2 has Name column Replace that column with First Name column of file 1

Tried:

awk -v FS=, -v OFS=, 'FNR==NR{hash[FNR]=$5; next}{$2 = hash[FNR]}1' file file2

Above answer is from https://unix.stackexchange.com/questions/674038/replace-a-column-value-in-csv-file-from-another-file

Above seem to work. But it requires to specify column numbet as $5 and $2. Can anyone help modify above command to specify header instead of column number

Tried

awk -v FS=, -v OFS=, '{ for (i=1;i<=NF;i++) { if (i=="name") var=$i }; FNR==NR{hash[FNR]=$5; next}{$var = hash[FNR] }' file file2

Not working

Don't ask multiple questions ("Problems") in 1 question, just ask 1 question then see if you can adapt the answer you get for that before asking any followup questions **if necessary**. Also make sure to include exactly the expected output for your posted sample input. — Ed Morton, Jan 28 '22 at 13:55

Fravadona · Accepted Answer · 2022-02-01T23:06:32.693

You can do it all quite easily with Miller, which is available here as a static binary. Put the mlr executable somewhere in your PATH and you're done with the installation.

For starters, I'll assume that we're working with two CSV files with no inconsistency in the column names:

cat file1.csv

Username, Identifier,One-time password,Recovery code,First name,Last name,Department,Location
booker12,9012,12se74,rb9012,Rachel,Booker,Sales,Manchester

cat file2.csv

Department,First name,Last name,One-time password
Engineering,Oliver,Den,09ja61
Sales,Rachel,Booker,12se74

rename a specified field:

mlr --csv rename 'First name,Name' file2.csv

Department,Name,Last name,One-time password
Engineering,Oliver,Den,09ja61
Sales,Rachel,Booker,12se74

re-order the columns of `file2.csv` based on the header of `file1.csv`:

mlr --csv reorder -f "$(head -n 1 file1.csv)" file2.csv

One-time password,First name,Last name,Department
09ja61,Oliver,Den,Engineering
12se74,Rachel,Booker,Sales

add missing columns in `file2.csv` based on the header of `file1.csv`:

mlr --csv template -t file1.csv file2.csv

Username, Identifier,One-time password,Recovery code,First name,Last name,Department,Location
,,09ja61,,Oliver,Den,Engineering,
,,12se74,,Rachel,Booker,Sales,

remove duplicates based on `One-time password,First name,Last name` unicity:

mlr --csv head -n 1 -g 'One-time password,First name,Last name' fileX.csv

concatenate `file1.csv` and `file2.csv`:

mlr --csv unsparsify file1.csv file2.csv

Username, Identifier,One-time password,Recovery code,First name,Last name,Department,Location
booker12,9012,12se74,rb9012,Rachel,Booker,Sales,Manchester
,,09ja61,,Oliver,Den,Engineering,
,,12se74,,Rachel,Booker,Sales,

concatenate `file1.csv` and `file2.csv` and remove the duplicates based on `One-time password,First name,Last name` unicity:

The command is composed of a chain of operations.

mlr --csv unsparsify then head -n 1 -g 'One-time password,First name,Last name' file1.csv file2.csv

Username, Identifier,One-time password,Recovery code,First name,Last name,Department,Location
booker12,9012,12se74,rb9012,Rachel,Booker,Sales,Manchester
,,09ja61,,Oliver,Den,Engineering,

Lastly, let's suppose that the column First name is called Name in file2.csv and that you want to concatenate file1.csv and file2.csv and remove the duplicates based on One-time password,First name,Last name unicity.

You can do it by prepending a rename operation to the previous command:

mlr --csv rename 'Name,First name' then unsparsify then head -n 1 -g 'One-time password,First name,Last name' file1.csv file2.csv

Thanks. For Problem 4 eg. copy 5nd column of file1 to 2nd column of file 1. First Name column of file 1 to Name column of second file and vice versa — Haru Suzuki, Jan 28 '22 at 18:13
Wouldn't that be just changing the column name in the header? — Fravadona, Jan 28 '22 at 18:22
Sir. Using unsparsify and implode i wanted to merge two csv files and then remove all duplicates. But duplicates doesny measn all columns have same value and can be removed with sort -u. Instead if records have certain columns withsame values then they are considerd duplicates and can be trimmed to single entry — Haru Suzuki, Jan 28 '22 at 19:36

1) Reordering one csv file based on another file header and 2) Merging one column of one csv file to another and remove duplicate

1 Answers1

rename a specified field:

re-order the columns of file2.csv based on the header of file1.csv:

add missing columns in file2.csv based on the header of file1.csv:

remove duplicates based on One-time password,First name,Last name unicity:

concatenate file1.csv and file2.csv:

concatenate file1.csv and file2.csv and remove the duplicates based on One-time password,First name,Last name unicity:

re-order the columns of `file2.csv` based on the header of `file1.csv`:

add missing columns in `file2.csv` based on the header of `file1.csv`:

remove duplicates based on `One-time password,First name,Last name` unicity:

concatenate `file1.csv` and `file2.csv`:

concatenate `file1.csv` and `file2.csv` and remove the duplicates based on `One-time password,First name,Last name` unicity: