0

I have:

$ cat file1.csv (tab delimited)
R923E06 273911 2990492 2970203 F Resistant 
R923F06 273910 2990492 2970203 F Resistant 
R923H02 273894 2970600 2990171 M Resistant

and:

$ cat file2.txt (space delimited and it's a large file)
R923E06 CC GG TT AA ...
R923F06 GG TT AA CC ...
R923H02 TT GG CC AA ...

How can I replace of first column in file2.txt with all of 6 column in file1.csv?

Joaquin
  • 2,013
  • 3
  • 14
  • 26
mary
  • 59
  • 6

3 Answers3

2

Using join you can do this:

join   <(sed -e 's/\t/ /g' file1.csv) <(cat file2.txt)

sed to change tabs to space

join to joining lines of two files on a common field.

Output:

R923E06 273911 2990492 2970203 F Resistant  CC GG TT AA ...
R923F06 273910 2990492 2970203 F Resistant  GG TT AA CC ...
R923H02 273894 2970600 2990171 M Resistant TT GG CC AA ...
Joaquin
  • 2,013
  • 3
  • 14
  • 26
  • the order of two file (for plate number, i.e R923E06) is not same, dose joint attended to this note? other things that I forget to write is that, after add 6 column to file2.txt I have to separate genotype too (i.e R923E06 273911 2990492 2970203 F Resistant C C G G T T A A ...) – mary Jul 19 '18 at 06:53
  • If not sorted you can try this: `join <(sed -e 's/\t/ /g' file1.csv | sort -k 1) <(sort -k 1 file2.txt)` `sort` is sorting by 1st field. – Joaquin Jul 19 '18 at 07:16
  • your command work for above data but when I use for similar data it dosnt work. first line of two file is : cat sample1.txt > 1 1011001 1001164 981328 1 -9 and cat sample2.txt> 1011001 A G G G G G C C A A ... I want 6 column of sample1.txt replace on first column of sample2.txt – mary Jul 21 '18 at 20:00
  • You could try this: `join -1 2 -2 1 <(sed -e 's/\t/ /g' sample1.txt | sort -k 1) <(sort -k 1 sample2.txt)`. It did'nt work because you need to specify, in this case, columns to join. – Joaquin Jul 22 '18 at 16:10
  • Hi, to change order you could use: `join -1 2 -2 1 <(sed -e 's / \ t / / g' sample1.txt | sort -k 1) <(sort -k 1 sample2.txt ) | awk '{t = $ 1; $ 1 = $ 2; $ 2 = t;} 1'`. On the other hand, if you find answer as correct for this question, you can mark it as that. – Joaquin Jul 23 '18 at 05:51
  • 1
    For some reason the command was pasted with a weird format, the correct command is: `join -1 2 -2 1 <(sed -e 's/\t/ /g' sample1.txt | sort -k 1) <(sort -k 1 sample2.txt) | awk '{t = $1; $1 = $2 ; $2 = t;}1'` – Joaquin Jul 23 '18 at 06:04
  • please let me have another command without awk beacuse I have limitation for col number in awk, as this error: awk: program limit exceeded: maximum number of fields size=32767 FILENAME="-" FNR=1 NR=1 – mary Jul 23 '18 at 06:19
  • In this case, you can use: `join -1 2 -2 1 -o 1.1 1.2 1.3 1.4 1.5 1.6 2.1 2.2 2.3 2.4 2.5 2.6 2.7 2.8 2.9 2.10 2.11 2.12 <(sed -e 's/\t/ /g' sample1.txt | sort -k 1) <(sort -k 1 sample2.txt)` – Joaquin Jul 23 '18 at 06:21
  • thanks but I have more that 54000 column. its not work. – mary Jul 23 '18 at 13:38
  • 1
    @mary remember to accept the answer if you found it helpful. Thanks – eLRuLL Jul 25 '18 at 00:41
0

Take a look at this AWK example:

awk 'FNR == NR { d[$1] = $0; next } { $1 = d[$1] } 1' file1.csv file2.txt

Here I replace first column in file2.txt with corresponding line (6 columns) of file1.csv.

Output:

R923E06 273911 2990492 2970203 F Resistant  CC GG TT AA ...
R923F06 273910 2990492 2970203 F Resistant  GG TT AA CC ...
R923H02 273894 2970600 2990171 M Resistant  TT GG CC AA ...

If you want everything tab-separated in the result, you can add gsub(/[[:space:]]/,"\t") to replace any space or tab with tab:

awk 'FNR == NR { d[$1] = $0; next } { $1 = d[$1]; gsub(/[[:space:]]/,"\t") } 1' file1.csv file2.txt
Andriy Makukha
  • 7,580
  • 1
  • 38
  • 49
  • I use your awk but it got me below error " awk: program limit exceeded: maximum number of fields size=32767 FILENAME="rom_ped2" FNR=1 NR=178", I try to install and use gawk but I use Ubuntu 12.04 I think the package I am looking for doesn't existand. so I am looking Python script to do that dose any body have sloution? – mary Jul 19 '18 at 06:48
  • @mary, well, that's very dated Ubuntu version... You can take a look at this: https://askubuntu.com/questions/244268/installing-gawk-4-0-on-ubuntu-12-04 – Andriy Makukha Jul 19 '18 at 08:18
0
#import pandas
import pandas as pd

#read file1.csv
#set index_col as false if file has delimiters at the end
file1 = pd.read_csv( 'file1.csv', ' ', index_col = False, names = 
['1','2','3','4','5','6']);

#read file2.txt, read_csv can read txt files as well
#set index_col as false if file has delimiters at the end
file2 = pd.read_csv( 'file2.csv', ' ', index_col = False, names = 
['1','2','3','4','5']);

#drop first column
file2.drop( '1', axis = 1, inplace = True )

#concat both frames
final = pd.concat([file1, file2], axis = 1)
#you might end up with mixed column names you can change it by using 
final.columns = ['col1', 'col2', ....]


#save as csv
final.to_csv('out.csv',sep='\t')
Shyam
  • 1
  • 1
  • 5