0

There are two files. If the ID number matches both files, then I want only the value 1 and value 2 from File2.txt , Please let me know if my question is unclear

File1.txt


ID Number   Value 1     Value 2     Country 
0001        23            55        Spain
0231        15            23        USA     
4213        10            11        Canada
7541        32            29        Italy

File2.txt

0001        5       6
0231        7       18
4213        54      87
5554        12      10
1111        31      13
6422        66      51

The output should look like this.

ID Number   Value 1     Value 2     Country 
0001          5           6         Spain
0231          7          18         USA     
4213          54         87         Canada
7541          32         29         Italy

New example:

File3.txt

#ID CAT CHN LC SC LATITUDE LONGITUDE 
20022 CX 21 --   4  32.739000  -114.635700 
01711 CX 21 --   3  32.779700  -115.567500
08433 CX 21 --   2  31.919930  -123.321000


File4.txt

20022,32.45,-114.88
01192,32.839,-115.487
01711,32.88,-115.45
01218,32.717,-115.637

output
#ID CAT CHN LC SC LATITUDE LONGITUDE 
20022 CX 21 --   4  32.45  -114.88 
01711 CX 21 --   3  32.88  -115.45
08433 CX 21 --   2  31.919930  -123.321000

Code I got so far

f = open("File3.txt", "r") 
x= open("File4.txt","r")

df1 = pd.read_csv(f, sep=' ', engine='python')
df2 = pd.read_csv(x, sep=' ', header=None, engine='python')

df2 = df2.set_index(0).rename_axis("#ID")
df2 = df2.rename(columns={5:'LATITUDE', 6: 'LONGITUDE'})
df1 = df1.set_index('#ID')
df1.update(df2)
print(df1)
itsyoyo
  • 11
  • 5
  • What have you tried so far? – MattDMo May 28 '20 at 20:11
  • 1
    Please specify the way in which the data is stored in the text files. Also, you should show what you have tried to prove you attempted it. – William Clavier May 28 '20 at 20:12
  • Sorry, I should have posted my current code. I haven't done much because I wasnt sure what was the best method to handle this situation ( ex using pandas or csv) – itsyoyo May 28 '20 at 21:51

2 Answers2

1

Something like this, possibly:

file1_data = []
file1_headers = []

with open("File1.txt") as file1:
    for line in file1:
        file1_data.append(line.strip().split("\t"))
    file1_headers = file1_data[0]
    del file1_data[0]

file2_data = []

with open("File2.txt") as file2:
    for line in file2:
        file2_data.append(line.strip().split("\t"))

file2_ids = [x[0] for x in file2_data]

final_data = [file1_headers] + file1_data

for i in range(1, len(final_data)):
    if final_data[i][0] in file2_ids:
        match = [x for x in file2_data if x[0] == final_data[i][0]]
        final_data[i] = [match[0] + [final_data[i][3]]]

with open("output.txt", "w") as output:
    output.writelines(["\t".join(x) for x in final_data])

final_data becomes an alias of file1_data and then is selectively replacing rows with matching id's in file2_data, but keeping the country.

notacorn
  • 3,526
  • 4
  • 30
  • 60
  • The code works. However, Italy is not printing? even though Theres no Id number for Italy in the second file, Italy should still print its original values. Also How would I make it print it as a table? instead of a whole line. Thank you. – itsyoyo May 29 '20 at 17:17
  • italy isnt printing because you said the ids needed to match, which is how the logic in my answer was written out – notacorn May 29 '20 at 17:23
  • Sorry I should have been more clear. Basically I meant that replace the values only if the IDs match. If they dont match it should still use the same values from the first file. – itsyoyo May 29 '20 at 18:29
  • Im getting this error: final_data[i] = [match[0] + final_data[i][3]] TypeError: can only concatenate list (not "str") to list – itsyoyo May 29 '20 at 19:23
  • i wrapped the country name in its own list now – notacorn May 29 '20 at 19:28
  • Could you take alook at the new example I added. Im trying to implement your code with the new example. Could you tell me what needs to be changed? Thanks @notacorn – itsyoyo Jun 02 '20 at 18:12
0

Okay, what you need to do here is to get the indexes to match in both dataframes after importing. This is important because pandas use data alignment based on indexes.

Here is a complete example using your data:

from io import StringIO
import pandas as pd

File1txt=StringIO("""ID Number   Value 1     Value 2     Country 
0001        23            55        Spain
0231        15            23        USA     
4213        10            11        Canada
7541        32            29        Italy""")


File2txt = StringIO("""0001        5       6
0231        7       18
4213        54      87
5554        12      10
1111        31      13
6422        66      51""")

df1 = pd.read_csv(File1txt, sep='\s\s+', engine='python')
df2 = pd.read_csv(File2txt, sep='\s\s+', header=None, engine='python')

print(df1)
#    ID Number  Value 1  Value 2 Country
# 0          1       23       55   Spain
# 1        231       15       23     USA
# 2       4213       10       11  Canada
# 3       7541       32       29   Italy

print(df2)
#       0   1   2
# 0     1   5   6
# 1   231   7  18
# 2  4213  54  87
# 3  5554  12  10
# 4  1111  31  13
# 5  6422  66  51

df2 = df2.set_index(0).rename_axis('ID Number')
df2 = df2.rename(columns={1:'Value 1', 2: 'Value 2'})
df1 = df1.set_index('ID Number')
df1.update(df2)
print(df1.reset_index())

Output:

   ID Number  Value 1  Value 2 Country
0          1      5.0      6.0   Spain
1        231      7.0     18.0     USA
2       4213     54.0     87.0  Canada
3       7541     32.0     29.0   Italy
Scott Boston
  • 147,308
  • 15
  • 139
  • 187
  • instead of using StringIO and having to put the values manually. Could I grab the values automatically from 2 files? by using open file – itsyoyo May 28 '20 at 20:42
  • Sure, but using pd.read_csv(...) is pretty easy. I am using StringIO here to simulate using a filepath. – Scott Boston May 29 '20 at 00:58
  • Does this scenario only works for a csv file? What if its a text file – itsyoyo May 29 '20 at 17:06
  • It works with a text file also, as long as your delimiter are fixed. Here I am using if you have value seperated by two or more spaces consider this a column. (https://stackoverflow.com/q/21546739/6361531) – Scott Boston May 29 '20 at 17:22
  • Hi, I added a new example, Could you please take a look at it. I want the Latitude and longitude values from File4.txt only if the #ID matches. Thank you. – itsyoyo Jun 01 '20 at 21:30
  • File4, need `sep=','` – Scott Boston Jun 01 '20 at 21:39
  • i changed that, Im still getting errors. is this line correct? df2 = df2.rename(columns={5:'LATITUDE', 6: 'LONGITUDE'}) – itsyoyo Jun 01 '20 at 21:50