I have a performance issue I need help with. Please bear with me for the explanation:
I have a database of known car VIN#s and years (only the first 4 lines of ~5,000 shown for ease):
>vinDB
>ToyotaCarola 2008
IJDINJNDJIJKNDJIMKDK0897
NissanAltima 1998
LJIODJJNJDJNJDNJNJDJ7765
I also have a .txt document that lists a unique DMV ID, a VIN number, and a reference number in the following layout (only the first 8 lines of ~55 million shown for ease):
>carFile
>#DMVcorrNumber33:1245638:563892:6378
IJDINJNDJIJKNDJIMKDK0897
+
VIN#IDref6388546
#DMVcorrNumber33:1245638:563892:6378
LJIODJJNJDJNJDNJNJDJ7765
+
VIN#IDref2453663
What I would like to do is scan every second line (the VIN#) of my 'vinDB' file against every fourth line (starting at line two) of my 'carFile' file for a perfect match. If a match exists, I would like to output the name of the car and how many times its VIN appears in the 'carFile' file.
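To make the line layout concrete, this is roughly how those lines can be pulled out (a sketch; here the two files are faked with the sample lines above, where in practice they would come from readLines()):

```r
# Fake the two files with the sample lines shown above
vin_lines <- c("ToyotaCarola 2008",
               "IJDINJNDJIJKNDJIMKDK0897",
               "NissanAltima 1998",
               "LJIODJJNJDJNJDNJNJDJ7765")
car_lines <- c("#DMVcorrNumber33:1245638:563892:6378",
               "IJDINJNDJIJKNDJIMKDK0897",
               "+",
               "VIN#IDref6388546")

vins_db  <- vin_lines[seq(2, length(vin_lines), by = 2)]  # every 2nd line
vins_car <- car_lines[seq(2, length(car_lines), by = 4)]  # lines 2, 6, 10, ...
```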
So basically, I need this:
Car Year NumTimesFound
ToyotaCarola 2008 238
NissanAltima 1998 1755
So far I have the following code, which works on a truncated 'carFile', but crashes my R session when I try it with all ~55 million lines:
VinCounter <- function(carFile, vinDB)
{
  i <- 1      # index of inner while loop
  j <- 1      # index of outer while loop
  m <- 2      # row index into vinDB; starts at 2 because the first VIN# is on line 2
  s <- 2      # row index into carFile; first VIN# is on line 2
  count <- 0
  while (j <= nrow(vinDB) / 2)       # a VIN# sits on every 2nd line of vinDB
  {
    while (i <= nrow(carFile) / 4)   # a VIN# sits on every 4th line of carFile
    {
      if (vinDB[m, 1] == carFile[s, 1])
      {
        count <- count + 1
      }
      s <- s + 4                     # jump to the next VIN# line either way
      i <- i + 1
    }
    print(vinDB[m - 1, 1])           # car name sits on the line above the VIN#
    print(count)
    count <- 0                       # reset for the next VIN#
    s <- 2
    i <- 1
    m <- m + 2
    j <- j + 1
  }
}
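For comparison, here is a vectorized sketch of the same count that avoids the nested loops entirely (an assumption: the files are read with readLines(), so each is a plain character vector rather than a data frame; the function name is made up):

```r
count_vins <- function(car_lines, vin_lines) {
  # Car name/year on odd lines of vinDB, VIN# on even lines
  cars <- vin_lines[seq(1, length(vin_lines), by = 2)]
  vins <- vin_lines[seq(2, length(vin_lines), by = 2)]

  # VIN#s sit on every 4th line of carFile, starting at line 2
  seen <- car_lines[seq(2, length(car_lines), by = 4)]

  # One pass: count occurrences of each known VIN (unseen VINs count 0)
  counts <- table(factor(seen, levels = vins))

  data.frame(Car = cars, NumTimesFound = as.integer(counts))
}
```

Because table() does the counting in a single vectorized pass, this should scale to ~55 million lines far better than one pairwise comparison per (VIN, line) combination.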
So, basically, I would like to figure out how to:
1) Make the code above quicker and more efficient.
2) Store the output in a .txt or .csv file (right now it only prints to the screen).
Thanks!