I have a created a table "SQL_table" in SQL and copied contents from various existing .csv files to "SQL_table" in R using the code given below.
Code:
library(RPostgres)
library(DBI)
#Step 2: make a dataframe of all files
file_names1 <- dir("D:/Data/", full.names = TRUE, recursive = T) #where you have your files
my_data_frame <- do.call(rbind,lapply(file_names1,read.csv))
#Step 3: Establish R & PostgreSQL Connection using RPostgres
dsn_database = "...." # Specify the name of your Database
dsn_hostname = "localhost" # Specify host name
dsn_port = "...." # Specify your port number. e.g. 98939
dsn_uid = "...." # Specify your username. e.g. "admin"
dsn_pwd = "...." # Specify your password. e.g. "xxx"
tryCatch({
drv <- dbDriver("Postgres")
print("Connecting to Database…")
connec <- dbConnect(drv,
dbname = dsn_database,
host = dsn_hostname,
port = dsn_port,
user = dsn_uid,
password = dsn_pwd)
print("Database Connected!")
},
error=function(cond) {
print("Unable to connect to Database.")
})
#Step 4: Run using RPostgres
dbWriteTable(connec, "SQL_table", my_data_frame, create = TRUE)
dbAppendTable(connec, "SQL_table", my_data_frame, create = TRUE)
dbReadTable(connec, "SQL_table")
The structure of my SQL_table is as follows:
Y_m_d_Time Location_ID Popul Literacy Mortality
2001-01-01 00:00:00 NAM 5000 77 8.8
2002-01-01 00:00:00 NAM 8700 58 7.7
2003-01-01 00:00:00 NAM 3410 98 9.8
..
..
1990-01-01 00:00:00 KEN 2000 87 9.8
1991-01-01 00:00:00 KEN 3200 88 9.7
1991-01-01 00:00:00 KEN 4910 78 8.8
..
..
2010-01-01 00:00:00 PUB 6000 97 9.8
2011-01-01 00:00:00 PUB 4200 89 9.7
2012-01-01 00:00:00 PUB 5910 88 8.8
..
..
..
I have another .csv file "CSV_1" and I want to copy few columns (Latitude, Longitude, Altitude, Start_Date, End_Date) from "CSV_1" to "SQL_table" using R. The common column between both the tables is Location_ID in "SQL_table" and City in "CSV_1".
The structure of CSV file is as follows:
City Latitude Longitude Altitude Start_Date End_Date No. of Events Event_Type
NAM 35 79 218 3/1/2001 10 Flood
KEN 30 81 129 2/1/1990 5/31/1999 5 Earthquake
PUB 22 76 220 1/1/2010 11 Landslide
..
..
My desired output is as follows:
Y_m_d_Time Location_ID Popul Literacy Mortality Latitude Longitude Altitude Start_Date End_Date
2001-01-01 00:00:00 NAM 5000 77 8.8 35 79 218 3/1/2001
2002-01-01 00:00:00 NAM 8700 58 7.7 35 79 218 3/1/2001
2003-01-01 00:00:00 NAM 3410 98 9.8 35 79 218 3/1/2001
..
..
1990-01-01 00:00:00 KEN 2000 87 9.8 30 81 129 2/1/1990 5/31/1999
1991-01-01 00:00:00 KEN 3200 88 9.7 30 81 129 2/1/1990 5/31/1999
1991-01-01 00:00:00 KEN 4910 78 8.8 30 81 129 2/1/1990 5/31/1999
..
..
2010-01-01 00:00:00 PUB 6000 97 9.8 22 76 220 1/1/2010
2011-01-01 00:00:00 PUB 4200 89 9.7 22 76 220 1/1/2010
2012-01-01 00:00:00 PUB 5910 88 8.8 22 76 220 1/1/2010
..
..
..
Could anyone please help me how I can extend my code in R to obtain the desired results.