0

I'm trying to merge multiple .csv files with data from one .xlsx file. Everything seems to be loaded in properly. The csv files I'm working with have a different number of rows for each file (as you'll see in the error message below). Here is the code I've written that has given me errors:

for loop with a merge

The lat and lon in the csv files match up with LATITUDE and LONGITUDE from the ridership excel file

This is the error I am getting:

error in attempted merge()

I'm trying to get the ridership data from the xlsx file to merge with the corresponding lat and lon data in each of the csv files. For note, the ridership xlsx file has all of the long/lat data found in the csvs but each csv only has one value for long lat that is the same for every row in it's column.

r2evans
  • 141,215
  • 6
  • 77
  • 149
  • 1
    (1) `merge` returns a new frame, but you're discarding all of the returns. Regardless of the error, this is a wasted operation. (2) You're creating the looping variable `i` but never use it, you are repeating the same operation over and over. (3) Ok, so you have an error, it could be caused by a number of things, but to troubleshoot, almost all of them require representative sample data. Are you able to find a _small_ subset of your data and share it with `dput`? (See https://stackoverflow.com/q/5963269 , [mcve], and https://stackoverflow.com/tags/r/info for `dput`, `data.frame`.) – r2evans Sep 01 '23 at 18:38
  • How many sensor do you have? are they close? I will first create an identifier based on a simplified version of lat-long and join on this identifier. If I am correct you have multiple `.csv` and one `.xlsx`. I assume you have on csv per sensor and an `.xlsx`describing all sensor. If I am right it is easier to append all the csv, then doing the join (with the merge). I could also have misunderstood your problem. – defuneste Sep 01 '23 at 18:46
  • Is `sensorlist` a `list` of data frames? `merge` doesn't expect lists of data frames as input. Do you mean to be doing `merge(sensorlist[[i]], RidershipJan)`? (And presumably storing the result?) – Gregor Thomas Sep 01 '23 at 18:56
  • @GregorThomas yeah `sensorlist` is a list of different data frames. Would changing the indices to `[[]]` instead of single `[]` tell merge these are data frames? Storing them doesn't seem to be too important, I really just need the data joined so I can do analysis in R further down the line (expecting to store/export that end result) – bigchungus81 Sep 01 '23 at 19:44
  • @defuneste correct there are 25 `.csv` (not for individual sensors, about 10 sensors total with 25 total deployments resulting in 25 `.csv` files) and only 1 `.xlsx` file. The lat/long data of the sensors are relatively close as it is all within one metro area, but a wide range of distribution throughout the city. The `.csv` files have air quality data and the lat/lon data. The `.xlsx` has bus ridership data and lat/lon. I'm most interested in getting these files together so I can eventually compare the ridership and air quality data points. – bigchungus81 Sep 01 '23 at 19:51
  • @r2evans (1) ah is there a way to not discard the returns? I'm trying to merge each data frame separately with the excel file (2) that makes sense, it's been some time since I've used R I changed both to the same variable `j` (I used `i` in a loop above reading the `.csv` files so I used `j` to avoid any potential issues. (3) I can potentially try this either later today or next week. The data I'm using isn't all mine, so I would want to get approval before sharing some of it. I added some extra details about the files in one of my other replies, but I'll see if I'm able to share some of this. – bigchungus81 Sep 01 '23 at 20:42
  • @bigchungus81 Assigning the return is a start, but since you're in a loop (even if it's perfectly redundant), you need to assign to the contents of a `list(.)`. I don't know what you're trying to do in the big picture, frankly, and this question is currently too unclear to help much more with. – r2evans Sep 01 '23 at 22:34
  • What is the desired output? Do you want a list of the 25 files, each one joined to `RidershipJan`, or do you want a single data frame of all 25 data frames combined into 1 and merged with `RidershipJan`? – Gregor Thomas Sep 02 '23 at 00:32
  • Also note that you don't have to share **real** data to make a reproducible example. If you gave us a 2-element `sensorlist` list, where each element was data frame with 2 rows of fake data, and a fake `RidershipJan` with just enough fake rows to correspond to the fake `sensorlist`, that would give us all we need to help you. Just make sure the fake data has the same basic structure as the real data (same data types). – Gregor Thomas Sep 02 '23 at 00:33

1 Answers1

0

As we started guessing in the comments:

I will divide your process in two steps (as a general rule when something is not working you should try to break it in smaller parts).

  1. Append all your csvs

I assume they all have the same number/types and order of columns (could be wrong because sometimes sensors used weird stuff to encode value) and let say they are all in the same folder/directory.

my_csv <- list.files("my_folder_path", pattern = "*.csv") 
a_list_of_my_censor <- lapply(my_csv, FUN = function(files){read.csv(files)}) 
a_big_data_set <- do.call("rbind", a_list_of_my_censor)
  1. Then you can go with your merge with a_big_data_set and the table from the xlsx. You do not need the for loop.
defuneste
  • 376
  • 2
  • 7