Bash Command To Do a Column Merge of Two csv files With Different Time-Series Rows

Question

I have two CSV files with time-series data, and I want to do a column merge on them using bash. The merge itself is simple: `paste -d , file1.csv file2.csv > combined.csv` produces a merged file with the extra columns.

The problem is that the timestamps in the two files don't line up, and I want to handle that so the merged rows are aligned by the timestamp in column A.
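To make the mismatch concrete, here is a minimal sketch of what `paste` does with the first rows of the sample data. It assumes the files are real comma-separated CSVs without padding; the scratch directory and file names are placeholders.

```shell
# Scratch directory and file names are placeholders for illustration.
dir=$(mktemp -d)

printf '%s\n' 'Time,Datapoint' \
  '2021-04-01T00:00:00Z,43' \
  '2021-04-01T00:00:01Z,44' > "$dir/file1.csv"

printf '%s\n' 'Time,Datapoint' \
  '2021-04-01T00:00:05Z,51' \
  '2021-04-01T00:00:10Z,52' > "$dir/file2.csv"

# paste glues line N of file 1 to line N of file 2; it never compares timestamps.
paste -d , "$dir/file1.csv" "$dir/file2.csv" > "$dir/combined.csv"
cat "$dir/combined.csv"
```

The second output line pairs 00:00:00Z with 00:00:05Z, which is exactly the misalignment I want to avoid.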

The same problem is described in this SO question, but that one concerns the R programming language, so its answer doesn't work for bash.

# File 1                                      # File 2
| Time                 | Datapoint |          | Time                 | Datapoint |
| 2021-04-01T00:00:00Z | 43        |          | 2021-04-01T00:00:05Z | 51        |
| 2021-04-01T00:00:01Z | 44        |          | 2021-04-01T00:00:10Z | 52        |
| 2021-04-01T00:00:02Z | 45        |          | 2021-04-01T00:00:15Z | 53        |
| 2021-04-01T00:00:03Z | 46        |          | 2021-04-01T00:00:20Z | 54        |
| 2021-04-01T00:00:04Z | 47        |          | 2021-04-01T00:00:25Z | 55        |
| 2021-04-01T00:00:05Z | 48        |          | 2021-04-01T00:00:30Z | 56        |
| 2021-04-01T00:00:06Z | 49        |          | 2021-04-01T00:00:35Z | 57        |

# Desired File
| Time                 | Datapoint | Datapoint |
| 2021-04-01T00:00:00Z | 43        |           |
| 2021-04-01T00:00:01Z | 44        |           |
| 2021-04-01T00:00:02Z | 45        |           |
| 2021-04-01T00:00:03Z | 46        |           |
| 2021-04-01T00:00:04Z | 47        |           |
| 2021-04-01T00:00:05Z | 48        | 51        |
| 2021-04-01T00:00:06Z | 49        |           |

I know I can write a script to read both files and write out the data for each timestamp separately, but I wondered if there is another way of doing this using bash utilities?

`join -t, -a1 -j1 file1 file2` should work if your files are actual CSVs (i.e, fields are separated by commas), and timestamps are sorted. — oguz ismail, May 24 '21 at 13:14

score 2 · Accepted Answer · answered May 24 '21 at 13:23

Use `join`. It requires both files to be sorted on the join field, but from the looks of it yours already are.
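If a file were not already sorted, something like the following sketch could sort the data rows while leaving the header in place. The function name `sort_csv_body` and the file names are hypothetical. It relies on timestamps in this fixed ISO 8601 format sorting chronologically when compared as plain text, and it uses `LC_ALL=C` so the ordering is byte-wise; `join` should then be run under the same locale.

```shell
# Hypothetical helper: sort a CSV on its first column, keeping the header first.
sort_csv_body() {
  # $1: input CSV, $2: output CSV
  head -n 1 "$1" > "$2"
  tail -n +2 "$1" | LC_ALL=C sort -t, -k1,1 >> "$2"
}

# Small demonstration with placeholder file names.
dir=$(mktemp -d)
printf '%s\n' 'Time,Datapoint' \
  '2021-04-01T00:00:02Z,45' \
  '2021-04-01T00:00:00Z,43' \
  '2021-04-01T00:00:01Z,44' > "$dir/unsorted.csv"

sort_csv_body "$dir/unsorted.csv" "$dir/sorted.csv"
cat "$dir/sorted.csv"
```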

`join --header -t, -j1 -a1 file1 file2` prints

Time                ,Datapoint,Datapoint
2021-04-01T00:00:00Z,43
2021-04-01T00:00:01Z,44
2021-04-01T00:00:02Z,45
2021-04-01T00:00:03Z,46
2021-04-01T00:00:04Z,47
2021-04-01T00:00:05Z,48       ,51
2021-04-01T00:00:06Z,49
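For anyone who wants to reproduce this, here is a self-contained sketch. It assumes the sample data is stored as plain comma-separated files without padding (so the output has no extra spaces), that GNU `join` is available (`--header` is a GNU extension), and it uses placeholder file names in a scratch directory. The second command is a variant, not part of the answer above: `-o 0,1.2,2.2` fixes the output columns so that unmatched rows still get an empty third field, as in the desired file.

```shell
# Scratch directory and file names are placeholders for illustration.
dir=$(mktemp -d)

cat > "$dir/file1.csv" <<'EOF'
Time,Datapoint
2021-04-01T00:00:00Z,43
2021-04-01T00:00:01Z,44
2021-04-01T00:00:02Z,45
2021-04-01T00:00:03Z,46
2021-04-01T00:00:04Z,47
2021-04-01T00:00:05Z,48
2021-04-01T00:00:06Z,49
EOF

cat > "$dir/file2.csv" <<'EOF'
Time,Datapoint
2021-04-01T00:00:05Z,51
2021-04-01T00:00:10Z,52
2021-04-01T00:00:15Z,53
2021-04-01T00:00:20Z,54
2021-04-01T00:00:25Z,55
2021-04-01T00:00:30Z,56
2021-04-01T00:00:35Z,57
EOF

# --header  treat the first line of each file as a header (GNU extension)
# -t,       fields are separated by commas
# -j1       join on the first field of both files
# -a1       also print rows of file 1 that have no match in file 2
LC_ALL=C join --header -t, -j1 -a1 \
  "$dir/file1.csv" "$dir/file2.csv" > "$dir/combined.csv"

# Variant: -o 0,1.2,2.2 prints the join key, then field 2 of each file,
# leaving the third column empty when file 2 has no matching row.
LC_ALL=C join --header -t, -j1 -a1 -o 0,1.2,2.2 \
  "$dir/file1.csv" "$dir/file2.csv" > "$dir/padded.csv"

cat "$dir/combined.csv" "$dir/padded.csv"
```

Rows that appear only in the second file are dropped here, which matches the desired output in the question; adding `-a2` would keep them as well.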
