0

I need help with extracting dates and other useful parameters from data entries that I made in Google Sheets for hand-written sample labels (paleoecological samples from South America). Some of the samples were labeled right where they were collected. Altogether, the labels were made by different people from different research teams through more than 50 years and there's very little consistency in my labels. The date extraction formulae in Google Sheets did not work for me, probably for that reason. Is there a smarter way to do it (R, Python, Wolfram etc.)? Thank you so much for any advice! There are a few thousands of these, and if I could figure out how to streamline this type of work, it could probably save months of my life, not to mention help save the collection itself!

Original sample label text                                      Collection date        Core depth
Lot #1, E.J. 1#3, 222-227 cm, 8 cc, 8 May '68; 29             May-8-1968           222-227 cm
Lot #1, E.J. 1#4, 255-260 cm, 8 cc, 16 May '68; 30            May-16-1968          255-260 cm
Lot #1, E.J. 1#2, Interval 105-110 cm, 8 cc, 2 May '68; 19        May-2-1968           ...
Flamingo pond, Floreana, 7-77, D; Becker Q, Fl. Pond, Flor.       Jun - 1977
Sergio, 23-12-91, MWI, 1 cc, 40-42 cm;                            ...                  ...

Google sheets' Regextract, DateValue functions. I would not expect these to work for the entire spreadsheet anyways, since there are so many different labeling formats.

jsotola
  • 2,238
  • 1
  • 10
  • 22

1 Answers1

0

When posting questions on StackOverflow, be specific with your question. Your assignment seems to have many formats, and thus not consistent. With any programming challenge, there needs to be a pattern identified. If you have multiple formats, you could use IF/ELSE statements, CASE statements, WHILE statements, use parsing techniques like you are using (Regular Expressions).

Google Sheets should be able to handle every case you have as long as your code is correct. It appears you have no code in your question, so either you put the collection date in manually and have not even tried to use the function because you got overwhelmed by all the formats in the sheet, or you are hiding your code. Please put the code into your question after trying first. Otherwise your questions will get closed. I'll answer your question this time very broadly speaking, but realize in the future that you will get yelled at for not following Stackoverflow guidelines for questions.

My preference would be to use Python and a library called openpyxl to read an xlsx file. Then use regular expressions in Python. You just need a loop and a nested loop, with a CASE statement of regular expressions.

Google Sheets will handle this however. I would do this in two steps. Have a new column for each type of format and just look for a specific regular expression. Then have another column for that specific format after transformation.

Format 1 pre   Format 1      Format 2 pre   Format 2    Format 3 pre    Format 3
8 May '68      05/08/1968     
16 May '68     05/16/1968
2 May '68      05/02/1968
                              7-77          07/1977           
                                                         23-12-91        12/23/1991

Check if string has date, any format

JustBeingHelpful
  • 18,332
  • 38
  • 160
  • 245