I have 2 tab-delimited lines (tabs are shown as → below):

Line1Word1→Line1 Words2→→Line1Word3→→→Line1 Words4
→→Line2Word1→→Line2 Words2→→

Expected result

Line1Word1→Line1 Words2→Line2Word1→Line1Word3→Line2 Words2→→Line1 Words4

It's easy to see what the result should be by pasting the 3 lines above into Excel:

Display in Excel

Line1

Line1Word1  Line1 Words2        Line1Word3          Line1 Words4

For this line I got

^(.*?)\t(.*?)\t(.*?)\t(.*?)\t(.*?)\t(.*?)\t(.*?)$

which captures groups 1, 2, 4 and 7. However, I believe there must be a more generic way to obtain these that accounts for any number of groups.

Line2

Line2Word1      Line2 Words2

I could do the same for Line2 as above, but I still need a more elegant way to get the groups when I do not know how many to expect or where they are located.

RESULT

Line1Word1  Line1 Words2    Line2Word1  Line1Word3  Line2 Words2        Line1 Words4

Here I have no idea how to combine the groups from the 2 lines above, as in:

 \1(from Line1)\t\2(from Line1)\t\1(from Line2)\t\4(from Line1)...

I have used regex sparingly over the years, but everything I tried for this got me nowhere. Any help will be greatly appreciated.

NOTE in response to Tripleee:

Data is formatted as follows:

Instead of Line1 and Line2, we will call them Array1 and Array2. Each contains multiple lines (rows) as described above, and both arrays have the same number of rows.

As in the example:

Array1 could have Indexes 1, 2, 4 and 7 only, with data in each row

Array2 could have Indexes 3 and 5 only, with data in each row

No index will have data in both arrays in any row

However, the arrays could have data in different indexes every time the script runs, with more or fewer indexes each time

A variable containing ALL data, separated by |, can be created as in:

Row1Array1 | Row1Array2
Row2Array1 | Row2Array2
Row3Array1 | Row3Array2
...

Or data can be arranged in any other way that will help the use of regex.

tripleee
fionpo
  • This is not a general description of a more complicated need. This is the actual problem. I got as far as \t(.*?)\t to capture groups. Of course this does not capture the first or last text strings, since there are no tabs at either end. Besides this, I have no idea how to extract the text from Line2 and build a replacement such as \1(from Line1)\2(from Line1)\1(from Line2)\3(from Line1)... Any ideas? – fionpo Jan 18 '19 at 04:37
  • Thanks Tiw, I have modified the original post to add more details. Hopefully it is clearer now and someone can help – fionpo Jan 18 '19 at 05:03

1 Answer

Why do you use regex for this at all?

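# The -1 limit keeps trailing empty fields, which split drops by default.
my @first = split /\t/, $line1, -1;
my @second = split /\t/, $line2, -1;
die "Different length arrays" unless $#first == $#second;
# Take whichever line has a value at each index (note: || treats "0" as empty).
my @combined = map { $first[$_] || $second[$_] } 0 .. $#first;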

You could add a check to die if both arrays have a value for the same index but that complicates the elegant map slightly.
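A minimal sketch of that check, assuming the two lines arrive as plain strings. The name `merge_fields` is hypothetical, and the loop replaces the map so the conflict test has somewhere to live:

```perl
use strict;
use warnings;

# merge_fields is a hypothetical helper name, not part of the original answer.
sub merge_fields {
    my ($line1, $line2) = @_;
    # Limit -1 keeps trailing empty fields so both lines keep all columns.
    my @first  = split /\t/, $line1, -1;
    my @second = split /\t/, $line2, -1;
    die "Different length arrays" unless @first == @second;

    my @combined;
    for my $i (0 .. $#first) {
        # Refuse to guess when both lines have data in the same column.
        die "Both lines have a value in column " . ($i + 1)
            if length($first[$i]) && length($second[$i]);
        push @combined, length($first[$i]) ? $first[$i] : $second[$i];
    }
    return join "\t", @combined;
}

# The two sample lines from the question.
print merge_fields(
    "Line1Word1\tLine1 Words2\t\tLine1Word3\t\t\tLine1 Words4",
    "\t\tLine2Word1\t\tLine2 Words2\t\t",
), "\n";
```

Testing with `length` rather than truthiness also keeps a field that is literally `0`.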

If you are hellbent on using regex, and can get the lines lined up next to each other, the regex you have is basically the way to go. I would use ([^\t]*) instead of (.*?) to completely disambiguate it.

s/^([^\t]*)\t([^\t]*)\t([^\t]*)\t([^\t]*)\t([^\t]*)\t([^\t]*)\t([^\t]*)\t([^\t]*)\t([^\t]*)\t([^\t]*)\t([^\t]*)\t([^\t]*)\t([^\t]*)\t([^\t]*)$/$1\t$2\t$10\t$4\t$12\t\t$7/

where $1..$7 are from the first half and $8..$14 correspond to the first through seventh fields in the second (so we use 1 - 2 - 3+7=10 - 4 - 5+7=12 - nothing - 7 to get the fields you want).
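If the number of fields is not known in advance, the same pattern can be generated rather than typed out. This is only a sketch: `merge_with_regex` is a hypothetical name, `$n` is the field count per line, and it assumes the two lines have already been joined with a single tab, as above. It reads the captures in list context instead of writing `$1`..`$14` by hand:

```perl
use strict;
use warnings;

# merge_with_regex is a hypothetical name; $n is the number of fields per line.
sub merge_with_regex {
    my ($joined, $n) = @_;
    # One ([^\t]*) group per field, 2 * $n groups in total, separated by tabs.
    my $pattern = join '\t', ('([^\t]*)') x (2 * $n);
    # In list context a successful match returns all captured groups.
    my @groups = $joined =~ /\A$pattern\z/
        or die "Expected " . 2 * $n . " tab-separated fields";
    # Group $i comes from the first line, group $i + $n from the second.
    return join "\t",
        map { $groups[$_] ne '' ? $groups[$_] : $groups[$_ + $n] } 0 .. $n - 1;
}

my $line1 = "Line1Word1\tLine1 Words2\t\tLine1Word3\t\t\tLine1 Words4";
my $line2 = "\t\tLine2Word1\t\tLine2 Words2\t\t";
print merge_with_regex("$line1\t$line2", 7), "\n";
```

The field count still has to come from somewhere, for example by counting the tabs in one line and adding one.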

tripleee
  • Last line stolen...^W *adapted* from https://stackoverflow.com/questions/38345/is-there-an-elegant-zip-to-interleave-two-lists-in-perl-5 – tripleee Jan 18 '19 at 05:10
  • This is to be implemented in a Filemaker solution and yes, it can be done in its own scripting. However, I am trying to use regex if at all possible. This would have several advantages if it can be implemented the way I think it could in regex; maybe it cannot. As for both arrays having values for the same index, this will not happen, other than both being empty. – fionpo Jan 18 '19 at 05:25
  • Sorry, not familiar with Filemaker. This is probably information you should add to your question. Probably you should also show how you are using the regex you have. – tripleee Jan 18 '19 at 05:28
  • I was just answering your question: "Why do you use regex for this at all?" I am not seeking help on how to do this in Filemaker. My question is still how to do it in regex. I showed how I am using the limited regex I have in "Line1" above. This is as far as I got. Thanks – fionpo Jan 18 '19 at 05:35
  • Regex by itself has no facility for modifying anything; a regex can simply not match or match a single substring within a single string. If you can show us how you get those two lines (are you reading consecutive lines from a file? Or are they available as variables? What if there are more than two lines?) then maybe somebody can devise something regex-based. – tripleee Jan 18 '19 at 05:42
  • This is what I suspected: it cannot work on more than 1 string. Thank you very much – fionpo Jan 18 '19 at 05:45
  • Basically there could be any number of lines for both, and a big single string can be created with all lines of both strings as needed. I will add an example in my main post – fionpo Jan 18 '19 at 05:50
  • Perl regex can straddle newlines so you could do something like `s/(1)2(3)\n4(5)6/$1$2$3/` – tripleee Jan 18 '19 at 05:52
  • I added a comment to the main post: "NOTE in response to Tripleee:". Please see if this will facilitate the use of regex in this case – fionpo Jan 18 '19 at 06:06
  • Updated answer with a regex solution. – tripleee Jan 18 '19 at 06:16
  • Got it! Thank you very much. One last question: what is the s at the beginning? I am using regexr.com to test this and it does not accept the s. Thanks again – fionpo Jan 18 '19 at 07:05
  • It's a [Perl built-in function to perform a regex substitution.](https://perldoc.perl.org/functions/s.html) – tripleee Jan 18 '19 at 07:38
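To tie the comments above together, here is a rough sketch of `s` applied to a multi-row string like the one described in the question. It assumes one row per line, with the two halves separated by a bare `|` (no surrounding spaces) and no `|` inside the data. `merge_pair` and `merge_rows` are hypothetical names:

```perl
use strict;
use warnings;

# merge_pair and merge_rows are hypothetical helper names.
sub merge_pair {
    my ($left, $right) = @_;
    my @first  = split /\t/, $left,  -1;   # -1 keeps trailing empty fields
    my @second = split /\t/, $right, -1;
    die "Different length arrays" unless @first == @second;
    return join "\t",
        map { $first[$_] ne '' ? $first[$_] : $second[$_] } 0 .. $#first;
}

sub merge_rows {
    my ($all) = @_;
    # /m lets ^ and $ match at every line, /e evaluates the replacement
    # as Perl code, and /g repeats the substitution for every row.
    $all =~ s/^([^|\n]*)\|([^|\n]*)$/merge_pair($1, $2)/gme;
    return $all;
}

print merge_rows("a\t\tc|\tb\t\n\tx\t|w\t\tz\n");
```

The `/e` modifier is specific to Perl, so an online tester such as regexr.com would only accept the pattern between the slashes, not the whole `s` expression.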