1

I have a list which looks like this:

10.0139_ssrn.3771318
10.1001_archdermatol.2012.418
10.1001_archinte.165.15.1737
10.1001_archinte.165.15.1743
10.1001_archinte.165.18.2142
10.1001_archinternmed.2012.127

I have a second list which looks like this:

123 10.0139_ssrn.3771318    
356 10.1001_archdermatol.2012.418
357 10.1001_archinte.165.15.1737    
6   10.1001_archinternmed.2012.127
379 10.1001_archopht.123.1.25   
12  10.1001_archoto.2010.121    
97  10.1001_archotol.127.1.25   

The second list does not contain all items in the first list and vice versa.

I would like to create a file that contains only the matches and would look like this:

123 10.0139_ssrn.3771318    
356 10.1001_archdermatol.2012.418
357 10.1001_archinte.165.15.1737    
6   10.1001_archinternmed.2012.127

I can extract individual lines the way I want with the following command in Powershell:

Get-Content 'Y:\folder\second_list.csv' | foreach {
  $_ -match "10.0139_ssrn.3771318"}| Out-File 'Y:\folder\10.0139_ssrn.3771318'

I do not manage to write a loop that draws the entries from the first file. I tried something like this:

Get-Content 'Y:\folder\second_list.csv' | foreach {
  $line -contains (Get-Content "Y:\folder\first_list.csv")| Out-file "Y:\folder\output.csv" -append}

There are two problems: first, no match is identified (although there should be some matches) and, second, the entry in the output file is always “FALSE” (rather than the matching line of the second_list or no entry at all if no match is found).

vonPryz
  • 22,996
  • 7
  • 54
  • 65
  • Are there any header in your `csv` files? If yes, what are the names> Are there any columns in your `csv` files? how are they separated? Using this [`Join-Object script`](https://www.powershellgallery.com/packages/Join)/[`Join-Object Module`](https://www.powershellgallery.com/packages/JoinModule) (see also: [In Powershell, what's the best way to join two tables into one?](https://stackoverflow.com/a/45483110/1701026)): `Import .\first_list.csv |Join (Import .\second_list.csv) -on -eq ` – iRon Oct 28 '22 at 11:43
  • Thanks. Unfortunately, my files are too large to use this command. The second list has 3m KB. The first one is small 800KB. – user13818643 Nov 02 '22 at 08:12
  • The join command is able to use the [PowerShell pipeline](https://learn.microsoft.com/en-us/powershell/module/microsoft.powershell.core/about/about_pipelines?view=powershell-7.2) (for the left object list). Meaning that you might try to swap the input: `Import .\second_list.csv |Join (Import .\first_list.csv) -on -eq ` – iRon Nov 02 '22 at 08:22

2 Answers2

0

I made 2 sample files: test1.csv:

header
"10.0139_ssrn.3771318356"
"10.1001_archdermatol.2012.418"
"10.1001_archinte.165.15.17376"
"10.1001_archinternmed.2012.127"
"10.1001_archopht.123.1.2512"
"10.1001_archoto.2010.12197"
"10.1001_archotol.127.1.25"

test2.csv:

header
"10.0139_ssrn.3771318356"
"10.1001_archdermatol.2012.418"
"10.1001_archinte.165.15.1737"
"10.1001_archinte.165.15.1743"
"10.1001_archinte.165.18.2142"
"10.1001_archinternmed.2012.127"

Then looping over all items in file 1, and checking whether they occur in file 2:

$csv1 = Import-Csv "E:\users\temp\test1.csv"
$csv2 = Import-Csv "E:\users\temp\test2.csv"
$elementsToKeep = @()
foreach ($element1 in $csv1) {
    foreach ($element2 in $csv2) {
        if ($element1.header -eq $element2.header) {
            $elementsToKeep += $element1
        }
    }
}

$elementsToKeep | Export-Csv "E:\users\temp\output.csv" -NoTypeInformation

Content of output.csv:

"header"
"10.0139_ssrn.3771318356"
"10.1001_archdermatol.2012.418"
"10.1001_archinternmed.2012.127"
0

Try following :

$filename = "c:\temp\test.csv"
$list1 = @("10.0139_ssrn.3771318", ` 
          "10.1001_archdermatol.2012.418", ` 
          "10.1001_archinte.165.15.1737", `
          "10.1001_archinte.165.15.1743", `
          "10.1001_archinte.165.18.2142", `
          "10.1001_archinternmed.2012.127")
$csv = Import-Csv -Path $filename -Header 'number', 'name'
$csv | Format-Table
$filteredData = $csv.Where({$list1.Contains($_.name)})
$filteredData | Format-Table

Here is output

number name
------ ----
123    10.0139_ssrn.3771318
356    10.1001_archdermatol.2012.418
357    10.1001_archinte.165.15.1737
6      10.1001_archinternmed.2012.127
379    10.1001_archopht.123.1.25
12     10.1001_archoto.2010.121
97     10.1001_archotol.127.1.25



number name
------ ----
123    10.0139_ssrn.3771318
356    10.1001_archdermatol.2012.418
357    10.1001_archinte.165.15.1737
6      10.1001_archinternmed.2012.127
jdweng
  • 33,250
  • 2
  • 15
  • 20
  • Thanks. Unfortunately, my first list has 29k entries which makes your suggestion not very practical. – user13818643 Nov 02 '22 at 08:12
  • Do not understand you issue. I just used a sample array to test code. You just need to use you actual input array. You only need the Import-CSV and the WHERE statements. The other lines where just for testing. – jdweng Nov 02 '22 at 08:48