
I have two example csv files, csvexample.csv looks like this:

ID Text  
1  'good morning'  
2  'good afternoon'  
3  'good evening'  

While csvexample1.csv looks like this:

Day Month  
14  'Feb'  
21  'Mar'  
31  'May' 

With the following code I get the result I want: the first column of csvexample.csv and the second column of csvexample1.csv are added to one list, res.

import csv

res = []
with open('csvexample.csv') as f, open('csvexample1.csv') as a:
    reader=csv.reader(f) 
    reader1=csv.reader(a)
    next(reader)
    next(reader1)
    for row in zip(reader, reader1):
        res.extend([row[0][0], row[1][1]])  

print(res)   

I get the following outcome:

['1', 'Feb', '2', 'Mar', '3', 'May']  

However, the actual csv files I want to apply this code to contain some empty cells. I am combining the Twitter bios of companies from one file with the Tweets of those companies from another file into one list, and some companies have no bio on Twitter, so those cells in that column are empty. Furthermore, in most cases the first file has far fewer rows than the second, and the output stops once the first file runs out of rows, ignoring the remaining rows of the second file. For example, if I edit csvexample.csv like this:

ID Text   
1  'good morning'  
2  'good afternoon'   

3  'good evening'  
4  

and csvexample1.csv like this:

Day Month  
14  'Feb'  
21     
31  'May'  

I get the following outcome:

['1', 'feb', '2', '', '', 'may']  

instead of the desired outcome:

['1', 'feb', '2', '', '', 'may', '4']

I tried many different things but I really can't edit it to the required outcome.

from itertools import zip_longest
from io import StringIO
import csv

mystr1 = StringIO("""ID Text
1 'good morning'
2 'good afternoon'

3 'good evening'
4
""")

mystr2 = StringIO("""Day Month
14 'Feb'
21
31 'May'
""")

res = []
with mystr1 as f, mystr2 as a:


    reader = csv.reader(f, delimiter=' ')
    reader1 = csv.reader(a, delimiter=' ')

    next(reader)
    next(reader1)

    for row in zip_longest(reader, reader1, fillvalue=''):
        var1 = row[0][0] if len(row[0]) else ''
        var2 = row[1][1] if len(row[1]) else ''
        res.extend([var1, var2])

print(res)

This example gives me the following error:

Traceback (most recent call last):
  File "thesis.py", line 31, in <module>
    var2 = row[1][1] if len(row[1]) else ''
IndexError: list index out of range

Nienke Luirink
  • Perhaps within your loop you can first check the values for `row[0]` and `row[1]` and only if they both exist, then you can update your `res` variable. – Lix May 08 '18 at 12:40
  • Possible duplicate of [zip-like function that pads to longest length?](https://stackoverflow.com/questions/1277278/zip-like-function-that-pads-to-longest-length) – avigil May 08 '18 at 14:57
  • `zip` stops at the end of the shortest iterator. You should be using `itertools.zip_longest`. – avigil May 08 '18 at 14:58
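
To illustrate the last comment, here is a minimal sketch of how `zip` truncates to the shorter input while `itertools.zip_longest` pads the shorter one with `fillvalue`. The two lists are made-up stand-ins for rows already read from the CSV files, not the asker's data.

```python
from itertools import zip_longest

# Hypothetical stand-ins for rows already parsed from the two files.
short = [['1'], ['2']]
long_ = [['14', 'Feb'], ['21', 'Mar'], ['31', 'May']]

# zip stops as soon as the shorter input is exhausted.
print(list(zip(short, long_)))

# zip_longest keeps going and pads the shorter input with fillvalue.
print(list(zip_longest(short, long_, fillvalue=[])))
```

With `zip`, the third row of the longer input is silently dropped; with `zip_longest` it is paired with the fill value instead.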

1 Answer


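You can use itertools.zip_longest so that iteration continues until the longer file is exhausted, and wrap each lookup in try / except IndexError so that blank rows and rows with a missing field produce ''. If you would rather drop blank rows altogether, itertools.filterfalse can remove them before zipping.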

from itertools import zip_longest
from io import StringIO
import csv

mystr1 = StringIO("""ID Text
1 'good morning'
2 'good afternoon'

3 'good evening'
4
""")

mystr2 = StringIO("""Day Month
14 'Feb'
21
31 'May'
""")

res = []

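with mystr1 as f, mystr2 as a:
    reader = csv.reader(f, delimiter=' ')
    reader1 = csv.reader(a, delimiter=' ')

    # Skip the header row of each file.
    next(reader)
    next(reader1)

    # zip_longest runs until the longer reader is exhausted,
    # padding the shorter one with fillvalue.
    for row in zip_longest(reader, reader1, fillvalue=''):
        # Blank rows and rows with a missing field raise IndexError;
        # fall back to an empty string in that case.
        try:
            var1 = row[0][0]
        except IndexError:
            var1 = ''
        try:
            var2 = row[1][1]
        except IndexError:
            var2 = ''
        res.extend([var1, var2])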

print(res)

['1', "'Feb'", '2', '', '', "'May'", '3', '', '4', '']
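
As for why the ternary version in the question still raises `IndexError`: `len(row[1])` is truthy for the row `['21']`, yet that row has no index 1. Comparing the length against the index you want avoids this. A minimal sketch follows; the helper name `field` is hypothetical and not part of the `csv` module.

```python
def field(row, index, default=''):
    """Return row[index], or default when the row is too short."""
    return row[index] if len(row) > index else default


print(repr(field(['21'], 1)))           # row with only one field
print(repr(field(['31', "'May'"], 1)))  # row with both fields
print(repr(field([], 0)))               # blank line as parsed by csv.reader
print(repr(field('', 1)))               # the '' fillvalue from zip_longest
```

This does the same job as the `try` / `except` blocks above without relying on exceptions; which of the two reads better is a matter of taste.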
jpp
  • I copied this exact code but it gave me the same outcome as the code I had before. I still get `['1', 'feb', '2', '', '', 'may']`, so it still stops reading the rows after there has been one blank row. – Nienke Luirink May 08 '18 at 14:33
  • @NienkeLuirink, The update might help you. There are lots of tricks you can use: `zip_longest` to ensure you use the longest of both files, ternary `if` / `else` with `len` to make sure you don't get `IndexError`, etc. – jpp May 08 '18 at 14:54
  • would probably also be more readable to unpack the output of zip into two separate variables instead of double indexing into an overloaded `row` – avigil May 08 '18 at 15:03
  • @avigil, Thank you, good point. I *think* this covers everything OP would want, but still not sure. – jpp May 08 '18 at 15:08
  • @jpp thank you so much for all your help, somehow I'm still getting `Traceback (most recent call last): File "new.py", line 30, in res.extend([row[0][0] if len(row[0]) else '', row[1][1] if len(row[1]) else '']) IndexError: list index out of range` – Nienke Luirink May 08 '18 at 15:19
  • @NienkeLuirink, That is strange. You should provide an input exactly in the format I have in my answer. Then we should be able to see the problem. – jpp May 08 '18 at 15:20
  • @jpp With the edited example it now only gives me: `Traceback (most recent call last): File "thesis.py", line 31, in var2 = row[1][1] if len(row[1]) else '' IndexError: list index out of range` I'm not sure what's going wrong cause the code seems completely fine, but then again I'm running on 4 hours sleep so I might be missing something. I'll update my post with the exact same input, should be the same as your example. – Nienke Luirink May 08 '18 at 15:52
  • @NienkeLuirink, In your input data, you need a space after 21 for `delimiter=' '` to work. – jpp May 08 '18 at 15:57
  • @jpp yeah I copied that wrong into the post. What I have in my post now is the exact code I've used which gives me that indexerror. – Nienke Luirink May 08 '18 at 16:04
  • @NienkeLuirink, OK, I updated one more time, if you would like to test. This time I'm using `try` / `except`. – jpp May 17 '18 at 13:51
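
A sketch of the unpacking suggested in the comments above, assuming the same space-delimited sample data as the answer. The function name `merge_columns` is made up for illustration, and the length checks stand in for the `try` / `except` blocks.

```python
import csv
from io import StringIO
from itertools import zip_longest


def merge_columns(file1, file2, delimiter=' '):
    """Interleave column 0 of file1 with column 1 of file2, padding gaps with ''."""
    reader1 = csv.reader(file1, delimiter=delimiter)
    reader2 = csv.reader(file2, delimiter=delimiter)
    # Skip the header rows; the None default avoids StopIteration on empty input.
    next(reader1, None)
    next(reader2, None)
    res = []
    # Unpack each pair instead of indexing twice into a single `row` tuple.
    for left, right in zip_longest(reader1, reader2, fillvalue=[]):
        res.append(left[0] if len(left) > 0 else '')
        res.append(right[1] if len(right) > 1 else '')
    return res


f1 = StringIO("ID Text\n1 'good morning'\n2 'good afternoon'\n\n3 'good evening'\n4\n")
f2 = StringIO("Day Month\n14 'Feb'\n21\n31 'May'\n")
print(merge_columns(f1, f2))
```

For real files, the two `StringIO` objects would be replaced by handles from `open(...)`, and `delimiter` would need to match whatever separator those files actually use.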