
In the following code, I get an error every time a line doesn't contain all three identifiers. How can I skip a line and move on to the next one if the identifiers are not present? For example, if the first line does not have mfgcode, modelno, and qtyavail, the program fails. Thank you for your time.

import csv
import re

import pandas as pd

with open('file.csv', 'r') as csv_file:
    csv_reader = csv.reader(csv_file)

    ff = []
    for line in csv_reader:
        # Raises AttributeError whenever one of these searches returns None
        ff.append([re.search('mfgcode="(.+?)"', line[0]).group(1), re.search('modelno="(.+?)"', line[0]).group(1), re.search('qtyavail="(.+?)"', line[0]).group(1)])

df = pd.DataFrame(ff, columns=['mfgcode', 'modelno', 'qtyavail'])
df.to_csv("test.csv", index=False)
print(df)

Traceback:

 line 10, in <module>
    ff.append([re.search('mfgcode="(.+?)"', line[0] ).group(1),re.search('modelno="(.+?)"', line[0] ).group(1),re.search('qtyavail="(.+?)"', line[0] ).group(1)])
AttributeError: 'NoneType' object has no attribute 'group'

First three lines of the CSV file:

<checkresp>  <header errcode="success" errmsg="sucess" />
<part branch="1" core="0.00" cost="15.69" deliverytime="1" desc="" errcode="success" kit="" linecode="nike" linenum="1" list="23.42" mfgcode="nike" modelno="1221" qtyavail="120" qtyreq="1" uom="" />
</checkresp>
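A quick way to see what line[0] holds for these lines is to feed them to csv.reader directly. This is a minimal sketch, assuming the file looks exactly like the three sample lines above; an io.StringIO object stands in for file.csv.

```python
import csv
import io

# The three sample lines from the question, inlined so the sketch runs without file.csv.
SAMPLE = (
    '<checkresp>  <header errcode="success" errmsg="sucess" />\n'
    '<part branch="1" core="0.00" cost="15.69" deliverytime="1" desc="" errcode="success" '
    'kit="" linecode="nike" linenum="1" list="23.42" mfgcode="nike" modelno="1221" '
    'qtyavail="120" qtyreq="1" uom="" />\n'
    '</checkresp>\n'
)

rows = list(csv.reader(io.StringIO(SAMPLE)))
for row in rows:
    # These lines contain no commas, so csv.reader returns a single field per line:
    # row[0] is the whole line, with the inner double quotes kept as ordinary characters.
    print(len(row), 'mfgcode=' in row[0])
```

Only the second line contains mfgcode=, which is why the first search on the first line returns None.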
mjbaybay7
  • Can you post the full traceback (terminal output) of the error thrown? It will help us narrow down where the issue is. – Josh Honig Aug 07 '20 at 18:51
  • It would be very helpful if you could also share two lines - one where identifiers are present, other where they are absent. – Harsh Aug 07 '20 at 19:02
  • @JoshHonig I have updated to include traceback. Thank you – mjbaybay7 Aug 07 '20 at 19:08
  • @Harsh I updated the post to include the first three lines of the CSV. The first and third do not have the identifiers, but the 2nd line does. Thank you – mjbaybay7 Aug 07 '20 at 19:09

2 Answers


I think Nambo's solution should suffice.
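Nambo's answer is not included on this page, so the following is only a sketch of a generic try/except approach, not necessarily what Nambo posted. It assumes a line should be skipped whenever any of the three searches fails; PATTERNS and SAMPLE are illustrative names, and SAMPLE stands in for file.csv.

```python
import csv
import io
import re

# Illustrative stand-in for file.csv, using the sample lines from the question.
SAMPLE = (
    '<checkresp>  <header errcode="success" errmsg="sucess" />\n'
    '<part branch="1" core="0.00" cost="15.69" deliverytime="1" desc="" errcode="success" '
    'kit="" linecode="nike" linenum="1" list="23.42" mfgcode="nike" modelno="1221" '
    'qtyavail="120" qtyreq="1" uom="" />\n'
    '</checkresp>\n'
)

PATTERNS = ('mfgcode="(.+?)"', 'modelno="(.+?)"', 'qtyavail="(.+?)"')

ff = []
for line in csv.reader(io.StringIO(SAMPLE)):
    try:
        ff.append([re.search(p, line[0]).group(1) for p in PATTERNS])
    except (AttributeError, IndexError):
        # AttributeError: a search returned None, so .group(1) failed.
        # IndexError: an empty row has no line[0].
        continue

print(ff)  # [['nike', '1221', '120']]
```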

But if you want to do it without try/except, and assuming that whenever one of the identifiers is present the other two are as well, search for just one of them and skip the line if that search fails:

for line in csv_reader:
    mfgcode = re.search('mfgcode="(.+?)"', line[0])
    if mfgcode:  # skip lines without mfgcode
        ff.append([mfgcode.group(1), re.search('modelno="(.+?)"', line[0]).group(1), re.search('qtyavail="(.+?)"', line[0]).group(1)])

One thing I'm still worried about is line[0]. Make sure it actually contains the text of the line you need.
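A self-contained sketch of this approach, run against the question's sample lines (inlined as SAMPLE, an assumption standing in for file.csv) plus a trailing blank line. The extra guard is an addition not in the answer above: csv.reader returns an empty list for a blank line, so line[0] would raise IndexError there.

```python
import csv
import io
import re

# Sample lines from the question, plus a blank line at the end to exercise the guard.
SAMPLE = (
    '<checkresp>  <header errcode="success" errmsg="sucess" />\n'
    '<part branch="1" core="0.00" cost="15.69" deliverytime="1" desc="" errcode="success" '
    'kit="" linecode="nike" linenum="1" list="23.42" mfgcode="nike" modelno="1221" '
    'qtyavail="120" qtyreq="1" uom="" />\n'
    '</checkresp>\n'
    '\n'
)

ff = []
for line in csv.reader(io.StringIO(SAMPLE)):
    # Blank rows come back as [], so there is no line[0] to search.
    if not line:
        continue
    text = line[0]
    mfgcode = re.search('mfgcode="(.+?)"', text)
    # Assumes modelno and qtyavail are present whenever mfgcode is.
    if mfgcode:
        ff.append([
            mfgcode.group(1),
            re.search('modelno="(.+?)"', text).group(1),
            re.search('qtyavail="(.+?)"', text).group(1),
        ])

print(ff)  # [['nike', '1221', '120']]
```

If a line could contain mfgcode without the other two identifiers, the later .group(1) calls would still fail, so this relies on the stated assumption.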

Harsh

You're trying to insert the following into a list:

[re.search('mfgcode="(.+?)"', line[0] ).group(1),re.search('modelno="(.+?)"', line[0] ).group(1),re.search('qtyavail="(.+?)"', line[0] ).group(1)]

The problem is that when re.search finds nothing, it returns None. You then call .group(1) on that result, and when there was no match this raises an AttributeError, because None has no group method.

When re.search does find a match, it returns an object of type re.Match; only then can you get match group 1 from it. See the example below.

>>> import re
>>> a = re.search('a', 'b')
>>> type(a)
<class 'NoneType'>
>>> a = re.search('a', 'a')
>>> type(a)
<class 're.Match'>
>>> 

Moving the searches out of the list you append lets you check each result for None before calling .group(1). Something like this:

ff = []
for line in csv_reader:
    mfgcode = re.search('mfgcode="(.+?)"', line[0])
    modelno = re.search('modelno="(.+?)"', line[0])
    qtyavail = re.search('qtyavail="(.+?)"', line[0])
    ff.append(
        [
            'No Data' if mfgcode is None else mfgcode.group(1),
            'No Data' if modelno is None else modelno.group(1),
            'No Data' if qtyavail is None else qtyavail.group(1),
        ]
    )

Note that this uses inline if expressions (conditional expressions), which are explained well here.
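A sketch of the same conditional-expression idea factored into a small helper and run on the question's sample lines. The helper group_or_default, the names FIELDS and MISSING, and the final filter are hypothetical additions, not part of the answer above; the filter drops lines where none of the identifiers were found, which matches the question's goal of skipping such lines.

```python
import csv
import io
import re

# Sample lines from the question, standing in for file.csv.
SAMPLE = (
    '<checkresp>  <header errcode="success" errmsg="sucess" />\n'
    '<part branch="1" core="0.00" cost="15.69" deliverytime="1" desc="" errcode="success" '
    'kit="" linecode="nike" linenum="1" list="23.42" mfgcode="nike" modelno="1221" '
    'qtyavail="120" qtyreq="1" uom="" />\n'
    '</checkresp>\n'
)

FIELDS = ('mfgcode', 'modelno', 'qtyavail')
MISSING = 'No Data'


def group_or_default(pattern, text, default=MISSING):
    """Hypothetical helper: capture group 1 of the first match, or default if none."""
    match = re.search(pattern, text)
    # Conditional expression: match.group(1) when a match was found, else the default.
    return match.group(1) if match is not None else default


ff = []
for line in csv.reader(io.StringIO(SAMPLE)):
    row = [group_or_default(f'{name}="(.+?)"', line[0]) for name in FIELDS]
    # Optional: skip lines where none of the identifiers were found.
    if all(value == MISSING for value in row):
        continue
    ff.append(row)

print(ff)  # [['nike', '1221', '120']]
```

Removing the all(...) check keeps every line and fills gaps with 'No Data', exactly like the loop above.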

Josh Honig