
I am trying to work with lexnlp to read through a csv that I have of a legal case in order to separate out different information found in the text such as all of the acts listed, dates, etc.

I have formatted everything exactly as the lexnlp website indicates. However, my csv is not being read properly. My professor recommended that I write a loop to iterate through the csv so that each sentence gets read. After searching through different material on writing iteration loops, I still don't quite understand what to do.

I've found the construct `for row in text.iterrows():`, but I don't know what action to run inside it. I've asked classmates and they also seem lost. Below is my code. Any and all help is appreciated.


url = 'https://raw.githubusercontent.com/unt-iialab/INFO5731_Spring2020/master/In_class_exercise/01-05-1%20%20Adams%20v%20Tanner.txt'
text = pd.read_csv(url,error_bad_lines=False, names=['sentence'])


#Output appears & reads  fine with this portion
#Indicates that CSV is getting read properly
print('Number of Sentences:' , len(text['sentence']))

!pip install lexnlp


#Cannot get nlp module to read csv
import lexnlp.extract.en.acts

#This Version gives back empty brackets. I believe because it is reading text as a string. 
print(lexnlp.extract.en.acts.get_act_list('text'))

#This is the format used in the number of sentences. It creates an error message.
print(lexnlp.extract.en.acts.get_act_list(text['sentence']))

#This is the format that the lexnlp site recommends. It also creates an error message.
print(lexnlp.extract.en.acts.get_act_list(text))




#The following are just different features of the lexnlp module that I am going to run. 
import lexnlp.extract.en.amounts
print(list(lexnlp.extract.en.amounts.get_amounts(text)))

import lexnlp.extract.en.citations
print(list(lexnlp.extract.en.citations.get_citations(text)))

import lexnlp.extract.en.entities.nltk_re
print(list(lexnlp.extract.en.entities.nltk_re.get_companies(text)))

import lexnlp.extract.en.conditions
print(list(lexnlp.extract.en.conditions.get_conditions(text)))

import lexnlp.extract.en.constraints
print(list(lexnlp.extract.en.constraints.get_constraints(text)))

import lexnlp.extract.en.copyright
print(list(lexnlp.extract.en.copyright.get_copyright(text)))

import lexnlp.extract.en.courts

import lexnlp.extract.en.cusip
print(lexnlp.extract.en.cusip.get_cusip(text))

import lexnlp.extract.en.dates
print(list(lexnlp.extract.en.dates.get_dates(text)))

import lexnlp.extract.en.definitions
print(list(lexnlp.extract.en.definitions.get_definitions(text)))

import lexnlp.extract.en.distances
print(list(lexnlp.extract.en.distances.get_distances(text)))

import lexnlp.extract.en.durations
print(list(lexnlp.extract.en.durations.get_durations(text)))

import lexnlp.extract.en.money
print(list(lexnlp.extract.en.money.get_money(text)))

import lexnlp.extract.en.percents
print(list(lexnlp.extract.en.percents.get_percents(text)))

import lexnlp.extract.en.pii
print(list(lexnlp.extract.en.pii.get_pii(text)))

import lexnlp.extract.en.ratios
print(list(lexnlp.extract.en.ratios.get_ratios(text)))

import lexnlp.extract.en.regulations
print(list(lexnlp.extract.en.regulations.get_regulations(text)))

import lexnlp.extract.en.trademarks
print(list(lexnlp.extract.en.trademarks.get_trademarks(text)))

import lexnlp.extract.en.urls
print(list(lexnlp.extract.en.urls.get_urls(text)))

Below is the error code I receive:

<ipython-input-2-301f76c3c169> in <module>()
     19 
     20 #This is the format used in the number of sentences. It creates an error message.
---> 21 print(lexnlp.extract.en.acts.get_act_list(text['sentence']))
     22 
     23 #This is the format that the lexnlp site reccommends. It also creates an error message.

2 frames
/usr/local/lib/python3.6/dist-packages/lexnlp/extract/en/acts.py in get_acts_annotations(text)
     37 
     38 def get_acts_annotations(text: str) -> Generator[ActAnnotation, None, None]:
---> 39     for match in ACT_PARTS_RE.finditer(text):
     40         captures = match.capturesdict()
     41         act_name = ''.join(captures.get('act_name') or [])

TypeError: expected string or buffer
Lauren A.
  • "Below is the error code I receive" - so... where's the error code? Please post the _full traceback_ – ForceBru Mar 13 '20 at 17:48
  • I updated the error code above. – Lauren A. Mar 13 '20 at 17:57
  • The url you are using does not take you to a .csv file, just a normal text file. Therefore, there is no point using pandas, which handles tabular data, for this. Please see [this question](https://stackoverflow.com/q/1393324/3324095) (and its answers) for how to get a txt file from a url and iterate (loop) through each line. – FiddleStix Mar 13 '20 at 18:08

2 Answers


Try the following code:

import csv

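# Open in text mode; newline='' is what the csv module expects in Python 3
with open('file.csv', newline='') as csvfile:
    csvreader = csv.reader(csvfile, delimiter=',')
    # Iterate while the file is still open
    for row in csvreader:
        print(row)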

"Each row read from the csv file is returned as a list of strings. No automatic data type conversion is performed."
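Because each row arrives as a list of strings, it has to be turned back into a single string before it is handed to a function that expects `str` (the traceback in the question shows `get_acts_annotations(text: str)`). The following is only a minimal sketch of that idea: `SAMPLE`, `find_acts`, and `rows_to_sentences` are hypothetical names, and `find_acts` is a simple regex stand-in, not the lexnlp extractor.

```python
import csv
import io
import re

# Hypothetical sample data standing in for the real file.
SAMPLE = (
    "The claim arises under the Securities Act of 1933.\n"
    '"The court, however, disagreed."\n'
    "No statute is cited here.\n"
)


def find_acts(text: str) -> list:
    """Stand-in extractor (NOT lexnlp): matches phrases like 'X Act of 1933'."""
    return re.findall(r"(?:[A-Z][a-z]+ )+Act of \d{4}", text)


def rows_to_sentences(handle):
    """Yield each CSV row as one string by re-joining its cells."""
    for row in csv.reader(handle):
        if row:  # skip blank lines
            yield ",".join(row)


sentences = list(rows_to_sentences(io.StringIO(SAMPLE)))

# Call the extractor once per sentence and flatten the results.
acts = [act for sentence in sentences for act in find_acts(sentence)]
print(acts)
```

With a real file, the `io.StringIO(SAMPLE)` handle would be replaced by the open file object, and the stand-in would presumably be swapped for the extractor being tested.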


Arun
import pandas as pd

# header=0 treats the first row as column names (a bool such as header=True is rejected by pandas)
df = pd.read_csv('csv_file.csv', index_col=None, header=0)

The path passed to pd.read_csv() can be relative or absolute, depending on where your file is. It returns your data as a DataFrame.
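A DataFrame or a column of one is not a string, which is likely why a regex-based extractor raises `TypeError: expected string or buffer` when it receives `text` or `text['sentence']` directly. A rough sketch of two ways around that, assuming a column named `sentence` as in the question: the DataFrame contents and `find_years` are made up for illustration, and `find_years` is a regex stand-in rather than a lexnlp function.

```python
import re

import pandas as pd


def find_years(text: str) -> list:
    """Stand-in extractor (NOT lexnlp): returns four-digit years found in text."""
    return re.findall(r"\b(?:1[89]|20)\d{2}\b", text)


# Hypothetical data shaped like the question's DataFrame.
df = pd.DataFrame(
    {
        "sentence": [
            "Decided in 1917 by the Court.",
            "No year appears here.",
            "Amended 1933 and 1934.",
        ]
    }
)

# Option 1: loop over the column and call the extractor on each string.
per_sentence = [find_years(s) for s in df["sentence"].astype(str)]

# Option 2: join the whole column into one string and call it once.
full_text = " ".join(df["sentence"].astype(str))
all_years = find_years(full_text)

print(per_sentence)
print(all_years)
```

Option 1 keeps the results aligned with individual sentences; option 2 is simpler when only the overall list matters.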

hula-hula