
What I am doing:

  1. Get data from a data source (an API or scraping) in the form of a dictionary
  2. Clean/manipulate some of the fields
  3. Combine fields from data source dictionary into new dictionaries that represent objects
  4. Save the created dictionaries into database

Is there a pythonic way to do this? I am wondering about the whole process but I'll give some guiding questions:

  1. What classes should I have?
  2. What methods/classes should the cleaning of fields from the data source to objects be in?
  3. What methods/classes should the combining/mapping of fields from the data source to objects be in?

If the method is different in scraping vs. api, please explain how and why

Here is an example:

API returns:

 {"data": {
     "name": "<b>asd</b>",
     "story": "tame",
     "story2": "adjet"
     }
 }

What you want to do:

  1. Clean name
  2. Create a name_story object
  3. Set name_story.name = dict['data']['name']
  4. Set name_story.story = dict['data']['story'] + dict['data']['story2']
  5. Save name_story to database

(and consider that there could be multiple objects to create and multiple incoming data sources)

How would you structure this process? An interface of all classes/methods would be enough for me without any explanation.

Alds

1 Answer


What classes should I have?

In Python, there is no strong need to use classes. Classes are a way to manage complexity. If your solution is not complex, use functions (or even module-level code, if it is a one-off solution).

If the method is different in scraping vs. api, please explain how and why

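I prefer to organize my code around modularity and the principle of least knowledge, and to define clear interfaces between the modules of the system. If every fetching module returns data in the same agreed-upon form, it does not matter to the rest of the code whether the data came from scraping or from an API.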

Example of a modular solution

You can have one module (a function or a class) that fetches the information and returns a dictionary with a specified set of fields, regardless of how it obtains the data.

A second module processes that dictionary and returns, for example, another dictionary.

A third module saves the information from that dictionary to the database.
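The three modules described above could be sketched as plain functions. This is only a minimal illustration under assumptions: the names `fetch_from_api`, `fetch_from_scraper`, `process`, `save` and `run` are hypothetical, the fetchers return hard-coded data instead of calling a real API or site, and `save` appends to a list instead of writing to a database.

```python
from typing import Callable, List

# Hypothetical fetchers: each hides how the data is obtained and returns
# a dictionary with the same agreed-upon fields.
def fetch_from_api() -> dict:
    # A real version would call the API; the response is hard-coded here.
    response = {"data": {"name": "<b>asd</b>", "story": "tame", "story2": "adjet"}}
    return response["data"]

def fetch_from_scraper() -> dict:
    # A real version would parse a page; the result is hard-coded here.
    return {"name": "qwe", "story": "foo", "story2": "bar"}

def process(raw: dict) -> dict:
    """Map source fields onto the fields of the object to be stored."""
    return {"name": raw["name"], "story": raw["story"] + raw["story2"]}

def save(record: dict, storage: list) -> None:
    """Stand-in for a database write: append the record to a list."""
    storage.append(record)

def run(fetchers: List[Callable[[], dict]], storage: list) -> None:
    # The pipeline does not care whether a fetcher scrapes or calls an API.
    for fetch in fetchers:
        save(process(fetch()), storage)

storage: list = []
run([fetch_from_api, fetch_from_scraper], storage)
```

Because both fetchers return the same fields, scraping and API access differ only inside the fetcher; processing and saving are shared.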

Quite possibly this plan is far from what you need or want, in which case you should design your own system of modules.

And a few words about the steps you listed:

Clean name

Consider this Stack Overflow answer.
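As one possible approach (not necessarily the one in the linked answer), HTML tags can be stripped with the standard library's `html.parser` module. The names `_TextExtractor` and `strip_tags` are hypothetical helper names chosen for this sketch.

```python
from html.parser import HTMLParser

class _TextExtractor(HTMLParser):
    """Collects only the text content and ignores the tags."""

    def __init__(self):
        super().__init__()
        self._chunks = []

    def handle_data(self, data):
        # Called by the parser for text found between tags.
        self._chunks.append(data)

    def get_text(self):
        return "".join(self._chunks)

def strip_tags(html: str) -> str:
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()  # flush any text still buffered by the parser
    return parser.get_text()

clean_name = strip_tags("<b>asd</b>")
```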

Create a name_story object

Set name_story.name = dict['data']['name']

Set name_story.story = dict['data']['story'] + dict['data']['story2']

If you want to access the object's attributes with dot notation (as in items 3 and 4 of your list), you can use either a Python namedtuple or a plain Python class. If indexed access is OK for you, use a Python dictionary.

With a namedtuple, it would be:

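from collections import namedtuple

# `response` and `response2` are dictionaries returned by the data source;
# avoid naming them `dict`, which would shadow the built-in type.
NameStory = namedtuple('NameStory', ['name', 'story'])
name_story1 = NameStory(name=response['data']['name'], story=response['data']['story'] + response['data']['story2'])
name_story2 = NameStory(name=response2['data']['name'], story=response2['data']['story'] + response2['data']['story2'])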

If you choose a dictionary, it's simpler:

# `response` is the dictionary returned by the data source.
name_story = {
    'name': response['data']['name'],
    'story': response['data']['story'] + response['data']['story2'],
}
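For completeness, the plain class option mentioned above might look like the following sketch. The variable name `response` is an assumption standing in for the dictionary returned by the API, filled here with the sample data from the question.

```python
class NameStory:
    """Plain class giving attribute access to name and story."""

    def __init__(self, name, story):
        self.name = name
        self.story = story

    def __repr__(self):
        return f"NameStory(name={self.name!r}, story={self.story!r})"

# Sample data in the shape shown in the question.
response = {"data": {"name": "<b>asd</b>", "story": "tame", "story2": "adjet"}}

name_story = NameStory(
    name=response["data"]["name"],
    story=response["data"]["story"] + response["data"]["story2"],
)
```

Unlike a namedtuple, an instance of a plain class is mutable, so `name_story.name` can be reassigned after the name has been cleaned.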

Save name_story to database

This is a much more complex question.

You can use raw SQL. The specific instructions depend on your database. Search for 'python sqlite', 'python postgresql', or whichever database you use; there are plenty of good tutorials.
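As a rough sketch of the raw SQL route, here is how it might look with the standard library's `sqlite3` module. The table name `name_story`, its columns, and the in-memory database are assumptions made for illustration only.

```python
import sqlite3

name_story = {"name": "asd", "story": "tameadjet"}

# ":memory:" keeps the example self-contained; a real program would use a file path.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE name_story (name TEXT, story TEXT)")

# Named placeholders let sqlite3 bind the values from the dictionary safely.
conn.execute(
    "INSERT INTO name_story (name, story) VALUES (:name, :story)",
    name_story,
)
conn.commit()

rows = conn.execute("SELECT name, story FROM name_story").fetchall()
conn.close()
```

Passing the values as parameters, rather than formatting them into the SQL string, avoids SQL injection when the data comes from an external source.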

Or you can use one of the Python ORMs.

By the way

It's strongly recommended not to shadow the names of Python built-in types (list, dict, str, etc.), as you did in this line:

name_story.name = dict['data']['name']
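A short demonstration of why this matters: once a variable named `dict` exists, the built-in type is no longer reachable by that name in the same scope. The function names below are hypothetical.

```python
def shadowed():
    dict = {"data": {"name": "asd"}}  # shadows the built-in inside this function
    try:
        dict(a=1)  # intended as a call to the built-in constructor
    except TypeError as exc:
        # A dictionary instance is not callable, so the call fails.
        return type(exc).__name__
    return "no error"

def not_shadowed():
    response = {"data": {"name": "asd"}}  # a descriptive name avoids the clash
    return dict(name=response["data"]["name"])

result_shadowed = shadowed()
result_ok = not_shadowed()
```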
Oleksii M