0

I have a folder with 50 CSV files named countrieslist1.csv, countrieslist2.csv, countrieslist3.csv and so on. I have code that reads the values from one CSV file using pandas and plots the required graph from the data. What I want is for my code to take the first CSV file, do the plotting and save the result as a PNG file, then take the second CSV file and do the same, and so on for every CSV file, so that in the end I have 50 PNG files (one for each CSV file).
I tried:

import pandas as pd
import os
import matplotlib.pyplot as plt

folder_path = "C:/Users/xyz/Desktop/Countrieslist"
df=pd.read_csv(folder_path)

X=df.'columnname'.value_counts.(normalize=True).head(5)
X.plot.barh()
plt.ylabel()
plt.xlabel()
plt.title()
plt.savefig(folder_path[:-3]+'png')

This gives the output, but only for a single CSV file. I want code that takes all the CSV files one by one, does the plotting and saves each plot as a PNG file. How can I do that?

Beast
  • Are you sure your code works? Looks as though C:/Users/xyz/Desktop/Countrieslist is a directory in which case pd.read_csv() would fail. The assignment to X will fail due to SyntaxError – DarkKnight Mar 22 '23 at 08:09
  • Yes, it works properly, but it only takes one csv file – Beast Mar 22 '23 at 08:50
  • Actually, the column name is the same for all the csv files, i.e. src_country, but the values are different – Beast Mar 22 '23 at 08:51

3 Answers

2

You can use the following code:

import pandas as pd
import pathlib
import matplotlib.pyplot as plt

folder_path = pathlib.Path("C:/Users/xyz/Desktop/Countrieslist")

def create_image(filename, columnname):
    df = pd.read_csv(filename)
    ax = (df[columnname].value_counts(normalize=True).head(5)
                        .plot.bar(ylabel='Count', xlabel='Country',
                                  title='Value counts',
                                  legend=False, rot=0))
    plt.savefig(folder_path / f'{filename.stem}.png')
    
for filename in folder_path.glob('*.csv'):
    create_image(filename, 'Country')
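
A more cautious variant of the loop above creates a figure explicitly for each file and closes it after saving, so that neither bars nor open figures can carry over from one CSV to the next. This is only a sketch: the function names save_top5_chart and save_all_charts, the axis labels and the choice of the non-interactive Agg backend are assumptions made for illustration, not part of the answer.

```python
import pathlib

import matplotlib
matplotlib.use("Agg")  # assumption: plots are only saved to disk, never shown
import matplotlib.pyplot as plt
import pandas as pd


def save_top5_chart(csv_path, column, out_dir):
    """Plot the five most frequent values of `column`; save a PNG named after the CSV."""
    csv_path = pathlib.Path(csv_path)
    shares = pd.read_csv(csv_path)[column].value_counts(normalize=True).head(5)

    fig, ax = plt.subplots()           # a fresh figure for every file
    shares.plot.bar(ax=ax, rot=0)      # draw on that figure's axes explicitly
    ax.set_xlabel(column)
    ax.set_ylabel("Share of rows")     # hypothetical label
    ax.set_title(csv_path.stem)

    out_path = pathlib.Path(out_dir) / f"{csv_path.stem}.png"
    fig.savefig(out_path)
    plt.close(fig)                     # release the figure before the next file
    return out_path


def save_all_charts(folder, column):
    """Run save_top5_chart on every *.csv directly inside `folder`, in name order."""
    folder = pathlib.Path(folder)
    return [save_top5_chart(p, column, folder) for p in sorted(folder.glob("*.csv"))]
```

With the folder from the question, a call such as save_all_charts("C:/Users/xyz/Desktop/Countrieslist", "src_country") would write one PNG next to each CSV.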

[Figure: countrieslist5.png]

[Figure: countrieslist8.png]

Input data:

import numpy as np
import pandas as pd

REGIONS = ['AL', 'AT', 'BE', 'BG', 'CH', 'CZ', 'DE', 'DK',
           'EE', 'ES', 'FI', 'FR', 'GR', 'HR', 'HU', 'IE',
           'IT', 'LT', 'LU', 'LV', 'ME', 'NL', 'NO', 'PL',
           'PT', 'RO', 'RS', 'SE', 'SI', 'SK', 'UK']

for i in range(1, 10):
    df = pd.DataFrame({'Country': np.random.choice(REGIONS, 200)})
    df.to_csv(f'Countrieslist/countrieslist{i}.csv', index=False)
Corralien
  • Bearing in mind that I know nothing about matplotlib, can you explain why plt.close() is necessary? Looks odd – DarkKnight Mar 22 '23 at 08:28
  • You are right here because I use a function so the figure handler is closed at the end. Usually, I create figure directly in the loop. So if you don't close the figure, matplotlib raises a warning (https://stackoverflow.com/q/21884271/15239951) – Corralien Mar 22 '23 at 08:36
0

Since you already have os imported, you can use its listdir function.

The following code iterates over the contents of the directory and skips any file that isn't a csv file:

for file in os.listdir(folder_path):
    if not file.endswith('.csv'):
        continue

    # os.listdir returns bare file names, so join them with the folder path
    df = pd.read_csv(os.path.join(folder_path, file))
    # continue with other code here
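
The listing step can also be pulled out into a small helper that returns full paths, which makes it easy to check on its own. This is a minimal sketch; the name list_csv_paths is hypothetical, and the case-insensitive extension check and the sorting are assumptions rather than anything the answer requires.

```python
import os


def list_csv_paths(folder):
    """Return full paths of the .csv files directly inside `folder`, sorted by name."""
    paths = []
    for name in sorted(os.listdir(folder)):
        if not name.lower().endswith(".csv"):
            continue  # skip PNGs and anything else in the folder
        # os.listdir yields bare file names, so join them with the folder
        paths.append(os.path.join(folder, name))
    return paths
```

Each returned path can then be passed to pd.read_csv regardless of the current working directory.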
-1

First, collect the .csv files:

import glob, os

csv_files = []
os.chdir("C:/Users/xyz/Desktop/Countrieslist")
for file in glob.glob("*.csv"):
    csv_files.append(file)

The next step is to do the plotting in a loop:

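import pandas as pd
import matplotlib.pyplot as plt

for file in csv_files:
    df = pd.read_csv(file)

    # Replace 'columnname' with the real column name (e.g. 'src_country')
    X = df['columnname'].value_counts(normalize=True).head(5)
    X.plot.barh()
    plt.ylabel('Country')            # placeholder label
    plt.xlabel('Fraction of rows')   # placeholder label
    plt.title(file)
    # Strip the '.csv' extension so the output is countrieslist1.png, not countrieslist1.csv.png
    plt.savefig(os.path.splitext(file)[0] + '.png')
    plt.close()  # start the next file with a fresh figure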
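
If changing the working directory with os.chdir is undesirable, the folder can be put into the glob pattern instead, and the PNG name derived from each CSV path. The sketch below only builds the (CSV, PNG) path pairs; the helper name csv_png_pairs is hypothetical and not part of the answer.

```python
import glob
import os


def csv_png_pairs(folder):
    """Pair every CSV in `folder` with the PNG path that would sit next to it."""
    pattern = os.path.join(folder, "*.csv")
    pairs = []
    for csv_path in sorted(glob.glob(pattern)):
        base, _ext = os.path.splitext(csv_path)   # drop the '.csv' suffix
        pairs.append((csv_path, base + ".png"))
    return pairs
```

Each pair gives the path to read with pd.read_csv and the path to hand to plt.savefig.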
Andrey